FAQs in AI Project Management
This is the second article in the knowledge series "Start adopting AI to generate Business Value". In it, I try to answer the following questions from a subjective, experience-based point of view:
- What does it take to build & sustain organization-wide ML/AI capabilities?
- What are the common misconceptions in ML/AI project management?
- What should traditional software project managers be aware of to lead AI projects efficiently?
- How to avoid possible traps which can lead to loss & inefficiency in ML/AI teams & projects?
- What kind of project life cycle do ML/AI projects go through?
- How to adapt agile methodologies to ML/AI projects, and how do they differ from traditional software development agile practices?
- How to efficiently use computation and human resources for expensive ML/AI activities under a constrained budget & resources?
What does it take to build & sustain organization-wide ML/AI capabilities?
- Avoid the common ML/AI project traps, or shake yourself out of them if you have already fallen in
- Aim for a long-term, sustainable ML/AI program driven by business needs, not by technology hype
- One business need that holds true at any point in time is ‘adapting to market trends & technologies’. So even if you do not have a specific business scenario, your reason to invest in a long-term, sustainable ML/AI program can be: “To stay competitive and to respond to market demands”.
What are the common misconceptions in ML/AI project management?
- Data scientists alone are enough for an ML/AI project
- A data-driven management style will automatically lead to decisions that improve team efficiency or the effective use of computation resources
- ML/AI projects fit the traditional software development life cycle, or agile project management methodologies can be implemented by the book
- Ballpark estimates of time, effort or complexity can be made reliably
- The work breakdown structure (WBS) is similar to that of conventional software development
- The same version management tools used for source code can be used for AI models/weights and for dataset management
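To make the last point concrete: source-control tools such as Git are built around small text files, whereas model weights and datasets are large binaries. A common workaround, similar in spirit to what tools such as DVC or Git LFS do, is to commit only a content hash of each large file and keep the file itself in separate storage. The sketch below is a minimal, hypothetical illustration using only Python's standard library; the function names and the manifest layout are assumptions, not any tool's actual format.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-GB weights never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths: list[Path]) -> dict[str, str]:
    """Map each artifact file name to its content hash (hypothetical manifest layout)."""
    return {p.name: file_sha256(p) for p in sorted(paths)}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        weights = Path(tmp) / "model_v1.bin"
        weights.write_bytes(b"\x00" * 1024)  # stand-in for real model weights
        manifest = build_manifest([weights])
        # This small JSON file is what would be committed to Git,
        # while the large binary lives in separate artifact storage.
        print(json.dumps(manifest, indent=2))
```

If a single byte of the weights or dataset changes, its hash changes, so the manifest history records exactly which artifact version each commit used.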
What should traditional software project managers be aware of to lead AI projects efficiently?
- The needs of individual team members and the challenges they face
- Regular training & up-skilling of team members is essential; hence, encourage them to self-learn beyond office hours
How to avoid possible traps which can lead to loss & inefficiency in ML/AI teams & projects?
- Never get stuck on a single problem for too long (no more than 1 month). If the problem is important to solve, find an alternative using a non-AI or manual approach
- Avoid sticking to a single or a few neural architectures; let the team explore, try and study other architectures
- Conduct regular research paper discussions, knowledge sharing sessions and hands-on training
- Avoid locking AI DevOps responsibility into a single person; everyone needs to learn and know AI DevOps
- Be careful about when, and for what reason, you hire pure data scientists without software engineering skills
- Create an end-to-end pipeline early
- A feedback/exception-handling loop is essential and should be part of the end-to-end pipeline from the very beginning
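The last two points can be sketched as a skeleton in which every stage is wired end to end from day one, and failures are caught and recorded as feedback instead of crashing the run. This is a minimal, hypothetical sketch: the stage names, the `FeedbackLog` class and the placeholder stage functions are illustrative assumptions, not a specific framework.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class FeedbackLog:
    """Collects failures so they can be reviewed and fed into the next iteration."""
    entries: list[dict[str, Any]] = field(default_factory=list)

    def record(self, stage: str, item: Any, error: Exception) -> None:
        self.entries.append({"stage": stage, "item": item, "error": repr(error)})


Stage = Callable[[Any], Any]


def run_pipeline(items: list[Any], stages: list[tuple[str, Stage]],
                 feedback: FeedbackLog) -> list[Any]:
    """Push each item through every stage; a failing item is logged and skipped."""
    results = []
    for item in items:
        value = item
        for name, stage in stages:
            try:
                value = stage(value)
            except Exception as exc:  # exception-handling loop: record, don't crash
                feedback.record(name, item, exc)
                break
        else:  # runs only if no stage failed for this item
            results.append(value)
    return results


# Placeholder stages; in a real project these wrap data loading, the model, etc.
def preprocess(x: Any) -> float:
    return float(x)  # raises ValueError on non-numeric input


def predict(x: float) -> float:
    return 2.0 * x  # stand-in for a real model call


def postprocess(x: float) -> float:
    return round(x, 2)


if __name__ == "__main__":
    log = FeedbackLog()
    stages = [("preprocess", preprocess), ("predict", predict),
              ("postprocess", postprocess)]
    print(run_pipeline(["1.5", "oops", "3"], stages, log))  # [3.0, 6.0]
    print(log.entries)  # one entry for "oops", failed at the preprocess stage
```

Because the dummy stages can later be swapped for real ones without touching the loop, the team gets a working end-to-end path and a growing list of failure cases early.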
What kind of project life cycle do ML/AI projects go through?
- Phase-0
  - Initial research: theoretical
  - Deliverable: a brief write-up with references
- Phase-1
  - Hands-on: trial & error
  - Deliverables:
    - First-cut working codebase, README
    - Published scraped datasets or downloaded public datasets
    - Requirements for a custom dataset
- Phase-2
  - Integration with the underlying common ML/AI pipeline
  - Early release of the AI API/models
- Phase-3
  - Custom model trained on a custom dataset
- Phase-4
  - Accuracy Hyperloop
  - Custom model: modified DNN
- Phase-5
  - Model optimization
- From the perspective of AI consulting services, as much as 30% to 40% of the time can be non-billable.
- Normally an ML/AI POC spans 1 to 3 months; any POC larger than this has to be broken down into smaller POC items or different POC phases
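The two rules of thumb at the end of this list can be turned into a small planning helper. This is a hypothetical sketch: the 3-month POC ceiling and the 30% to 40% non-billable share come from the text, while the function names, whole-month granularity and the 160 working hours per month are assumptions for illustration.

```python
import math

MAX_POC_MONTHS = 3  # upper bound for a single POC, from the guideline above


def split_poc(total_months: int, max_months: int = MAX_POC_MONTHS) -> list[int]:
    """Split an oversized POC into roughly equal phases of at most max_months each."""
    if total_months <= 0:
        raise ValueError("total_months must be positive")
    phases = math.ceil(total_months / max_months)
    base, extra = divmod(total_months, phases)
    # The first `extra` phases get one additional month so the total is preserved.
    return [base + 1 if i < extra else base for i in range(phases)]


def billable_hours(total_hours: float, non_billable_share: float) -> float:
    """Hours left to bill once research, setup and other non-billable work is removed."""
    if not 0.0 <= non_billable_share < 1.0:
        raise ValueError("non_billable_share must be in [0, 1)")
    return total_hours * (1.0 - non_billable_share)


if __name__ == "__main__":
    print(split_poc(7))  # [3, 2, 2]
    hours = 5 * 160  # 5 months at an assumed 160 working hours per month
    # Range of billable hours for the 30% and 40% non-billable cases.
    print(billable_hours(hours, 0.30), billable_hours(hours, 0.40))
```

Splitting into near-equal phases, rather than "3 + 3 + remainder", avoids ending with a tiny phase that is too short to produce a meaningful deliverable.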
How to adapt agile methodologies to ML/AI projects, and how do they differ from traditional software development agile practices?
Adapt agile methodologies:
- Keep teams small, with each team member owning at least one problem related to a DNN architecture
- All team members should be aware of and carry out AI DevOps activities
- Each one teach one: learning, training & teaching should be routine activities, and 20% of the time should be spent on training
- Break problems into small conceptual problems, and use mock data to get to working code and develop a deeper understanding when working with multidimensional tensors
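As an illustration of the mock-data point, one trick is to fill a small tensor with values that encode their own indices, so that after an operation you can read off which axis was reduced or moved. The sketch below uses plain nested Python lists to stay dependency-free; the same idea applies to NumPy, PyTorch or TensorFlow arrays. The shape (batch=2, sequence=3, features=4) and the helper names are assumptions.

```python
def mock_tensor(batch: int, seq: int, feat: int) -> list[list[list[int]]]:
    """Element [b][s][f] holds 100*b + 10*s + f, so every value reveals its position."""
    return [[[100 * b + 10 * s + f for f in range(feat)]
             for s in range(seq)]
            for b in range(batch)]


def shape(t) -> tuple[int, ...]:
    """Shape of a regular nested list (assumes all sublists have equal length)."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0] if t else None
    return tuple(dims)


def mean_over_seq(t: list[list[list[int]]]) -> list[list[float]]:
    """Average over the sequence axis: (batch, seq, feat) -> (batch, feat)."""
    return [[sum(step[f] for step in sample) / len(sample)
             for f in range(len(sample[0]))]
            for sample in t]


if __name__ == "__main__":
    x = mock_tensor(2, 3, 4)
    y = mean_over_seq(x)
    print(shape(x), shape(y))  # (2, 3, 4) (2, 4)
    # Batch 1, feature 2: the mean of 102, 112 and 122 is 112, which confirms
    # that the sequence axis (the tens digit) is the one that was collapsed.
    print(y[1][2])  # 112.0
```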
Differences for ML/AI projects compared to conventional software development agile practices:
- User stories are hard to craft because training an AI model that meets business expectations is iterative and repetitive. The same model can spend 3 months or more in the hyperparameter-tuning phase
- It is difficult to assign story points, to estimate the effort of model training, or to decide how much to train. The time-effort-maturity matrix for new types of problem statements is unknown and cannot be linearly extrapolated from past experience, because every problem is unique
- The research-oriented nature of ML/AI projects
- Unlike on non-GPU hardware, subtle differences between underlying GPU architectures can lead to strange and unexpected results. There are many reasons for this: floating-point precision, differences in driver & SDK behavior, and compatibility issues
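The floating-point point is easy to underestimate. Floating-point addition is not associative, so when different hardware, drivers or library versions sum the same numbers in a different order, results can differ in the last bits, and those small differences can compound during training. The pure-Python sketch below only demonstrates the ordering effect in float64 and rounding to float32 via the standard `struct` module; it does not model any specific GPU, and the helper name is illustrative.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Same three numbers, two summation orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

print(left_to_right == right_to_left)  # False
print(left_to_right, right_to_left)    # 0.6000000000000001 0.6
print(to_float32(0.1))                 # 0.10000000149011612

# Compare numerical results with a tolerance instead of exact equality.
print(math.isclose(left_to_right, right_to_left, rel_tol=1e-9))  # True
```

In practice this is why comparing model outputs across machines with a tolerance, rather than bit-for-bit, is usually the more realistic check.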
How to efficiently use computation and human resources for expensive ML/AI activities under a constrained budget & resources?
- Constantly monitor resource consumption and the efficacy of team members
- Use stochastic indicators and event- and behavior-driven insights: learn the working-style patterns of individual team members, then re-orient and empower them toward effective resource utilization
- At least 70% of the work can be offloaded to laptops with a 2 GB GPU, and 90% of the work to laptops with a 6 GB GPU
- PCs with 8 GB GPUs can be utilized more efficiently than 24 GB GPU servers
- Computation load comes at peak times, and GPUs remain idle during non-working hours, for at least 12 to 15 hours per day
- GPU load trends can be categorized as low (10% utilization), medium (20% to 30% utilization) and high (30% to 70% utilization)
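The utilization bands above can be applied to monitoring data with a small classifier. This is a hypothetical sketch: the exact cut-offs (below 20% is low, 20% up to 30% is medium, 30% and above is high), the 5% idle threshold and the one-sample-per-hour assumption are mine, chosen to fit the ranges in the text. Real samples could come from a tool such as `nvidia-smi --query-gpu=utilization.gpu --format=csv`, but the code below only works on a plain list of percentages.

```python
from collections import Counter


def utilization_band(percent: float) -> str:
    """Map a GPU utilization sample to a band (cut-offs are assumptions)."""
    if percent < 20:
        return "low"
    if percent < 30:
        return "medium"
    return "high"


def summarize(hourly_samples: list[float], idle_threshold: float = 5.0) -> dict:
    """Count samples per band and estimate idle hours, assuming one sample per hour."""
    bands = Counter(utilization_band(p) for p in hourly_samples)
    idle_hours = sum(1 for p in hourly_samples if p < idle_threshold)
    return {"bands": dict(bands), "idle_hours": idle_hours}


if __name__ == "__main__":
    # Hypothetical 24 hourly readings: idle overnight, busier during working hours.
    day = [0] * 9 + [10, 25, 45, 60, 35, 20, 15, 40, 65, 30] + [0] * 5
    # 14 idle hours; 16 low, 2 medium and 6 high samples.
    print(summarize(day))
```

Tracking idle hours alongside the bands makes it easier to argue for scheduling long training runs overnight instead of buying more GPUs.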
Some other questions worth knowing about, which will be covered in the next post:
- How to build and grow efficient agile ML/AI teams?
- What are the key factors for a sustainable ML/AI program that can drive business growth in the long term?
- How to bring 100% transparency to the activities of ML/AI engineers and teams? Why is this aspect fundamental and critical?
- What code and data handling practices and policies are required, and what risks are involved?
- What are the common misconceptions about what should be considered a ‘gold mine’ in ML/AI projects?
- What kind of people should be hired for ML/AI, and what should their skill set be?
- How to keep ML/AI engineers motivated, and what should be done to make them more efficient in their work over time?