According to a recent Gartner blog about analytics and BI solutions, only 20% of analytical insights will deliver business outcomes through 2022. Another article by VentureBeat AI reported that 87% of data science projects never make it into production. And a global survey by Dimensional Research concluded that 78% of AI/ML projects stall at some stage before deployment. Taken together, these results point to an exceptionally high failure rate across analytics, data science, and machine learning projects. There are many reasons why so many projects fail to meet their business objectives. In this blog, we look at the top practical challenges that enterprise AI projects face and how you can mitigate them:
- Identifying business problems and appropriate use cases
While AI is an incredibly powerful technology, it is not a panacea for every business problem. Building AI because everyone else is doing it, and throwing problems at it without concrete objectives, is a path to failure. AI is great at sifting through massive amounts of data, discovering patterns, and finding hidden insights that are otherwise not obvious. To get started, prioritize complex, hard-to-solve business problems that have clear objectives. Assemble a cross-functional team of technical and functional experts, and ensure buy-in from domain experts. Finally, define success criteria up front and measure progress against relevant key metrics.
- Access to high-quality data
AI and machine learning tools rely on data to train their underlying algorithms. Access to clean, meaningful data that is representative of the problem at hand is critical to the success of AI initiatives. But enterprise data tends to be biased, noisy, outdated, unstructured, and full of errors. Many companies lack data infrastructure, or do not have data of sufficient volume or quality. Others use antiquated, error-prone manual methods for data preparation, resulting in inaccurate data and ultimately wrong business decisions. A typical enterprise data architecture should include master data preparation tools designed for data cleansing, formatting, and standardization before the data is stored in data lakes and data marts. Because AI and ML rely so heavily on good-quality data, data quality, data management, and governance issues are of paramount importance and, if overlooked, can derail any project.
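As a rough illustration of the cleansing, formatting, and standardization step described above, here is a minimal Python sketch. The records, field names, and accepted date formats are all hypothetical and chosen only for the example; a real pipeline would use dedicated data preparation tooling rather than hand-written code like this.

```python
from datetime import datetime

# Hypothetical raw records; field names and values are invented for illustration.
RAW_RECORDS = [
    {"customer_id": " 001 ", "country": "us", "signup_date": "2021-03-05"},
    {"customer_id": "001", "country": "US", "signup_date": "03/05/2021"},
    {"customer_id": "002", "country": " De", "signup_date": "2021-13-40"},
    {"customer_id": "", "country": "FR", "signup_date": "2021-04-01"},
]

# Assumed set of date formats seen in the source systems.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text):
    """Return an ISO date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_records(records):
    """Standardize fields, reject rows missing a key, and remove duplicates."""
    cleaned, rejected, seen = [], [], set()
    for row in records:
        customer_id = row["customer_id"].strip()
        if not customer_id:
            # Keep rejects for review instead of silently dropping them.
            rejected.append(row)
            continue
        record = {
            "customer_id": customer_id,
            "country": row["country"].strip().upper(),
            "signup_date": parse_date(row["signup_date"]),
        }
        key = tuple(record.values())
        if key in seen:
            continue  # exact duplicate after standardization
        seen.add(key)
        cleaned.append(record)
    return cleaned, rejected


cleaned, rejected = clean_records(RAW_RECORDS)
```

Note that the first two rows only become recognizable as duplicates after standardization, which is why cleansing has to happen before deduplication. Unparseable dates are kept as `None` so that a later quality check can count them.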
- Data pipeline complexity
Data is spread across disparate databases in different formats, and you need to blend and consolidate data from disconnected systems. The challenge is how to extract, clean, and reformat the data to make it ready for predictive analytics. This processed data then requires further manipulation that is specific to AI/ML pipelines, including additional table joins and further data preparation and cleansing. The process requires data engineers to write SQL code and perform manual joins to complete the remaining tasks. This complex process of data ingestion, storage, cleansing, and transformation takes time and is a major bottleneck in scaling data science operations. Automated machine learning tools such as AutoML 2.0 platforms eliminate the complexity of the data pipeline by automating a full-cycle data science workflow. Through automation, these platforms transform the raw data into the inputs of machine learning models (a step known as feature engineering) and produce predictions by combining hundreds of features or more.
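To make the joining and feature engineering steps more concrete, the sketch below joins two small tables and aggregates them into one feature row per customer. The tables, column names, and chosen features are hypothetical, and this is only a hand-written illustration of the kind of work described above, not how any particular AutoML platform implements it.

```python
from collections import defaultdict

# Hypothetical source tables, as they might arrive from two separate systems.
CUSTOMERS = [
    {"customer_id": "c1", "segment": "retail"},
    {"customer_id": "c2", "segment": "business"},
    {"customer_id": "c3", "segment": "retail"},
]
TRANSACTIONS = [
    {"customer_id": "c1", "amount": 20.0},
    {"customer_id": "c1", "amount": 40.0},
    {"customer_id": "c2", "amount": 100.0},
]


def build_features(customers, transactions):
    """Left-join transactions onto customers and aggregate per customer."""
    amounts = defaultdict(list)
    for tx in transactions:
        amounts[tx["customer_id"]].append(tx["amount"])

    features = []
    for customer in customers:
        # Customers with no transactions still get a row (left join).
        spent = amounts.get(customer["customer_id"], [])
        features.append({
            "customer_id": customer["customer_id"],
            "segment": customer["segment"],
            "tx_count": len(spent),
            "tx_total": sum(spent),
            "tx_mean": sum(spent) / len(spent) if spent else 0.0,
        })
    return features


features = build_features(CUSTOMERS, TRANSACTIONS)
```

Even this toy version needs an explicit decision about customers with no transactions. Multiply that by dozens of tables and hundreds of candidate features and it becomes clear why this stage is a bottleneck.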
- Balancing model accuracy and interpretability
There is a trade-off between prediction accuracy and model interpretability, and data scientists have to strike a balance by selecting the appropriate modeling approach. Generally speaking, higher accuracy means complex models that are hard to interpret. Easy interpretation means using simpler models, which usually comes at the cost of some accuracy. Traditional data science projects tend to adopt what are known as black-box approaches, which generate minimal actionable insights and result in a lack of accountability in the decision-making process. The solution to this transparency paradox is an approach built on white-box models. White-box modeling means generating transparent features and models that empower your AI team to execute complex projects with confidence. White-box models (WBMs) provide clear explanations of how they behave, how they produce predictions, and which variables influenced the result. WBMs are preferred in many enterprise data science use cases because of their transparent inner workings and easily interpretable behavior. Explainability is very important in enterprise data science projects. By giving insight into how the prediction models work and the reasoning behind their predictions, organizations can build trust and increase transparency. AutoML 2.0 platforms automate the trade-off between accuracy and interpretability and give users the choice to select the right approach based on the use case.
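One simple example of a white-box model is a linear score, where every prediction can be broken down into per-variable contributions. The sketch below assumes hand-set weights and invented feature names purely for illustration; in practice the weights would be learned from data, and this is just one of several interpretable model families.

```python
# Hypothetical, hand-set weights for a churn-risk score (illustration only).
WEIGHTS = {"late_payments": 0.8, "tenure_years": -0.3, "support_tickets": 0.5}
BIAS = -1.0


def explain(features):
    """Return the linear score and each variable's contribution, largest first."""
    contributions = {name: WEIGHTS[name] * features[name] for name in WEIGHTS}
    score = BIAS + sum(contributions.values())
    ranked = sorted(
        contributions.items(), key=lambda item: abs(item[1]), reverse=True
    )
    return score, ranked


score, ranked = explain(
    {"late_payments": 2, "tenure_years": 4, "support_tickets": 1}
)
for name, contribution in ranked:
    print(f"{name}: {contribution:+.2f}")
print(f"score: {score:+.2f}")
```

Because the score is just a bias plus a sum of contributions, a reviewer can see which variables pushed the prediction up or down and by how much. That is the kind of explanation a black-box model does not offer directly.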
- Model operationalization and deployment
ML delivers value only when a data scientist moves the final model out of a Jupyter notebook and deploys it in production. Operationalization means that the model is running in a production environment (not a sandbox environment), connected to business applications, and making predictions using live data. This last-mile deployment has been a slow, manual, and prolonged process, which can render the models and their insights obsolete by the time they go live. It can take anywhere from 8 to 90 days to deploy a single model in production. Irrespective of the AI and ML platform used, it should provide endpoints to run and control the developed pipeline, and should integrate easily with other business systems using standard APIs. There are several approaches to moving models into production. You need to think through batch versus real-time prediction and take into account whether a real-time prediction service is feasible in terms of cost, infrastructure, and complexity. Deployment also includes monitoring model performance, detecting performance degradation, and updating models as necessary. Automation makes enterprise-level, end-to-end data science operationalization possible with minimum effort and maximum impact, enabling enterprise data science and software/IT teams to operationalize complex data science projects. Every enterprise data science project should start with a plan to deploy models in production to capture value and realize AI’s potential.
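The monitoring step mentioned above can be sketched as a rolling accuracy check that flags a model for retraining once it falls too far below its validation baseline. The class name, the baseline, the tolerance, and the window size are all assumptions made for this example; production monitoring would typically also track input drift and latency.

```python
from collections import deque


class AccuracyMonitor:
    """Track rolling accuracy over the last `window` labeled predictions."""

    def __init__(self, baseline, tolerance=0.05, window=100):
        self.baseline = baseline      # accuracy measured at validation time
        self.tolerance = tolerance    # allowed drop before raising a flag
        self.outcomes = deque(maxlen=window)

    def record(self, predicted, actual):
        """Store whether one prediction matched the observed outcome."""
        self.outcomes.append(predicted == actual)

    def rolling_accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        """Flag degradation only once a full window has been observed."""
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.rolling_accuracy() < self.baseline - self.tolerance


# Hypothetical usage with a tiny window so the example is easy to follow.
monitor = AccuracyMonitor(baseline=0.9, tolerance=0.05, window=4)
for predicted, actual in [(1, 1), (1, 1), (1, 0), (0, 1)]:
    monitor.record(predicted, actual)

degraded = monitor.needs_retraining()
```

Waiting for a full window before flagging avoids false alarms from the first few predictions. The trade-off is slower detection, so the window size should reflect how quickly ground-truth labels arrive.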