Few trends got more attention in tech last year than artificial intelligence. In 2018, organizations made big investments, mostly out of fear, according to a NewVantage Partners survey. Of the 92% of Fortune 1000 companies that increased their AI investments, 92% report that they did so to achieve the agility required to avoid being outmaneuvered by competitors.
Adding fuel to the AI/ML fire, a recent 451 Research survey found that, of the 40% of respondents who had machine learning (ML) either in production or at the proof-of-concept stage, 92% had a positive opinion of the performance of their ML projects. Cooling this fire off a little, however, Matt Aslett of 451 Research noted during a webinar on February 28, 2019:
A great deal of attention has been paid to catering to the first three processes involved in AI and machine learning: data collection, structuring and modeling. The last step, ensuring models are effectively operationalized, has been somewhat neglected. And yet placing machine learning models into production and ensuring they remain effective is critical to successful data science.
The challenge with AI and ML is that once you find a great algorithm, you still need to build data pipelines that deliver data on a regular and reliable basis so that the algorithm can be used to drive automated decision making. And while some claim that data scientists can build their own data pipelines, the reality is that building them is a data engineering task that most data scientists don’t know how to do and don’t want to learn. Data engineering is not the same as data science: it requires a different set of skills, focused not on inventing new analytics but on creating repeatable, bulletproof data processes.
This is much like confusing the design of a new car with the building of its manufacturing line. You can handcraft an automobile, and car manufacturers do exactly that when they create prototypes. But you can’t build a scalable business around making handcrafted cars one at a time. You have to create a manufacturing line that can churn out high-quality automobiles on a regular basis. The car designers don’t create the manufacturing line, and the people who build the line don’t design the car. The same is true for data.
The critical point is that if you don’t or can’t build scalable, reliable data pipelines to deliver data to your AI/ML engine, then you may have an interesting algorithm, but you can’t use it to run your business. And if you can’t use it to run your business, AI without the data might just as well be “A”.