In the not too distant future, the business world will be split into two camps: companies that have an agile analytics capability, and companies that get eaten by the first group. Even today, it is already clear that a company's data and analytics capability will be a critical factor in its success over the competition.
Today, most companies implement and deploy to production only a very limited number of data analytics use cases each year. If they are able to implement ten (10) new end-to-end use cases in a single year, most people would say they are doing well. However, for state-of-the-art organizations, it is not just ten but tens, with an “s”, of use cases per year.
If you want to have a true digital transformation in your organization, you will need to be able to implement hundreds or thousands of new analytics use cases in a year. That may sound like a high number, but the ability to conceive a data analytic use case, quickly implement that use case, then decide if you can use that information to guide your decision making is what separates the Googles and Amazons from the rest of the pack.
Some of the use cases will have lots of data, and some not so much. The ability to process a large amount of data is now table stakes. The true differentiator will be in the agility to implement a large number of data-driven use cases and implement them very quickly.
It is hard to imagine, but there was a time when Google was using a traditional data and analytics stack, with traditional ETL and relational data warehouses, for several of its core internal analytics, just like most companies are today. And much like most companies today, adding new use cases could take months.
One of the jobs I had at Google was to migrate and automate its internal systems from a traditional analytics stack to what became known as a “big data” stack. The goal was to be able to execute new use cases several orders of magnitude faster than before. We wanted to be able to create new data analytics use cases almost as fast as we could think of them. So we set out on a journey to automate the creation and operation of data analytics pipelines. This gave the company the agility to hyper-analyze all kinds of data about its operations.
Today, every company is looking to emulate data companies such as Google, Amazon, and Facebook in their own domains. Data and analytics have become a strategic company initiative. C-level executives and the boards of directors across all kinds of different industry segments are looking to gain the agility to transform their businesses.
Retailers don’t just sell goods; they are now also data companies as they figure out which products are most popular and which associated products to promote to you.
Oil & gas exploration companies are now data companies as they analyze real-time drilling data to optimize well depth and maximize the yield they get from an oil field.
Shipping logistics companies are now data companies as they figure out strategies to minimize the number of stops and maximize truck utilization and gas mileage.
The real question you should be considering is…
This is the fundamental problem we at Infoworks have set out to solve in a completely holistic way that has not yet been addressed by the industry… until now. At Infoworks, we believe the solution to enabling any company to become an agile data-driven company has three main components:
Data analytics at scale, whether for BI, advanced analytics, or data-warehousing-style use cases, is impossible to achieve if you have to implement those use cases manually. Scale here means specifically the number of use cases and the number of people who can implement them. The underlying data analytics technologies, like Hadoop and Spark, are just too complicated for most people to use successfully. Your company just can’t hire enough data engineering experts to implement and manage all of the potential analytics the business will want to run.
To address this, Infoworks has built an end-to-end software platform that automates hundreds of data engineering tasks and gives you the agility to launch new analytics and data applications to production 10-100x faster, with far fewer people and far less “big data” expertise. The Infoworks platform automates use case development; development processes such as dev-to-production migration and data reconciliation; and the DataOps process of running hundreds of data pipelines at scale in production.
There are lots of steps in the data analytics and data engineering process, from data ingest, to data transformation, to generation of fast query in-memory models and OLAP cubes to the operationalization of all of those steps in a repeatable fashion. Using different technologies from a wide variety of “best-in-class” vendors and then stitching them together makes it very difficult, if not impossible, to truly automate the end to end process for all of the people involved in the data analytics process going from source to consumption.
The Infoworks platform provides a collaborative interface where data engineers/IT, data analysts, data scientists, and production engineers can iterate and launch use cases end to end, with no coding necessary. Typically, data engineers use the platform to onboard data sources, data analysts and data scientists use it to build business logic and data models and orchestrate the end to end use case, while production engineers migrate and manage them in production.
There is no need for specialized resources or big data expertise to launch and manage a complete use case because the end-to-end automation (see point 1 above) takes care of the complexity. The platform is also extensible via APIs that allow users to add functionality.
The analytics infrastructure world is rapidly changing, with implementations ranging from on-premise data warehouses and data lakes to new cloud-based solutions and serverless deployments. Because of this rapid evolution, companies are deploying analytics on a wide variety of underlying technologies and are very often moving from one environment to another to take advantage of the latest technologies and price points. These environments differ enough that, to achieve true scalability and performance, users have to recode their data pipelines when they move from one infrastructure to another.
The Infoworks platform is built on a storage and compute abstraction layer that maximizes portability for end-to-end data analytics pipelines. A use case built using the Infoworks platform can run on an on-premise big data cluster or in any of the cloud environments. Migration from one environment to another is done with the click of a button. This infrastructure independence is a key component of an agile data and analytics platform, and the Infoworks platform was designed to allow for this level of extreme agility from day one.
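To make the idea of a storage and compute abstraction layer concrete, here is a minimal Python sketch. It is not the Infoworks API: the names `Backend`, `InMemoryBackend`, `Pipeline`, and `only_shipped` are hypothetical, and both "environments" are simple in-memory stand-ins. It only illustrates the general pattern of writing a pipeline once against an interface and then running it unchanged on different backends.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


class Backend(ABC):
    """Hypothetical storage-and-compute interface that pipelines are written against."""

    @abstractmethod
    def read(self, table: str) -> List[Row]:
        ...

    @abstractmethod
    def write(self, table: str, rows: List[Row]) -> None:
        ...


class InMemoryBackend(Backend):
    """Stand-in for a real environment (an on-premise cluster, a cloud service)."""

    def __init__(self, name: str, tables: Optional[Dict[str, List[Row]]] = None):
        self.name = name
        self.tables = {k: list(v) for k, v in (tables or {}).items()}

    def read(self, table: str) -> List[Row]:
        return list(self.tables[table])

    def write(self, table: str, rows: List[Row]) -> None:
        self.tables[table] = list(rows)


@dataclass
class Pipeline:
    """A pipeline defined once, with no reference to any specific environment."""

    source: str
    target: str
    steps: List[Step] = field(default_factory=list)

    def run(self, backend: Backend) -> None:
        rows = backend.read(self.source)
        for step in self.steps:
            rows = step(rows)
        backend.write(self.target, rows)


def only_shipped(rows: List[Row]) -> List[Row]:
    # Example transformation: keep only orders that have shipped.
    return [r for r in rows if r["status"] == "shipped"]


orders = [{"id": 1, "status": "shipped"}, {"id": 2, "status": "pending"}]
on_prem = InMemoryBackend("on_prem", {"orders": orders})
cloud = InMemoryBackend("cloud", {"orders": orders})

pipeline = Pipeline(source="orders", target="shipped_orders", steps=[only_shipped])
pipeline.run(on_prem)  # same pipeline definition...
pipeline.run(cloud)    # ...executed against a different backend
```

In this sketch, "migrating" the use case amounts to passing a different `Backend` to `run`; the pipeline definition itself never changes. A real abstraction layer would also have to hide differences in performance characteristics, security, and scheduling, which this toy example does not attempt.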
I will be writing more about the importance of agility in analytics and the automation of data engineering and data operations in the months and years to come. If you take away anything from my first blog, it is that the revolution in analytics that is happening now isn’t just about the velocity of data; it is about the velocity and agility with which you can execute new data use cases and projects.