As the field of data analytics and data integration continues to mature, it is increasingly clear that keeping pace with the accelerating demands of business requires a more holistic, enterprise-wide view, and that automation, end-to-end integration, and infrastructure abstraction are foundational to success. However, one challenge we have noticed over the past few years is the lack of a distinct vocabulary reflecting the evolution of the data analytics, data integration, and data management markets. Terms like ETL, Kafka, Hadoop, and Spark refer to specific technologies and don’t capture the breadth of the challenge. At the same time, newer terms like DataOps refer to a process. While there is plenty of vendor hype around DataOps in particular, many industry analysts have noted that because it is a process, there is really no such thing as DataOps software. As Nick Heudecker of Gartner wrote, “DataOps is a practice, not a technology or tool; you cannot buy it in an application.”
This leaves a gap in how best to describe the requirements for data management at a time when distributed data execution and storage platforms are constantly evolving and companies are balancing on-premises and cloud data architectures. Enterprise Data Operations and Orchestration (EDO2) is a concept meant to directly reflect new ways of thinking about managing data and data pipelines as a critical business process. It exists in much the same way that Enterprise Resource Planning (ERP) defined a market to address “deep operational end-to-end processes, such as those found in finance, HR, distribution, manufacturing, service and the supply chain.” (Gartner)
Enterprise Data Operations and Orchestration (EDO2) refers to the systems and processes that enable businesses to organize and manage data from disparate sources and process the data for delivery to analytic applications.
EDO2 systems aim for shorter development cycles, increased deployment frequency, and more dependable releases of data pipelines, in close alignment with business objectives. An EDO2 system is an integrated software system designed to automate the main steps of data pipeline development and operationalization, from source to consumption by analytics applications. Historically, data integration platforms have provided independent modules for each step in the development and management of data pipelines and workflows. In contrast, EDO2 systems integrate these modules into a fully unified system that provides a more holistic and agile environment for delivering data at scale in support of a growing number of analytics use cases.
This includes modules and processes (further defined below) for:
EDO2 is not dependent on a specific data processing or integration technology (e.g. ETL, Hadoop, Kafka, Spark), but instead delivers a semantic layer of systems and processes that provides independence and portability across different data processing and integration technologies, in support of shorter development, deployment, and release cycles.
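To make the idea of a technology-independent semantic layer concrete, the sketch below shows a pipeline defined once in abstract terms and then translated to different execution engines. This is a minimal illustration only: the class names, engine adapters, and step vocabulary are all hypothetical and do not correspond to any real EDO2 product or API.

```python
# Hypothetical sketch of engine-independent pipeline definitions.
# All names here (Pipeline, SparkRunner, EtlRunner, step operations)
# are illustrative assumptions, not a real product's API.

from dataclasses import dataclass, field
from typing import List


@dataclass
class PipelineStep:
    name: str
    operation: str  # e.g. "ingest", "transform", "deliver"
    config: dict = field(default_factory=dict)


@dataclass
class Pipeline:
    name: str
    steps: List[PipelineStep] = field(default_factory=list)

    def add_step(self, step: PipelineStep) -> "Pipeline":
        self.steps.append(step)
        return self


class SparkRunner:
    """Illustrative adapter: renders abstract steps as Spark jobs."""

    def run(self, pipeline: Pipeline) -> List[str]:
        return [f"spark-submit --step {s.name} ({s.operation})"
                for s in pipeline.steps]


class EtlRunner:
    """Illustrative adapter: renders the same steps as ETL tasks."""

    def run(self, pipeline: Pipeline) -> List[str]:
        return [f"etl-task {s.name}: {s.operation}"
                for s in pipeline.steps]


# The same pipeline definition can target either engine, which is the
# portability the semantic layer is meant to provide.
pipeline = (
    Pipeline("sales_analytics")
    .add_step(PipelineStep("load_orders", "ingest", {"source": "crm"}))
    .add_step(PipelineStep("clean_orders", "transform"))
    .add_step(PipelineStep("publish", "deliver", {"target": "warehouse"}))
)

spark_plan = SparkRunner().run(pipeline)
etl_plan = EtlRunner().run(pipeline)
```

The point of the sketch is the separation of concerns: the pipeline definition captures business intent, while swappable adapters handle engine-specific execution, so changing engines does not require rewriting the pipeline.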
Additional characteristics of more advanced EDO2 systems include:
This post is a high-level description of Enterprise Data Operations and Orchestration (EDO2). Future blog posts will go into more detail on both the benefits and core capabilities of EDO2 systems, so check back in the coming weeks to learn more.
If you are interested in learning more about EDO2 implementations, check out the rest of the www.infoworks.io website.