To help explain the need to better manage Enterprise Data Operations and Orchestration, we'll start by surveying the scope of the challenge.
The challenge is simple enough to see. Business now operates within a rising flood of data, pouring in through every customer and partner interaction. Informed decisions are being made faster than ever before, and we can either move at the speed of those decisions or be washed away in the flood.
So, who is winning in this challenge? The agile. Not the biggest, but the most nimble. It's those who can implement solutions for many use cases, addressing the needs of many different users, drawing upon extremely large and rapidly growing data sets, while responding to ongoing and inevitable technological change over time. To be agile in the face of the data flood, you need the ability to automate your data engineering processes and reduce managerial stress by implementing end-to-end data governance within a single system, while still retaining the flexibility to migrate among on-premises, cloud, and hybrid clusters and data management technologies as business requirements and technical evolution require.
Okay, so why are so many failing, or at least succeeding far more slowly than they'd hoped? This fundamental shift to digital business practice is not an easy problem to solve. The complexity is immense. The primary tool sets remain mostly point tools, requiring your teams to write whole libraries of custom code: libraries often written and maintained by outsourced teams, whose projects may lag, or fail outright and restart, as the consulting teams inevitably change over time. To compound matters, these teams are made up of very expensive talent, because of the mix of analytical and coding skills involved. Data engineers are hard people to hire, grow, and retain. To make things worse, underneath it all is often a layer of repurposed, legacy ETL technology, requiring long development cycles and relying on specialized, proprietary, vendor-specific skills.
It's like death by 1,000 paper cuts. Just ingestion from all your systems involves change data capture, parallelization, and tracking slowly changing dimensions to support historical analysis, not to mention type conversion from various systems to a common standard. Along the way, you're merging and synchronizing all the various data schemas that have evolved over time throughout your enterprise, while tracking that history so you can report trends over time. Then, with all this data brought to a common lake, the real work begins: building pipelines to transform all this data into the forms you need for reporting, analytics, machine learning, and more. Pipelines often built differently by different teams, potentially with different tools, creating long-term management complexity. The end points of these transformations, of course, are the models and cubes you must design to support your reports and analytics, all of which must be tuned in various ways to run at speed. And all this while cataloging, tracking, and managing who gets access to which parts of all this data, in which states, to support industry and regulatory compliance. Then there's all the operational complexity under the hood. How are systems scaling to meet load requirements? How are your pipelines and models being migrated out of development and into production? How are these operations being monitored? Are you able to pause and restart your transformations, and retry from a known state if they fail? When you zoom out to the broadest view, it's a bit astounding that anyone can make all this work at all.
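To make one of those paper cuts concrete, consider the slowly-changing-dimension tracking mentioned above. Here is a minimal, hypothetical Python sketch of a Type 2 merge, where changed attribute values close out the old row and open a new one so history stays queryable. The function name and row layout are illustrative assumptions, not any vendor's API:

```python
from datetime import date

def scd2_merge(dim_rows, incoming, load_date):
    """Apply a Slowly Changing Dimension Type 2 merge.

    dim_rows: list of dicts with keys 'key', 'value',
              'valid_from', 'valid_to' (None means current).
    incoming: dict mapping key -> latest value from the source.
    Returns the updated dimension row list (old versions kept).
    """
    result = list(dim_rows)
    # Index the currently-open row for each business key.
    current = {r["key"]: r for r in result if r["valid_to"] is None}
    for key, value in incoming.items():
        row = current.get(key)
        if row is not None and row["value"] == value:
            continue  # unchanged: keep the open row as-is
        if row is not None:
            row["valid_to"] = load_date  # close out the old version
        # Open a new current version for the new or changed value.
        result.append({"key": key, "value": value,
                       "valid_from": load_date, "valid_to": None})
    return result
```

Multiply this by every dimension table, every source system's type quirks, and every retry path, and the case for standardizing these patterns instead of hand-coding them becomes clear.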
It all seemed so simple, back in the day. Over the past several decades, business operations have come to be defined by Enterprise Resource Planning, coupled with rigid, old-style data warehouses. As online commerce has grown, new types of Operational Data Store have come into play, along with Customer Relationship Management systems. Each of these was backed by its own particular data silo, some internal, but more commonly external in recent years, up in the cloud. Over time it became clear that the built-in analytical and reporting capabilities of any single system were insufficient, because data is spread across many systems and silos. So, a whole new tooling ecosystem emerged to extract, transform, and load data from one system to another, supporting growing needs for Business Intelligence reporting and Enterprise Analytics. Of course, these tools all came with their own specialized requirements, forcing the creation of specialized coding and service teams, from which it became normal to expect 6 to 12 month cycles to implement new cross-system use cases. Yes, things generally worked eventually, but the custom coding was slow, the teams were expensive, and the resulting systems were often locked onto monolithic hardware platforms, unable to take advantage of cluster-scale technologies.
Meanwhile, the use cases just kept proliferating. With so many different systems in play, Customer 360 views have become a core requirement. But it doesn't end there. From Robotic Process Automation all the way out to leading-edge Artificial Intelligence and Machine Learning needs, new data requirements keep growing their way onto the enterprise data management stack. The scaling pressure of these needs brought Hadoop into the market and into the enterprise, followed by Spark, both bringing along a whole zoo full of open source and vendor-specific point tools, requiring entirely new skill sets along with their specialized service and coding teams. Then, as the unsustainable cost of maintaining scalable hardware on premises became clear, big data tooling began moving into the cloud, with teams generally choosing either Team Yellow or Team Blue, as appropriate to their skills, but in doing so also potentially locking companies into those cloud vendors, because of all the additional specialized skills needed to work within each cloud. And yes, again ... it all works, sort of, eventually, but it can also result in a bit of a hairball when viewed from above. And you still have slow, expensive projects, with some team somewhere custom coding each use case using the latest design patterns for whatever point tooling is currently in use, plus a whole new world of security and governance issues.
Wouldn't it be nice to streamline things? To have all your data ingestion, transformation, optimization, orchestration, operation, export, governance, and replication processes ... happening through a common user interface, standardizing and automating all the repetitive underlying processes, and eliminating the need to run your own private software factory? Particularly if you could run it all over your same tested and trusted systems, but through an abstraction layer allowing the underlying technology to evolve over time with minimal impact, and allowing you to swap cloud vendors through an automated migration process, should business requirements ever create the need. That's Enterprise Data Operations and Orchestration.
So, what's EDO2? It's speed, allowing you to quickly onboard new data sources and implement new use cases, because your teams are configuring standardized processes rather than custom coding to their latest convention. It's self-service, putting no-code tooling into your analysts' hands, extending the power and scaling the capabilities of your talent pool. It's governance, providing the standardized, end-to-end, time-based data lineage and secure domain control necessary to meet industry and regulatory demands. It's freedom, giving you a seamless abstraction layer that lets underlying technologies and vendors evolve without disrupting your data platform. And it's extensibility, letting you adaptably integrate with any external application able to maintain a data connection.
Infoworks delivers an Agile Data Platform, supporting petabytes of data streaming through thousands of configurable, no-code pipelines, supporting thousands of applications and analytical processes, with the operational agility to leverage the productivity of your talent pools, leading to rapid insights into your enterprise-wide data, and all the bottom-line goodness available to those who swim the currents without getting dashed upon the rocks.
So, what have you learned? Legacy data integration tools lack the features needed to manage the hockey-stick growth in data loads, while the code-centric complexity of newer tooling is strangling its own potential. Much greater agility is needed to achieve the potential of enterprise-sized, multi-sourced data lakes. This agility is achieved by building or buying a common platform to ingest, transform, optimize, orchestrate, operate, export, govern, and replicate all this data and supporting technology, with little or no need for custom code; by having the abstraction layer needed to port and operate across any major distributed data infrastructure, and on whatever on-premises or cloud environment your changing business demands may require; all while integrating with any in-house or third-party application able to maintain a standard data connection.
That's Infoworks. Come on back, and we'll keep diving deeper.