Fundamentally, Infoworks is a natural, market-driven response to the challenges of enterprise data operations and orchestration (EDO2). Let's see how.
Did anyone say big data would be easy? Consider what's involved. Data must be ingested from all your existing systems, then transformed, tracked, merged, stored, and synchronized in a common data lake, then deployed as models and cubes for consumption by your reporting, analytics, and machine learning systems. That's all. All you need to do is hand-code data type management, change data capture, and incremental ingestion and transformation pipelines; track merging data lineages over time; design and optimize your models and cubes; scale a cluster; manage development-to-production migrations; govern a data catalog across a user base; and ensure all data is validated and available. No problem, right? I mean, there's a whole zoo full of tools, patterns, and techniques available these days, and brilliant technologists are able to work miracles with them. But there's an important question to ask here: would you rather write repetitive plumbing code, or solve business problems?
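To make that "repetitive plumbing" concrete, here is a minimal sketch, in Python against SQLite, of the kind of hand-rolled incremental sync job teams end up writing over and over. The table names, watermark scheme, and merge logic are all illustrative assumptions, not any particular product's approach.

```python
import sqlite3

def incremental_sync(conn: sqlite3.Connection, source: str, target: str) -> int:
    """One hand-rolled incremental load: read the watermark, pull changed
    rows, merge them into the lake table, and advance the watermark."""
    cur = conn.cursor()
    # 1. Read the high-water mark from the last successful run.
    row = cur.execute(
        "SELECT last_ts FROM watermarks WHERE tbl = ?", (target,)
    ).fetchone()
    last_ts = row[0] if row else 0
    # 2. Pull only rows changed since then (assumes a change-timestamp column).
    changed = cur.execute(
        f"SELECT id, payload, updated_at FROM {source} WHERE updated_at > ?",
        (last_ts,),
    ).fetchall()
    # 3. Merge: insert new keys, overwrite existing ones (a crude CDC merge).
    cur.executemany(
        f"INSERT INTO {target} (id, payload, updated_at) VALUES (?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET payload = excluded.payload, "
        "updated_at = excluded.updated_at",
        changed,
    )
    # 4. Advance the watermark so the next run starts where this one ended.
    if changed:
        cur.execute(
            "INSERT INTO watermarks (tbl, last_ts) VALUES (?, ?) "
            "ON CONFLICT(tbl) DO UPDATE SET last_ts = excluded.last_ts",
            (target, max(r[2] for r in changed)),
        )
    conn.commit()
    return len(changed)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE src (id INTEGER, payload TEXT, updated_at INTEGER);
        CREATE TABLE lake (id INTEGER PRIMARY KEY, payload TEXT, updated_at INTEGER);
        CREATE TABLE watermarks (tbl TEXT PRIMARY KEY, last_ts INTEGER);
        INSERT INTO src VALUES (1, 'a', 10), (2, 'b', 20);
    """)
    print(incremental_sync(conn, "src", "lake"))  # first run ingests 2 rows
    print(incremental_sync(conn, "src", "lake"))  # second run finds nothing new
```

Now multiply that by every table in every source system, then add error handling, schema drift, and lineage tracking, and the maintenance burden becomes clear.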
Data engineering is 80% of the effort in enterprise data ops and orchestration, but maybe you would still say yes to this challenge. Tech monsters like Google and online gaming giants like Zynga certainly do, and that's where the automation processes driving Infoworks get their DNA. Or maybe you'd rather just have the agility of such automated, standardized, flexible solutions available to meet your own recurring data engineering needs, letting your staff launch new use cases 10 to 100 times faster, with one-tenth the resources, while retaining the flexibility to port to new cloud vendors and underlying technologies as needed. Maybe you'd prefer the simplicity of a code-free, configuration-focused approach to all your fundamental data engineering operations, opening that work to a broader range of talent on your staff. Maybe you'd appreciate the productivity gains realized by no longer forcing your high-value talent to code variations of the same basic ingestion and transformation pipelines, over and over. That's why we're here.
Architecturally, DataFoundry delivers this efficiency by providing a platform that empowers your staff to transform data from virtually any source into information for virtually any target. We provide an abstraction layer over your data storage and query execution technologies; we automate your data ingestion and change data capture processes; we provide robust data cataloging and metadata tracking; we enable clean, portable, powerful data pipeline design; and we drive the results into accelerated data models supporting all your reporting, automation, and analytical needs. All of this is centrally governed for roles and capabilities, across a carefully tracked data lineage, with orchestration of all related production operations in coordination with your particular security and authentication frameworks. We also support robust APIs enabling close coordination with third-party data engineering applications.
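That abstraction layer can be pictured as an interface boundary between pipeline logic and the engine that executes it. The sketch below is our own hypothetical Python, with invented names (`QueryEngine`, `promote_recent_orders`), not Infoworks' actual API; it simply shows why code written against an abstract engine survives a change of backend.

```python
from typing import Any, Iterable, Protocol

class QueryEngine(Protocol):
    """The minimal surface a pipeline needs from a storage/execution backend."""
    def sql(self, statement: str) -> Iterable[dict[str, Any]]: ...
    def write(self, table: str, rows: Iterable[dict[str, Any]]) -> None: ...

def promote_recent_orders(engine: QueryEngine) -> None:
    # The pipeline speaks only to the abstract engine, so swapping one
    # cloud warehouse or cluster runtime for another means supplying a
    # different QueryEngine implementation, not rewriting the pipeline.
    recent = engine.sql("SELECT * FROM orders WHERE order_ts > 0")
    engine.write("recent_orders", recent)
```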
From a process flow standpoint, Infoworks is a platform enabling you to ingest and synchronize data from many different sources, combine and transform that data, and optimize it to meet varying service levels, whether that means stored data models or models accelerated by one of many different strategies. These capabilities are securely wrapped within a framework that orchestrates the underlying operations, tracks metadata, enables data cataloging, and establishes appropriate governance domains, controlling access by the appropriate tools and roles. Remarkably, this entire platform rides abstractly on your preferred cloud and query execution engine, letting you maintain a consistent data platform while retaining flexibility over your chosen vendors.
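That ingest-transform-optimize flow lends itself to a declarative description. The configuration-style sketch below is purely illustrative Python; none of these keys are Infoworks syntax. It just shows the shape of a pipeline expressed as configuration rather than code.

```python
# Hypothetical, illustrative pipeline specification; the keys and values
# are our own, not Infoworks' configuration syntax.
pipeline_spec = {
    "sources": [
        {"name": "crm", "type": "jdbc", "sync": "incremental"},
        {"name": "clickstream", "type": "streaming"},
    ],
    "transformations": [
        {"join": ["crm.customers", "clickstream.events"], "on": "customer_id"},
        {"derive": {"lifetime_value": "sum(order_total)"}},
    ],
    "targets": [
        {"name": "customer_360", "kind": "stored_model"},          # batch queries
        {"name": "customer_360_cube", "kind": "accelerated_cube"}, # dashboards
    ],
    "governance": {"domain": "marketing", "roles": ["analyst", "engineer"]},
}
```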
In a real-life deployment, a major North American retailer needed a fast and consistent approach to digital transformation. Before bringing DataFoundry into their mix, they maintained a highly complex data architecture on fully on-premise hardware. This patchwork offered incomplete metadata and a limited catalog, yielding multiple, sometimes inconsistent versions of the truth, with uncertain underlying data quality. Security was limited, and there was no way to consistently apply data governance policies. With DataFoundry, they onboarded 160 data sources within a single year, designing four to five new transformation pipelines each day to meet existing and newly discovered use cases. They were now able to manage hundreds and even thousands of ingestion and transformation jobs each day, responding to dozens of new data use cases every month. Of course, digital transformation on this scale takes both ingenuity and muscle.
Traditionally, big data (or at least as big as data used to get) all sat on one big, high-end box somewhere. Over the past decade, though, it has become well known that managing and analyzing the data volumes of real-world enterprise business demands far greater elasticity and parallelization than any single big box can deliver. Further, these clustered systems themselves cannot remain static; they need to elastically expand in response to demand, and shrink when that demand passes.
The challenge in that first-generation approach to clustering, though, is that you may end up with many nodes sitting idle when a job needs just two, and idle again once that job is done. Perhaps the next job needs more nodes; when it finishes, you want to dial things back to just your command-and-control nodes. There's a lot of efficiency in spinning up ephemeral clusters configured specifically for the job at hand, and DataFoundry puts that efficiency directly into your hands by abstracting away the per-job coding complexity into Cluster Templates. Worker types and maximum node volumes can be designed for appropriate workload types and then assigned to the matching ingestion and transformation jobs, all through a straightforward user interface.
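Conceptually, a cluster template pairs a worker profile with scaling bounds, and jobs reference the template instead of hand-coded cluster setup. The dataclass below is our own illustrative model of that idea, with assumed field names, not the product's actual API.

```python
from dataclasses import dataclass

@dataclass
class ClusterTemplate:
    """Illustrative model of an ephemeral-cluster template: a reusable
    recipe a job references instead of hand-coding cluster setup."""
    name: str
    worker_type: str   # e.g. a memory- or compute-optimized instance class
    min_workers: int   # floor while the job runs
    max_workers: int   # ceiling for autoscaling under load

# Templates tuned per workload type, then assigned to jobs.
INGEST = ClusterTemplate("bulk-ingest", "memory-optimized", 2, 20)
TRANSFORM = ClusterTemplate("heavy-transform", "compute-optimized", 4, 50)

def run_job(job_name: str, template: ClusterTemplate) -> None:
    # Ephemeral lifecycle: provision, run, tear down. No idle nodes
    # left behind once the job completes.
    print(f"provisioning {template.min_workers}-{template.max_workers} "
          f"{template.worker_type} workers for {job_name}")
    print(f"running {job_name} ...")
    print(f"tearing down cluster '{template.name}'")

run_job("daily-orders-ingest", INGEST)
run_job("customer-360-build", TRANSFORM)
```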
There is a lot going on in this product, but in a nutshell: we automate your ingestion and synchronization of data from any number of sources into a common data lake. We automate the process of combining and transforming this data as needed. We automate optimizing and exposing this data in the manner best suited to your access-speed needs. We automate the orchestration of when and how all these processes occur, so they dance nicely with the ever-changing service demands and capacities of your broader technical ecosystem. We automate the installation, configuration, and scaling of your big data analytics, and, more importantly, abstract it all in a manner that lets you change cloud vendors and evolve underlying technology stacks as your business needs demand. We streamline and automate the auditing, cataloging, and secure governance of your data across user domains. We automate the export of your transformed and optimized data into external systems. And we maximize the availability of your data by securely and durably ensuring access to verified copies.
At a high level, migrating to an EDO2 platform takes three steps. First, you bring your data onboard, ingesting historical data and then synchronizing to keep your data lake up to date; in the process, you design your data governance policies. Second, you prepare your data for ongoing analytical use, transforming it to meet known use cases and optimizing targeted data models to meet service-level requirements. Third, you operationalize your platform by migrating your data pipelines into production systems, orchestrating your workflows, and, where relevant, setting up ongoing exports to the rest of your overall data fabric.
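As a schematic, those three steps read as an ordered workflow. The sketch below is purely illustrative Python; the step functions are placeholders standing in for the real onboarding, preparation, and operationalization work.

```python
def onboard() -> None:
    # Step 1: ingest historical data, enable ongoing synchronization,
    # and define governance policies as data lands in the lake.
    print("onboarding: historic load + CDC sync + governance policies")

def prepare() -> None:
    # Step 2: build transformation pipelines for known use cases and
    # optimize targeted data models to service-level requirements.
    print("preparing: transformation pipelines + optimized models")

def operationalize() -> None:
    # Step 3: promote pipelines to production, orchestrate workflows,
    # and set up ongoing exports to the wider data fabric.
    print("operationalizing: production migration + orchestration + exports")

# The steps are sequential: each builds on the outputs of the last.
for step in (onboard, prepare, operationalize):
    step()
```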
So, what have you learned? Yes, you can always roll your own; libraries and point tools do exist to meet evolving data needs. But not all enterprises want to run their own internal software factory. Automated data engineering lets you launch use cases more quickly, expands the productivity of your staff, reduces complexity, and adds a layer of abstraction that eases the friction of inevitable future technical change. Infoworks standardizes the ingestion, transformation, optimization, orchestration, operation, governance, export, and replication of your data, across and among all your many silos. And it keeps these processes efficient by delivering them over ephemeral clusters. It just takes three steps: onboard your data, prepare it for use, and operationalize your new data platform.
Infoworks is like grease for your wheels. Come on back, and we'll keep showing you how it works.