Getting streams flowing into your data lake is just the beginning. The point of breaking down data silos is to enable transformation. Let's take a look.
Transformation is our next major feature area, enabling you to combine and transform ingested data from multiple sources into more useful forms.
With Ingestion, 1 you are streaming sources into your data lake, where they remain synchronized. With Transformation, 2 you are taking the next step, building pipelines from any number of sources, and massaging them in various ways, to create a new target reflecting the data structure you need most. 3 You can create any number of targets, from any number of sources. 4 Once created, your transformed targets remain as current as the increments you've configured for your ingestion and transformation pipelines. You can attach any external tool you need to whatever target you desire, 5 including ingested sources, if no transformation is required.
Of course, there's nothing new about the notion of data transformation, whether hand coded or 1 visually programmed. 2 But visual programming can be as slow and complex as hand coding. It's just visual. It's still 3 a specialized skill with a high learning curve, 4 often requiring manual optimization. 5 It can take weeks or months to develop, test, optimize, and deploy visually designed use cases.
For example, the transformations needed to track slowly changing dimensions over time could be implemented by either visual or hand coding, in any number of ways. Which is exactly the problem: that kind of variety leads to code management challenges over time. 1 That's why, with Infoworks, slowly changing dimensions are simply a feature you configure in a pipeline. 2 When you remove the opportunity to reinvent wheels, and replace it with configuring a standard approach, what once took weeks may be done in days ... or a whole lot less.
DataFoundry takes this approach in several other ways, providing configuration instead of code. 1 Incrementally loading pipelines are automatically built to your specifications, 2 with load time watermarks, 3 audit columns, timestamps, and more.
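To make the watermark idea concrete, here is a minimal, hypothetical sketch of how a load-time watermark drives incremental loading: each run pulls only the rows newer than the last recorded watermark, then advances the watermark. The function and field names are illustrative, not DataFoundry's actual implementation.

```python
from datetime import datetime, timezone

def incremental_load(source_rows, watermark):
    """Return rows newer than `watermark`, plus the advanced watermark."""
    new_rows = [r for r in source_rows if r["updated_at"] > watermark]
    if new_rows:
        watermark = max(r["updated_at"] for r in new_rows)
    return new_rows, watermark

rows = [
    {"id": 1, "updated_at": datetime(2023, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "updated_at": datetime(2023, 1, 2, tzinfo=timezone.utc)},
    {"id": 3, "updated_at": datetime(2023, 1, 3, tzinfo=timezone.utc)},
]

# First run: a watermark of Jan 1 picks up only rows 2 and 3.
loaded, wm = incremental_load(rows, datetime(2023, 1, 1, tzinfo=timezone.utc))
# Second run with the advanced watermark loads nothing new.
loaded2, wm2 = incremental_load(rows, wm)
```

The point of the pattern is that the pipeline never rescans the full source; only the delta since the last watermark moves.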
Further, your investments in 1 hand coded SQL transformations are maintained and leveraged, 2 when you migrate these workloads into DataFoundry. The ANSI SQL syntax and semantics are automatically translated into a visual pipeline, 3 SQL features not available on the target data store are implemented, 4 execution is optimized, 5 target query access is optimized, and 6 all the structures built are self documented for easy maintenance and change over time.
One of the 1 challenges of working to combine large datasets is that they are ... 2 large. When attempting to build pipelines using point tools, data engineers are often left needing to materialize intermediate views to test their work. 3 Infoworks removes this stumbling block by enabling intelligent, interactive data sampling to inform the pipeline configuration process.
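A hypothetical sketch of the sampling idea: rather than materializing a full intermediate view, a data engineer can validate join logic against a small, reproducible sample. The dataset and column names here are invented for illustration.

```python
import random

def sample(rows, n, seed=42):
    """Draw a reproducible random sample of up to n rows."""
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))

# A large fact table and a small dimension lookup (synthetic data).
orders = [{"order_id": i, "cust_id": i % 100} for i in range(100_000)]
customers = {c: {"cust_id": c, "region": "EMEA" if c % 2 else "APAC"}
             for c in range(100)}

# Join only the sampled slice to check the pipeline logic quickly,
# instead of materializing the full 100,000-row intermediate result.
preview = [dict(o, **customers[o["cust_id"]]) for o in sample(orders, 1000)]
```

A fixed seed keeps the preview stable across edits, so each configuration change can be checked against the same sample.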
Accurate historical reporting requires 1 tracking your data lineage. Infoworks enables you to implement either 2 Slowly Changing Dimension Type 1, meaning changed data is overwritten, or 3 Slowly Changing Dimension Type 2, meaning a change to an existing field causes a new row to be written, with appropriate metadata to record the change. All with a simple configuration setting.
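The two behaviors can be sketched in a few lines. This is a simplified, hypothetical illustration of the Type 1 and Type 2 patterns themselves, not DataFoundry's generated code; the field names (`is_current`, `valid_from`, `valid_to`) are assumptions.

```python
def scd_type1(table, update):
    """Type 1: overwrite the matching row in place. No history kept."""
    return [dict(update, is_current=True) if r["key"] == update["key"] else r
            for r in table]

def scd_type2(table, update, ts):
    """Type 2: close out the old row, append a new current row."""
    out = []
    for r in table:
        if r["key"] == update["key"] and r["is_current"]:
            out.append(dict(r, is_current=False, valid_to=ts))  # retire old row
        else:
            out.append(r)
    out.append(dict(update, is_current=True, valid_from=ts, valid_to=None))
    return out

dim = [{"key": "C1", "city": "Austin", "is_current": True,
        "valid_from": "2020-01-01", "valid_to": None}]

t1 = scd_type1(dim, {"key": "C1", "city": "Dallas"})          # one row, no history
t2 = scd_type2(dim, {"key": "C1", "city": "Dallas"}, "2023-06-01")  # two rows
```

Type 1 ends with a single row showing only the new value; Type 2 ends with two rows, the retired Austin row stamped with its end date and a new current Dallas row.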
To help ensure speed and accuracy when 1 designing transformation pipelines, DataFoundry performs 2 automated structural validation, ensuring routing is valid, one or more targets are configured, aggregates and joins are defined, column references are valid, and more. 3 It also performs automated syntax validation, ensuring imported SQL parses accurately, validating keywords, checking for duplications, and checking syntax for errors. And 4 it performs automated semantic validation, verifying names and types against metadata, verifying parameter validity, and checking for type mismatches. Just as you would if you had to take the time to manually code and unit test it all.
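As one tiny, hypothetical illustration of the semantic category: checking that every column a pipeline node references actually exists in the upstream schema. The schema shape and function name here are invented for the sketch.

```python
# Assumed metadata shape: table name -> {column name: type}
schema = {"orders": {"order_id": "int", "amount": "decimal",
                     "placed_at": "timestamp"}}

def validate_columns(table, referenced, schema):
    """Return an error string for each column reference not in the schema."""
    known = schema.get(table, {})
    return [f"unknown column {table}.{col}"
            for col in referenced if col not in known]

# A typo ('amonut') is caught before the pipeline ever runs.
errors = validate_columns("orders", ["order_id", "amonut"], schema)
```

Multiply this by every keyword, join key, parameter, and type across a pipeline, and the value of automating the checks becomes clear.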
To help contextualize the way these features impact the bottom line, let's 1 look at some metrics around DataFoundry Transformation. In the field, one customer spent weeks of effort using Big 5 consultants to build pipelines to transform a large-volume Teradata source. The project never completed. With DataFoundry, this took one day. 2 Several data engineers at another customer spent several months building the logic for a reporting dashboard, involving seven sources, seven pipelines, and three cubes. With Infoworks, this took a single engineer four days, with one subject matter expert participating for review. 3 The code for a single moderately complex incremental transformation pipeline can take two weeks to design, code, and test. DataFoundry turns this into a button click. 4 Optimizing the build time of a single transformation pipeline from five hours to under one can also take weeks. DataFoundry, on the other hand, automatically builds the SQL execution plan and identifies areas for optimization. 5 What could be weeks of designing and implementing slowly changing dimension code and tables is also automated. 6 Similarly, the weeks needed to design, code, and test logic to implement partitions, indexing, sorting, and bloom filters within a pipeline are also automated, with appropriate configuration settings available. 7 Most importantly, the abstraction layer created by using standardized, automated processes to build your pipelines means you are future proofing against inevitable technical change. DataFoundry is fully abstracted from both your chosen execution engine and your on-premises, hybrid, or cloud environment.
So, what have you learned? 1 Ingestion creates sources from which you transform targets using pipelines, though an ingested source could itself be directly targeted by an external tool as well. 2 Visual programming is still programming; instead, DataFoundry provides automation. 3 Automation features include configurable, not coded, incremental loading with load time watermarks, 4 automatically added metadata for auditing as well as tracking slowly changing dimensions, 5 automatic pipeline generation from imported SQL workloads, 6 interactive data sampling to eliminate the need to materialize vast views during development, 7 structural, syntactic, and semantic validation, and more. 8 Letting DataFoundry automate the creation of your transformation pipelines using standardized design patterns can turn weeks or months of coding into days or hours of configuration.
Stop and consider how productive you could be if the repetitive coding went away, and you could focus on analytical problem solving rather than plumbing. Come on back, and we'll keep showing you a way to get there.