Data Ingestion 

Take a Free Test Drive

Data ingestion and synchronization into a big data environment is harder than most people think. Loading large volumes of data at high speed, and managing incremental ingestion and synchronization at scale into an on-premises or cloud data lake or Databricks Delta Lake, can present significant technical challenges. Moreover, data ingestion is only the first step of a complete Enterprise Data Operations and Orchestration system.

Infoworks DataFoundry automates data ingestion for batch and streaming

No-Code Ingestion Configuration

Our data ingestion tools provide a no-code environment for configuring the ingestion of data from a wide variety of data sources. Infoworks also uses native connectors when available to provide the highest possible speed of data ingestion.


Data Type Conversion

Data types on relational sources map differently depending on the Hadoop or cloud data storage environment you select. Infoworks handles data type conversions automatically, which reduces the errors typical of manual conversion. This automation also makes it easy to move data from the data lake or Databricks Delta Lake to other consuming systems without recoding data types.
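To illustrate the idea of automated type conversion, here is a minimal Python sketch that maps relational column types to Hive-style target types. The mapping table, the `convert_schema` function, and the fallback to `STRING` are all assumptions made for illustration; they are not Infoworks' actual conversion rules.

```python
# Hypothetical mapping from relational source types to Hive-style target types.
# These pairs are illustrative assumptions, not Infoworks' actual rules.
SOURCE_TO_TARGET = {
    "VARCHAR": "STRING",
    "CHAR": "STRING",
    "INTEGER": "INT",
    "BIGINT": "BIGINT",
    "NUMBER": "DECIMAL(38,10)",
    "DATE": "DATE",
    "DATETIME": "TIMESTAMP",
}


def convert_schema(source_columns):
    """Return (name, target_type) pairs; unknown types fall back to STRING."""
    converted = []
    for name, source_type in source_columns:
        # Drop any length/precision suffix, e.g. "VARCHAR(50)" -> "VARCHAR".
        base = source_type.split("(")[0].strip().upper()
        converted.append((name, SOURCE_TO_TARGET.get(base, "STRING")))
    return converted


source = [("id", "INTEGER"), ("name", "VARCHAR(50)"), ("created", "DATETIME")]
target = convert_schema(source)
```

Keeping the mapping in one table, rather than scattered through hand-written load scripts, is what makes it practical to retarget the same source schema to a different storage environment.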

Scalable, Parallelized Data Ingestion Process

Infoworks’ automated process parallelizes the ingestion of data into your data lake or Databricks Delta Lake. This significantly accelerates the loading of large tables within small ingestion windows, without requiring code development.
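One common way to parallelize the load of a large table is to split it into key ranges and load the ranges concurrently. The sketch below shows that general technique in Python with an in-memory list standing in for the source table. The function names (`split_ranges`, `load_chunk`, `parallel_ingest`) and the choice of an integer `id` split key are hypothetical, and this is not a description of how Infoworks implements parallel ingestion.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(min_id, max_id, parts):
    """Split the inclusive key range [min_id, max_id] into contiguous chunks."""
    total = max_id - min_id + 1
    size = -(-total // parts)  # ceiling division
    return [(lo, min(lo + size - 1, max_id))
            for lo in range(min_id, max_id + 1, size)]


def load_chunk(rows, key_range):
    """Stand-in for a range query such as: WHERE id BETWEEN lo AND hi."""
    lo, hi = key_range
    return [r for r in rows if lo <= r["id"] <= hi]


def parallel_ingest(rows, parts=4):
    """Load each key range in its own worker and concatenate the results."""
    ids = [r["id"] for r in rows]
    ranges = split_ranges(min(ids), max(ids), parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        # map() preserves the order of the input ranges.
        chunks = list(pool.map(lambda kr: load_chunk(rows, kr), ranges))
    return [row for chunk in chunks for row in chunk]


source_rows = [{"id": i, "value": i * 10} for i in range(1, 101)]
ingested = parallel_ingest(source_rows, parts=4)
```

In a real pipeline each worker would issue its own range query against the source, so the number of parts is usually bounded by how many concurrent connections the source system can tolerate.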

Schema Change Detection and Propagation

When new columns are added to source systems, data ingestion processes often break unless they are manually updated before the change. Infoworks detects source-side schema changes, adjusts for them, and ingests the new columns into the data lake or Databricks Delta Lake automatically.
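Conceptually, schema change detection compares the schema recorded at the last ingestion with the schema the source reports now. The following Python sketch shows that comparison for newly added columns only. The dictionaries and the `detect_new_columns` and `propagate` helpers are hypothetical names used for illustration, not part of any Infoworks API.

```python
def detect_new_columns(known_schema, source_schema):
    """Return (name, type) pairs present in the source but not yet known."""
    return [(name, col_type)
            for name, col_type in source_schema.items()
            if name not in known_schema]


def propagate(known_schema, source_schema):
    """Return an updated copy of the target schema plus the columns added."""
    added = detect_new_columns(known_schema, source_schema)
    updated = dict(known_schema)
    updated.update(added)
    return updated, added


known = {"id": "INT", "name": "STRING"}
current_source = {"id": "INT", "name": "STRING", "email": "STRING"}
updated_schema, added_columns = propagate(known, current_source)
```

Rows ingested before the change would typically carry a null value for each added column; handling dropped or retyped columns would need additional logic that this sketch omits.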

Sync and Merge of Incremental Data

Infoworks automates log-based and query-based change data capture and also manages slowly changing dimensions (Type 1 and Type 2). At ingestion time, Infoworks reconciles and merges incremental data with the previously ingested base data. This continuous merge capability supports fast ingestion and continuously fresh data while keeping the data optimized for downstream query performance.
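The merge step can be pictured as applying a stream of change records to the base data by primary key. Below is a minimal in-memory Python sketch of that idea. The change-record shape (`op` plus `row`) and the `merge_incremental` function are assumptions for illustration; a production merge would run against table storage rather than Python lists.

```python
def merge_incremental(base, changes, key="id"):
    """Apply insert/update/delete change records to base rows, keyed by `key`."""
    merged = {row[key]: row for row in base}
    for change in changes:
        row = change["row"]
        if change["op"] == "delete":
            merged.pop(row[key], None)
        else:
            # Inserts and updates are both upserts: the latest version wins.
            merged[row[key]] = row
    return sorted(merged.values(), key=lambda r: r[key])


base_rows = [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}]
captured_changes = [
    {"op": "update", "row": {"id": 1, "qty": 6}},
    {"op": "insert", "row": {"id": 3, "qty": 1}},
    {"op": "delete", "row": {"id": 2}},
]
merged_rows = merge_incremental(base_rows, captured_changes)
```

Because changes are applied in order, the sketch assumes the captured changes arrive in commit order, which is what log-based change data capture normally provides.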

Streaming Data

Infoworks supports both batch and streaming use cases. Configuration of a streaming data flow is done via a simple menu-based interface with no coding required. Infoworks uses Kafka as the underlying streaming engine and can connect to any data source to stream large amounts of data in real time.

Time Axis Tracking of Current, History and Slowly Changing Dimensions

As part of the synchronization and merge process, Infoworks tracks slowly changing dimensions (SCD) and automatically keeps a history table of prior-state data. The history table records the date of each change, and Infoworks also tracks any errors that occur during SCD processing.
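A history table of this kind is usually modeled as SCD Type 2: each version of a record carries a validity interval, and a change closes the current version and opens a new one. The Python sketch below shows that general pattern. The `valid_from` and `valid_to` column names and the `apply_scd2` function are hypothetical, and the sketch does not cover the error tracking mentioned above.

```python
from datetime import date


def apply_scd2(history, incoming, key, change_date):
    """Close the current version of each changed key and append a new one.

    `history` is modified in place and also returned.
    """
    current = {row[key]: row for row in history if row["valid_to"] is None}
    for record in incoming:
        existing = current.get(record[key])
        if existing is not None:
            attrs = {k: v for k, v in existing.items()
                     if k not in ("valid_from", "valid_to")}
            if attrs == record:
                continue  # nothing changed, keep the current version open
            existing["valid_to"] = change_date  # close the prior state
        history.append({**record, "valid_from": change_date, "valid_to": None})
    return history


history_table = [
    {"id": 1, "city": "Austin", "valid_from": date(2020, 1, 1), "valid_to": None},
]
incoming_rows = [{"id": 1, "city": "Denver"}, {"id": 2, "city": "Boston"}]
result = apply_scd2(history_table, incoming_rows, "id", date(2021, 6, 1))
```

The row with `valid_to` set to `None` is the current view of each key, while the closed rows form the time axis of prior states.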

Data Validation and Reconciliation

Infoworks automatically validates data ingested into the data lake for full and incremental loads coming through change data capture. For all data sources loaded, Infoworks provides:

  • Row count validation to ensure that row counts match between source and target
  • User-specified aggregate validation to ensure that aggregate values match between source and target tables
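The two checks above can be sketched in a few lines of Python. Here the aggregate is a simple column sum chosen by the user; the `validate_load` function, its `sum_columns` parameter, and the report keys are hypothetical names for illustration only.

```python
def validate_load(source_rows, target_rows, sum_columns=()):
    """Compare row counts and per-column sums between source and target."""
    report = {"row_count_match": len(source_rows) == len(target_rows)}
    for column in sum_columns:
        source_total = sum(r[column] for r in source_rows)
        target_total = sum(r[column] for r in target_rows)
        report[f"sum_match_{column}"] = source_total == target_total
    return report


source_data = [{"id": 1, "amount": 100}, {"id": 2, "amount": 250}]
good_target = [{"id": 1, "amount": 100}, {"id": 2, "amount": 250}]
bad_target = [{"id": 1, "amount": 100}, {"id": 2, "amount": 200}]

good_report = validate_load(source_data, good_target, sum_columns=["amount"])
bad_report = validate_load(source_data, bad_target, sum_columns=["amount"])
```

The second report shows why an aggregate check is useful alongside a row count: the counts match, yet the `amount` totals differ, which points to a value-level discrepancy.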

Data Ingestion and Synchronization Metrics

Infoworks DataFoundry has been proven in production customer deployments to perform much better than alternatives. The examples in the table below illustrate the level of performance improvement Infoworks’ customers have obtained.
