Use Infoworks to Rapidly Onboard Data Sources Into Databricks
Data onboarding is the critical first step in operationalizing your data lake. Infoworks automates data ingestion as well as the key functionality that must accompany ingestion to establish a complete foundation for analytics. Data onboarding with Infoworks automates:
Data Ingestion
Infoworks automatically crawls data sources ranging from relational databases and data warehouses such as Oracle, Teradata, and SQL Server to flat files, XML, and JSON. It learns the metadata and infers data relationships, both for data ingested from external sources and for data sets created using Infoworks, and makes that metadata searchable via a metadata repository.
Infoworks ingests source data in a high-performance, parallel process and automatically creates type mappings that preserve source data precision. It provides a no-code environment for configuring the ingestion of data into your Delta Lake via batch, change data capture, and data streaming.
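To make the idea of precision-preserving type mapping concrete, here is a minimal sketch in Python. It is not Infoworks code: the function name, the list of source types, the target type names, and the fallback-to-string rule are all assumptions chosen for illustration. The point is only that a decimal column keeps its declared precision and scale instead of being narrowed to a floating-point type.

```python
# Hypothetical illustration only; not the Infoworks type-mapping logic.
MAX_DECIMAL_PRECISION = 38  # assumed upper bound for the target decimal type


def map_column_type(source_type, precision=None, scale=0):
    """Return an assumed target type name for a source column type."""
    name = source_type.strip().upper()
    if name in ("NUMBER", "NUMERIC", "DECIMAL"):
        # Without a usable precision, fall back to text rather than
        # silently rounding values (an assumption of this sketch).
        if precision is None or precision > MAX_DECIMAL_PRECISION:
            return "STRING"
        return f"DECIMAL({precision},{scale})"
    simple_types = {
        "VARCHAR": "STRING",
        "VARCHAR2": "STRING",
        "CHAR": "STRING",
        "DATE": "TIMESTAMP",
        "TIMESTAMP": "TIMESTAMP",
    }
    # Unknown types also fall back to text in this sketch.
    return simple_types.get(name, "STRING")
```

For example, a source column declared as NUMBER with precision 18 and scale 4 maps to DECIMAL(18,4) here, so no digits are lost on the way into the lake.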
Data Synchronization
Infoworks continuously synchronizes source data from enterprise databases, data warehouses, and file sources. Changing data is captured from the source systems using log-based and query-based methods. The changed data is merged with the base data in a high-performance continuous merge process.
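As a rough illustration of query-based change capture followed by a merge, the sketch below uses plain Python lists of dictionaries. It is an assumption-laden example, not how Infoworks implements synchronization: the `id` key, the `updated_at` watermark column, the `deleted` flag, and both function names are hypothetical.

```python
# Hypothetical illustration only; not the Infoworks merge process.

def select_changes(source_rows, watermark):
    """Query-based capture: keep rows modified after the last watermark."""
    return [row for row in source_rows if row["updated_at"] > watermark]


def merge_changes(base_rows, changes, key="id"):
    """Apply inserts, updates, and deletes to the base data by key."""
    merged = {row[key]: row for row in base_rows}
    # Apply changes oldest first so the latest version of a row wins.
    for change in sorted(changes, key=lambda row: row["updated_at"]):
        if change.get("deleted"):
            merged.pop(change[key], None)
        else:
            merged[change[key]] = change
    return sorted(merged.values(), key=lambda row: row[key])


base = [
    {"id": 1, "name": "a", "updated_at": 1},
    {"id": 2, "name": "b", "updated_at": 1},
]
source = [
    {"id": 1, "name": "a2", "updated_at": 5},
    {"id": 2, "name": "b", "updated_at": 1},
    {"id": 3, "name": "c", "updated_at": 6},
]
changes = select_changes(source, watermark=1)
merged = merge_changes(base, changes)
```

A log-based method would obtain the same kind of change records from the database's transaction log rather than from a timestamp query; the merge step would be unchanged in this sketch.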
Governance and Lineage
Infoworks creates and synchronizes a Data Catalog that can be tagged and searched using the UI. It tracks end-to-end data lineage so users can trace data elements back to the original source systems and perform downstream impact analysis. It also provides audit logs that record who created or changed raw data and semantic data, and it tracks changes to the data pipelines and workflows that operate on the raw data.
Infoworks supports the creation of users with different levels of access, as well as domains, so administrators can control which users have access to specific data sets. Users within a domain can share data, pipelines, and workflows.
Ingestion Automation built into Infoworks enables the fastest way to onboard a data source
Infoworks automates many of the data operations tasks required to onboard a data source. This makes onboarding a data source an order of magnitude faster than other approaches. Some of the data operations that are automatically handled by Infoworks are:
Benefits of Infoworks native integration with Databricks
Infoworks natively integrates with Databricks, and this native integration brings a number of benefits.
See how you can rapidly onboard a data source using Infoworks.