Onboard Your Data

Establish an enterprise-grade foundation to run analytics at scale


DataFoundry Data Onboarding

Data onboarding is the critical first step in operationalizing your data lake. DataFoundry automates data ingestion along with the key functions that must accompany ingestion to establish a complete foundation for analytics. Data onboarding with DataFoundry automates:

  1. Data Ingestion – from all enterprise and external data sources
  2. Data Synchronization – change data capture (CDC) to keep data synchronized with the source
  3. Data Governance – cataloging, data lineage, metadata management, audit, and history

Data Ingestion & Synchronization

Capabilities:

  • Schema Discovery
  • Data type discovery & mapping
  • High-speed, parallelized data ingestion
  • Change Data Capture (including log-based)
  • Data Validation and reconciliation
  • Schema change handling
  • Time Axis/SCD2 at record level
  • Native paths to sources (e.g., TPT for Teradata)
  • Connectors for databases, data warehouses, NoSQL, SaaS apps, APIs, files (CSV, JSON, etc.), and more
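To make the "Time Axis/SCD2 at record level" capability above more concrete, here is a minimal sketch of the general SCD2 (slowly changing dimension, type 2) technique in plain Python. It is not DataFoundry's implementation; the names Version, apply_change, and as_of, along with the valid_from/valid_to fields, are hypothetical and chosen only for illustration. Each captured change closes the current version of a record and opens a new one, so the state of a record at any past time can be reconstructed.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Version:
    """One version of a record (hypothetical structure for illustration)."""
    key: str
    value: dict
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None marks the current version


def apply_change(history, key, value, changed_at):
    """Close the open version for `key`, if any, and append the new version."""
    for v in history:
        if v.key == key and v.valid_to is None:
            if v.value == value:
                return history  # nothing changed; keep the current version open
            v.valid_to = changed_at
    history.append(Version(key, value, changed_at))
    return history


def as_of(history, key, when):
    """Return the value of `key` as it was at time `when`, or None."""
    for v in history:
        if v.key == key and v.valid_from <= when and (
            v.valid_to is None or when < v.valid_to
        ):
            return v.value
    return None


# Two changes captured from a source system for the same record.
history = []
apply_change(history, "cust-1", {"city": "Austin"}, datetime(2023, 1, 1))
apply_change(history, "cust-1", {"city": "Denver"}, datetime(2023, 6, 1))

print(as_of(history, "cust-1", datetime(2023, 3, 1)))  # version valid in March
print(as_of(history, "cust-1", datetime(2024, 1, 1)))  # current version
```

In a real deployment the history would live in a table rather than a Python list, but the open/close logic for version intervals follows the same pattern.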

Benefits for Databricks Users

Performance & Optimization

  • All ingestion is run using Databricks Runtime processing (not JDBC), for better performance
  • Automated deployment of auto-scaled, on-demand clusters tailored for individual jobs and data sizes for easier optimization

Simplified Management

  • Augments Delta merge support by automatically maintaining record versioning for any time period
  • Data is natively stored in Delta tables
  • Automatically prevents duplicate record errors in Delta merge
  • Auto-optimizes Delta tables to overcome fragmentation caused by many small files
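As background for the duplicate-record point above: a Delta Lake merge fails when more than one source row matches the same target row, so a batch of changes usually has to be reduced to one row per key before merging. The sketch below shows that general idea in plain Python. It is an illustration under stated assumptions, not DataFoundry's actual mechanism; the function latest_per_key and the column names id and updated_at are hypothetical.

```python
def latest_per_key(changes, key="id", order_by="updated_at"):
    """Keep only the most recent change for each key.

    `changes` is a list of dicts; `key` and `order_by` are hypothetical
    column names used for illustration.
    """
    latest = {}
    for row in changes:
        k = row[key]
        # Replace the stored row when this one is at least as recent.
        if k not in latest or row[order_by] >= latest[k][order_by]:
            latest[k] = row
    return list(latest.values())


# A change batch in which record 1 appears twice.
batch = [
    {"id": 1, "updated_at": 1, "status": "new"},
    {"id": 2, "updated_at": 1, "status": "new"},
    {"id": 1, "updated_at": 2, "status": "shipped"},
]

deduped = latest_per_key(batch)
for row in deduped:
    print(row)
```

After this step each key appears once in the batch, so every target row has at most one matching source row when the merge runs.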

Easy Access to Data

  • All Delta tables created by Infoworks data onboarding are registered in the metastore, allowing easy SQL access via notebooks
  • Onboarded data is automatically cataloged

Data Governance

Capabilities:

  • Automated cataloging of all data sources that have been crawled, ingested and synchronized
  • Catalog includes data pipeline targets that have been prepared from transformation pipelines and all accelerated data models prepared using Infoworks
  • Discover and search for data sets by name, metadata, and tags
  • Add business descriptions and glossary information
  • Review metrics including data refresh statistics
  • Automated maintenance of data lineage from source to consumption
  • Audit tracking of all changes to data assets
  • Role-based access control into domains for collaborative data preparation
  • Auto stratification of data into raw/ingested zones, integrated and curated zones

Benefits for Databricks Users

Simplified Data Discoverability

  • Automatically maintains a data catalog with business and technical metadata for all ingested data, making data discovery easier for data engineers, data scientists, and others

Built-in Data Governance

  • Provides a single trusted view of all data assets in the Delta Lake
  • Avoids a data swamp through automated metadata lineage, audit, and governance

Contact Us
