Prepare Your Data

Intelligent automation lets you prepare data for analytics quickly and easily


Data Preparation

DataFoundry automates data preparation for analytics and the performance optimization of data pipelines. Data preparation with DataFoundry applies intelligent automation to two areas:

  1. Data Transformation – data pipeline design, optimization and incremental updates
  2. Data Modeling – use-case-specific optimization of data models, with incremental updates

Data Transformation

Capabilities

  • Easy-to-use self-service data transformation designer
  • Work interactively with intelligent data samples for faster pipeline development
  • One-click creation of incremental pipelines
  • Integrate directly with Spark-ML and H2O.ai algorithms, and operationalize ML with your data pipelines and data models
  • Dependency management
  • Integrate with custom Java, Scala and Python transformation logic
  • Migrate SQL workloads into Databricks-native pipelines automatically
  • Collaborate and share data assets with audit tracking
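The page does not describe how DataFoundry builds its incremental pipelines. As a hedged illustration of the general idea only, the sketch below shows a common high-water-mark pattern: each run picks up only rows changed since the last run and merges them into the target by key. The `Row` type, its `updated_at` column, and the function names are hypothetical and are not DataFoundry's API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Row:
    key: int              # primary key of the record
    value: str            # payload
    updated_at: datetime  # hypothetical change-tracking column in the source


def incremental_batch(source: Iterable[Row], watermark: datetime) -> Tuple[List[Row], datetime]:
    """Select only rows changed after `watermark` and compute the next watermark."""
    changed = [r for r in source if r.updated_at > watermark]
    # If nothing changed, keep the old watermark so the next run starts from the same point.
    new_watermark = max((r.updated_at for r in changed), default=watermark)
    return changed, new_watermark


def upsert(target: Dict[int, Row], rows: Iterable[Row]) -> None:
    """Merge changed rows into the target by key; the most recent change wins."""
    for r in sorted(rows, key=lambda r: r.updated_at):
        target[r.key] = r
```

In this sketch, the first run uses `datetime.min` as the watermark and loads everything; later runs pass the returned watermark so that only new or updated rows are processed.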

Benefits for Databricks users

Performance and Scale

  • All transformations execute natively on Databricks using Spark (not JDBC or “push-down”), for maximum performance
  • Automated deployment of auto-scaled, on-demand clusters tailored to individual jobs and data sizes, for greater efficiency and better tuning
  • Easy optimization of Databricks/Spark pipelines through automated use of broadcast joins, repartitioning, sorting and other techniques to handle data skew and large data volumes
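A broadcast join avoids shuffling the large side of a join. The small table is copied to every worker as an in-memory hash table, and each partition of the large table probes it locally. In PySpark, this is requested with the `broadcast()` hint from `pyspark.sql.functions`. The plain-Python sketch below only illustrates the underlying broadcast hash join logic for an inner join. The function names and sample columns are hypothetical, and the sketch does not represent how DataFoundry chooses or applies the optimization.

```python
from typing import Any, Dict, Hashable, Iterable, Iterator, List

Record = Dict[str, Any]


def build_broadcast_table(small: Iterable[Record], key: str) -> Dict[Hashable, List[Record]]:
    """Build an in-memory hash table from the small side of the join.

    In Spark, this table is what would be shipped ("broadcast") to every executor.
    """
    table: Dict[Hashable, List[Record]] = {}
    for row in small:
        table.setdefault(row[key], []).append(row)
    return table


def broadcast_hash_join(large: Iterable[Record],
                        table: Dict[Hashable, List[Record]],
                        key: str) -> Iterator[Record]:
    """Inner join: stream the large side and probe the hash table row by row.

    The large side is never shuffled or sorted; rows with no match are dropped.
    """
    for row in large:
        for match in table.get(row[key], ()):
            yield {**row, **match}
```

This works well only when the small side fits comfortably in each worker's memory. That size trade-off is why broadcast joins are applied selectively rather than by default.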

Faster development of pipelines

  • Visual, zero-coding environment to rapidly create data transformation logic
  • Minimize the need for expert resources to tune Spark pipelines

Extensibility

  • Easily re-use and integrate your Python, Java and other logic into the data pipelines

Data Modeling

Capabilities

  • Create and build target data models in Delta Lake for optimized query performance
  • One-click slowly changing dimension support
  • Easily configure partitions, data hierarchy and other settings to optimize data organization for different usage patterns
  • Infoworks augments Delta Lake MERGE and ACID support by automatically maintaining record versions over any time period
  • Export data to other data consumption systems, including in-memory stores, OLAP cubes, data warehouses and NoSQL databases, and keep them incrementally synchronized
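The page does not specify which slowly changing dimension (SCD) types DataFoundry supports or how it stores record versions. The sketch below assumes the widely used Type 2 pattern. A change closes the current version of a record by setting its `valid_to` date and appends a new open-ended version. This keeps the full history, so a point-in-time lookup can return the version that was valid on any date. `DimVersion`, `apply_scd2`, `lookup_as_of` and the `OPEN_END` sentinel are hypothetical names for illustration. On Databricks, the same close-and-insert step would typically be expressed as a Delta Lake MERGE.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional

OPEN_END = date.max  # hypothetical sentinel meaning "still the current version"


@dataclass(frozen=True)
class DimVersion:
    key: str               # business key, e.g. a customer id
    attrs: tuple           # tracked attribute values
    valid_from: date       # inclusive start of validity
    valid_to: date = OPEN_END  # exclusive end of validity

    @property
    def is_current(self) -> bool:
        return self.valid_to == OPEN_END


def apply_scd2(history: List[DimVersion], key: str, attrs: tuple, as_of: date) -> List[DimVersion]:
    """Return a new history with the incoming values applied as an SCD Type 2 change."""
    current = next((v for v in history if v.key == key and v.is_current), None)
    if current is not None and current.attrs == attrs:
        return list(history)  # unchanged values: no new version needed
    updated: List[DimVersion] = []
    for v in history:
        # Close the old current version at the change date; keep all others as-is.
        updated.append(replace(v, valid_to=as_of) if v is current else v)
    updated.append(DimVersion(key, attrs, valid_from=as_of))
    return updated


def lookup_as_of(history: List[DimVersion], key: str, when: date) -> Optional[DimVersion]:
    """Point-in-time lookup: the version whose [valid_from, valid_to) range covers `when`."""
    for v in history:
        if v.key == key and v.valid_from <= when < v.valid_to:
            return v
    return None
```

Using half-open `[valid_from, valid_to)` ranges ensures that exactly one version matches any date after the first load, with no overlap on the day of a change.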

Benefits for Databricks users

Faster data access

  • All data models in the data lake are stored in Delta format for simplified access and better query performance
  • Use your preferred data consumption stack, whether in the data lake or elsewhere, with continuously synchronized data

Simplified access and discoverability

  • All data models are stored in Delta tables and registered in the metastore, allowing SQL access from notebooks or other tools
  • Data models are easily discovered through the data catalog, which holds technical, business and other metadata

Want to learn how Infoworks software automates data operations and orchestration for Databricks?
