Prepare Your Data

Intelligent automation allows you to rapidly and easily prepare data for analytics

Data Preparation

DataFoundry automates preparing data for analytics and optimizing data pipelines for performance. Data preparation with DataFoundry applies intelligent automation to:

  1. Data Transformation – data pipeline design, optimization and incremental updates
  2. Data Modeling – use-case-specific optimization of data models with incremental updates

Data Transformation

Capabilities

  • Easy-to-use self-service data transformation designer
  • Work interactively with intelligent data samples for faster pipeline development
  • One-click creation of incremental pipelines
  • Integrate directly with Spark-ML and H2O.ai algorithms, and operationalize ML with your data pipelines and data models
  • Dependency management
  • Integrate with custom Java, Scala and Python transformation logic
  • Migrate SQL workloads into Databricks-native pipelines automatically
  • Collaborate and share data assets with audit tracking
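
As a rough illustration of what an incremental pipeline does, the sketch below transforms only the rows that changed since the previous run. It uses a generic watermark pattern and is not DataFoundry's actual implementation; the row layout, the `updated_at` field, and the `run_incremental` and `add_tax` helpers are all hypothetical.

```python
from datetime import datetime


def run_incremental(rows, last_watermark, transform):
    """Transform only rows updated after last_watermark (hypothetical helper).

    Returns (transformed_rows, new_watermark).
    """
    # Keep only rows that changed since the previous run.
    fresh = [r for r in rows if r["updated_at"] > last_watermark]
    # Advance the watermark; keep the old one when nothing is new.
    new_watermark = max((r["updated_at"] for r in fresh), default=last_watermark)
    return [transform(r) for r in fresh], new_watermark


def add_tax(row):
    # Example transformation: derive a new column from an existing one.
    return {**row, "amount_with_tax": round(row["amount"] * 1.1, 2)}


# Hypothetical source table with a last-modified timestamp per row.
source = [
    {"id": 1, "amount": 10.0, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "amount": 20.0, "updated_at": datetime(2024, 1, 5)},
    {"id": 3, "amount": 30.0, "updated_at": datetime(2024, 1, 9)},
]

# The previous run finished on 2024-01-03, so only ids 2 and 3 are processed.
out, watermark = run_incremental(source, datetime(2024, 1, 3), add_tax)
```

Storing the returned watermark between runs is what lets the next run skip rows it has already processed.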

Benefits for Databricks users

Performance and Scale

  • All transformations are executed natively on Databricks using Spark (not JDBC or “push-down”) for maximum performance
  • Automated deployment of auto-scaled, on-demand clusters tailored to individual jobs and data sizes, for greater efficiency and easier tuning
  • Easy optimization of Databricks/Spark pipelines through automated use of broadcast joins, repartitioning, sorting and other actions to handle data skew and large data volumes
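
To make the broadcast-join optimization concrete, here is a minimal pure-Python sketch of the underlying idea: the small table is turned into an in-memory lookup that each worker can hold, so the large table is streamed once and never shuffled. This is an illustration only, not how DataFoundry or Spark implements it; the `broadcast_hash_join` function and the sample tables are hypothetical. In PySpark, the comparable hint is `pyspark.sql.functions.broadcast`.

```python
def broadcast_hash_join(large_rows, small_rows, key):
    """Inner-join large_rows to small_rows on `key` (hypothetical helper)."""
    # Build the lookup once from the small side; this is the part that
    # would be "broadcast" to every worker in a distributed engine.
    lookup = {}
    for row in small_rows:
        lookup.setdefault(row[key], []).append(row)

    # Stream the large side and probe the lookup; no sorting or shuffling.
    joined = []
    for row in large_rows:
        for match in lookup.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical data: a large fact table and a small dimension table.
orders = [
    {"order_id": 1, "country": "US"},
    {"order_id": 2, "country": "DE"},
    {"order_id": 3, "country": "US"},
    {"order_id": 4, "country": "FR"},
]
countries = [
    {"country": "US", "region": "AMER"},
    {"country": "DE", "region": "EMEA"},
]

# Order 4 has no matching country row, so the inner join drops it.
result = broadcast_hash_join(orders, countries, "country")
```

The approach only pays off when the small side fits comfortably in memory, which is why choosing it automatically depends on knowing the data sizes.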

Faster development of pipelines

  • Visual, zero-coding environment to rapidly create data transformation logic
  • Minimize the need for expert resources to tune Spark pipelines

Extensibility

  • Easily re-use and integrate your Python, Java and other logic into the data pipelines

Data Modeling

Capabilities

  • Create and build target data models in Delta Lake for optimized query performance
  • One-click slowly changing dimension support
  • Easily configure partitions, data hierarchy and other settings to optimize data organization for different usage patterns
  • Infoworks augments Delta merge/ACID support by automatically maintaining record versioning for any time period
  • Export data to other data consumption systems, ranging from in-memory stores and cubes to data warehouses and NoSQL databases, and keep them incrementally synchronized
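
For readers unfamiliar with slowly changing dimensions and record versioning, the sketch below shows one common variant, Type 2, in plain Python: a changed record closes the old version and appends a new one, so history is preserved. The source does not say which SCD types the product supports, so treat this as an assumption; the `apply_scd2` function and the `valid_from`/`valid_to` column names are hypothetical.

```python
from datetime import date


def apply_scd2(dimension, incoming, key, today):
    """Return a new dimension with Type 2 history applied (hypothetical helper)."""
    # Copy rows so the caller's table is left untouched.
    result = [dict(r) for r in dimension]
    # Index the currently open version of each business key.
    current = {r[key]: r for r in result if r["valid_to"] is None}

    for new in incoming:
        old = current.get(new[key])
        if old is not None:
            # Skip records whose tracked attributes have not changed.
            if all(old.get(col) == val for col, val in new.items()):
                continue
            # Close the previous version instead of overwriting it.
            old["valid_to"] = today
        version = {**new, "valid_from": today, "valid_to": None}
        result.append(version)
        current[new[key]] = version
    return result


# Hypothetical customer dimension with one open record.
dim = [
    {"customer_id": 1, "city": "Austin",
     "valid_from": date(2023, 1, 1), "valid_to": None},
]
updates = [
    {"customer_id": 1, "city": "Denver"},  # changed attribute
    {"customer_id": 2, "city": "Boston"},  # brand-new key
]

dim2 = apply_scd2(dim, updates, "customer_id", date(2024, 6, 1))
```

Because old versions are closed rather than deleted, a query can reconstruct the dimension as of any past date by filtering on the validity columns.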

Benefits for Databricks users

Faster data access

  • All data models in the data lake are in Delta format for simplified access and query performance
  • Use your preferred data consumption stack, whether in the data lake or elsewhere, with continuously synchronized data

Simplified access and discoverability

  • All data models are stored in Delta tables and registered in the metastore, allowing SQL access via notebooks or other tools
  • Easily discovered through the data catalog with technical, business and other metadata
