There are many data integration point tools that can help you on your big data and cloud journey. The problem? Point tools don’t automate the end-to-end data operations and orchestration process.
The Infoworks difference?
Infoworks provides the only data operations and orchestration system that delivers end-to-end automation for your business’s rapidly growing analytics needs.
Infoworks DataFoundry automates data operations and data orchestration for developing and managing big data workflows from ingestion all the way to consumption in cloud, multi-cloud and hybrid environments. Our customers implement data workflows into production 10x faster at 1/10th the cost, thanks to an unprecedented level of automation.
Learn more about our capabilities below.
Infoworks DataFoundry automatically crawls external data sources, including flat files, XML, JSON and relational databases, learns their metadata and infers relationships in the ingested data. It also tracks metadata for data sets created using DataFoundry and makes metadata searchable via a data catalog.
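To make the idea of learning metadata from a source concrete, here is a minimal Python sketch that infers a column type for each field of a flat file. It is an illustration only and not DataFoundry's implementation; the function names `infer_type` and `infer_schema` and the sample data are hypothetical.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest type name that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "unknown"
    # Try the narrowest type first, then widen.
    for name, cast in (("integer", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"


def infer_schema(csv_text):
    """Map each column of a CSV document to an inferred type name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {col: infer_type([row[col] for row in rows]) for col in reader.fieldnames}


# Hypothetical sample file contents.
sample = "id,amount,region\n1,9.50,EMEA\n2,12,APAC\n"
schema = infer_schema(sample)
```

A real crawler would also sample large files rather than read them whole, and would record the results in a catalog; the sketch covers only the type inference step.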
Infoworks DataFoundry provides a no-code environment for configuring the ingestion of data (batch, streaming, change data capture) from a wide variety of data sources. We use native connectors when possible to maximize ingestion speed, and we ingest source data in a high-performance, parallel process while automatically preserving data precision.
Source data is continuously synchronized with data in your data lake or Delta Lake (in the case of Databricks) using log-based and query-based methods for change data capture. DataFoundry automatically handles slowly changing data and schema changes and supports streaming, batch and incremental modes for data synchronization and export.
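Query-based change data capture, one of the methods mentioned above, is commonly built around a watermark: each run copies only the rows modified since the previous run. The Python sketch below shows that general technique using in-memory SQLite databases. It is not DataFoundry's implementation, and the `customers` table, its `updated_at` column and the `sync_incremental` function are assumptions made for illustration.

```python
import sqlite3

DDL = "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"


def sync_incremental(source, target, watermark):
    """Copy rows changed since `watermark`; return the new watermark."""
    rows = source.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # INSERT OR REPLACE overwrites the existing row with the same primary key.
    target.executemany("INSERT OR REPLACE INTO customers VALUES (?, ?, ?)", rows)
    target.commit()
    return rows[-1][2] if rows else watermark


# Hypothetical source system and data lake table.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
source.execute(DDL)
target.execute(DDL)
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)", [(1, "Ada", 100), (2, "Grace", 105)]
)

watermark = sync_incremental(source, target, 0)          # initial load
source.execute("UPDATE customers SET name = 'Ada L.', updated_at = 110 WHERE id = 1")
watermark = sync_incremental(source, target, watermark)  # picks up only the change

synced = target.execute("SELECT * FROM customers ORDER BY id").fetchall()
```

Query-based capture cannot see deleted rows or intermediate updates between runs, which is one reason log-based capture is used where the source database exposes its change log.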
A self-service data preparation interface provides interactive, drag-and-drop data transformation capabilities with support for SQL-based and other transformations. Users work interactively with data in a collaborative, suggestion-based interface that reduces or eliminates dependence on IT skills. Machine learning and analytics data pipelines created with our agile data engineering platform can run at scale on all popular big data environments without re-coding.
Significantly accelerate the migration of legacy data warehouse jobs with DataFoundry’s automated SQL conversion tool, which generates optimized, easily maintained, portable, visual data transformation pipelines.
Developers can incorporate advanced analytics directly into their data pipelines. The platform is directly integrated with advanced analytics libraries for decision trees, clustering (k-means), classification and more. Users can also import trained models from other applications via PMML directly into their data transformation pipelines.
The powerful cube engine enables users to visually design star/snowflake schemas and build high-performance OLAP cubes. Data analysts can drag and drop facts, dimensions, and measures. The data engineering platform then automatically builds a fully pre-aggregated and optimized cube natively on the big data platform, providing sub-second response times to most user queries. An ODBC/JDBC interface makes the cubes available to industry-standard analytics tools.
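The reason a pre-aggregated cube answers queries quickly is that totals for each combination of dimensions are computed once, ahead of time, so a query becomes a lookup rather than a scan. The Python sketch below illustrates that general idea on a tiny fact table. It is not the Infoworks cube engine; the `build_cube` function and the sample facts are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(facts, dimensions, measure):
    """Pre-compute the sum of `measure` for every subset of `dimensions`."""
    cube = defaultdict(float)
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            for row in facts:
                # Key = (grouped dimension names, their values in this row).
                key = (dims, tuple(row[d] for d in dims))
                cube[key] += row[measure]
    return dict(cube)


# Hypothetical fact table with two dimensions and one measure.
facts = [
    {"region": "EMEA", "product": "A", "sales": 10.0},
    {"region": "EMEA", "product": "B", "sales": 5.0},
    {"region": "APAC", "product": "A", "sales": 7.0},
]
cube = build_cube(facts, ("region", "product"), "sales")

# Queries are now dictionary lookups instead of scans over the facts.
total_sales = cube[((), ())]
emea_sales = cube[(("region",), ("EMEA",))]
```

The number of dimension subsets doubles with every added dimension, so production cube engines typically prune or optimize which aggregates they materialize.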
Infoworks dramatically accelerates big data queries through query interfaces that support a variety of analytics use cases across business intelligence, data science, ad hoc and batch. For BI analytics, reporting, and dashboard-style use cases, Infoworks cubes provide sub-second, interactive response times. For ad hoc queries, the in-memory accelerated models provide fast access to granular data, while batch use cases can benefit from the optimized data models. Infoworks provides these multiple layers of query acceleration to deliver the right performance and scalability characteristics for each use case.
In-memory and data lake data models are automatically updated in response to upstream changes. Whether the change is the addition of a new source column, a change data capture update of content, or a modification of transformation logic, it can be automatically propagated to the downstream data models.
A distributed orchestrator monitors production workloads for cloud, multi-cloud and hybrid environments and makes them fault tolerant, reducing the load on system and production administrators. Migrating from development to production across all deployment environments is a simple single-click operation. Additionally, data lineage is tracked across the end-to-end workflow from ingestion to consumption.
As the number of analytics use cases increases, your ability to manually track all of your data and data pipelines decreases. DataFoundry provides a data catalog that allows you to search metadata about data sources, pipelines and workflows, relate business and technical metadata, and identify the best artifacts for your data project. DataFoundry also tracks end-to-end data lineage so users can trace data elements back to the original source systems and perform downstream impact analysis.
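Tracing data back to its sources and analyzing downstream impact are both traversals of a lineage graph, in opposite directions. The Python sketch below shows the two traversals on a small graph. It is an illustration under assumed names, not DataFoundry's catalog API; the `LINEAGE` graph and every artifact name in it are hypothetical.

```python
from collections import deque

# Hypothetical lineage graph: each artifact maps to the artifacts it feeds.
LINEAGE = {
    "crm.customers": ["lake.customers"],
    "erp.orders": ["lake.orders"],
    "lake.customers": ["pipeline.customer_360"],
    "lake.orders": ["pipeline.customer_360", "cube.sales"],
    "pipeline.customer_360": ["dashboard.churn"],
}


def downstream(graph, start):
    """Impact analysis: every artifact reachable from `start`."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def upstream_sources(graph, target):
    """Trace `target` back to the original systems that have no parents."""
    reverse = {}
    for parent, children in graph.items():
        for child in children:
            reverse.setdefault(child, []).append(parent)
    ancestors = downstream(reverse, target)
    return {node for node in ancestors if node not in reverse}


impacted = downstream(LINEAGE, "lake.orders")
origins = upstream_sources(LINEAGE, "dashboard.churn")
```

Here `impacted` lists what would need review if `lake.orders` changed, and `origins` lists the source systems behind the `dashboard.churn` artifact.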
The Infoworks platform provides security integration for user authentication and data security policies. It supports single sign-on and LDAP integration, Kerberos-based authorization, and encryption for data in motion and at rest.
Data workflows built with Infoworks DataFoundry run on a wide variety of big data execution engines (Amazon EMR, Azure HDInsight, Google Dataproc, Databricks, Cloudera/Hortonworks, MapR) and storage platforms without re-coding. Workflows are not only portable but also performance-optimized across execution and storage environments, on premises and in the cloud. Infoworks DataFoundry runs natively on all of the following platforms, so it scales naturally as your environment grows.
Infoworks supports relational, cloud, flat-file, cluster and streaming data sources, and our plug-in architecture lets us add new data sources quickly. If a data source you care about is missing from the list below, don’t hesitate to ask.