There are many visual coding tools that can help you on your big data and cloud journey. The problem? Point tools don’t automate the entire data engineering process…
The Infoworks difference?
End-to-end automation is the key to our agile data engineering platform
The Infoworks Autonomous Data Engine automates data engineering and DataOps across the end-to-end big data workflow, from ingestion all the way to consumption. Our customers move from implementation to production in days, using 5x fewer engineering hours.
Learn more about our capabilities below.
Our agile data engineering platform automatically crawls data sources, including flat files, XML, JSON and relational databases. The solution learns the metadata and infers data relationships for data ingested from external sources as well as for data sets created using Infoworks, making all metadata searchable via a metadata repository.
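Conceptually, metadata crawling boils down to sampling records from a source and inferring a column-to-type mapping. The sketch below is illustrative only (the function and type names are our own, not the Infoworks API); it shows the idea for JSON records, widening to string when a column's type is ambiguous.

```python
import json

def infer_type(value):
    """Infer a simple logical type for a single value."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "double"
    return "string"

def infer_schema(records):
    """Infer a column -> type mapping from a sample of records.
    Columns whose values disagree on type widen to 'string'."""
    schema = {}
    for record in records:
        for column, value in record.items():
            inferred = infer_type(value)
            if column in schema and schema[column] != inferred:
                schema[column] = "string"  # widen on conflict
            else:
                schema[column] = inferred
    return schema

sample = [json.loads(line) for line in [
    '{"id": 1, "name": "alice", "score": 9.5}',
    '{"id": 2, "name": "bob", "score": 7.0}',
]]
print(infer_schema(sample))  # {'id': 'integer', 'name': 'string', 'score': 'double'}
```

A production crawler would also sample relational catalogs and file headers, and persist the inferred schema into the searchable metadata repository.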
Infoworks provides a no-code environment for configuring data ingestion (batch, streaming, change data capture) from a wide variety of data sources. We use native connectors where possible for the fastest feasible ingestion, loading source data in a high-performance, parallel process while automatically preserving data precision.
Source data is continuously synchronized with data in your data lake using log-based and query-based methods for change data capture. Infoworks automatically handles slowly changing data and schema changes, and supports streaming, batch and incremental modes of data synchronization and export.
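To make the incremental mode concrete: a common query-based CDC pattern tracks a watermark (the highest last-modified timestamp already synchronized) and upserts only newer rows on each run. This is a minimal generic sketch of that pattern, not Infoworks code; the field names are illustrative.

```python
def sync_incremental(source_rows, target, watermark):
    """Upsert rows whose last-modified timestamp is newer than the
    watermark, then return the advanced watermark for the next run."""
    new_watermark = watermark
    for row in source_rows:
        if row["updated_at"] > watermark:
            target[row["id"]] = row  # insert or update by primary key
            new_watermark = max(new_watermark, row["updated_at"])
    return new_watermark

source = [
    {"id": 1, "updated_at": 10, "name": "alice"},
    {"id": 2, "updated_at": 25, "name": "bob"},    # changed since last sync
    {"id": 3, "updated_at": 30, "name": "carol"},  # new since last sync
]
lake = {1: {"id": 1, "updated_at": 10, "name": "alice"}}
watermark = sync_incremental(source, lake, 10)
print(watermark, sorted(lake))  # 30 [1, 2, 3]
```

Log-based CDC works differently (it reads the database's transaction log instead of querying tables), but the target-side merge step is the same upsert-by-key idea.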
A self-service data preparation interface provides interactive, drag-and-drop data transformation capabilities with support for SQL-based and other transformations. Users work interactively with data in a collaborative, suggestion-based interface that reduces or eliminates dependence on IT skills. Machine learning and analytics data pipelines created with our agile data engineering platform can run at scale on all popular big data environments without re-coding.
Migrating legacy data warehouse jobs is significantly accelerated through automatic conversion of SQL into easily maintained, optimized, portable, visual data transformation pipelines.
Developers can embed advanced analytics directly in their data pipelines. The platform is directly integrated with advanced analytics libraries for decision trees, clustering (k-means), classification and more. Users can also import trained models from other applications via PMML directly into their data transformation pipelines.
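PMML is an XML standard for exchanging trained models, which is why it works for moving models between tools. As a hedged illustration of what a PMML import involves (this is a generic sketch, not how Infoworks implements it), the snippet below parses a minimal linear regression model and scores one input row, using the `RegressionTable`/`NumericPredictor` elements defined by the PMML specification.

```python
import xml.etree.ElementTree as ET

# A minimal hand-written PMML document: y = 1.0 + 2.0*x1 - 0.5*x2
PMML = """<PMML version="4.4" xmlns="http://www.dmg.org/PMML-4_4">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.0">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"pmml": "http://www.dmg.org/PMML-4_4"}

def score(pmml_text, row):
    """Evaluate a PMML linear regression model against one input row."""
    root = ET.fromstring(pmml_text)
    table = root.find(".//pmml:RegressionTable", NS)
    result = float(table.get("intercept"))
    for pred in table.findall("pmml:NumericPredictor", NS):
        result += float(pred.get("coefficient")) * row[pred.get("name")]
    return result

print(score(PMML, {"x1": 3.0, "x2": 4.0}))  # 1.0 + 6.0 - 2.0 = 5.0
```

Real PMML consumers handle many more model types (trees, clustering, ensembles), but the principle is the same: the model's parameters travel as portable XML rather than as code.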
The powerful cube engine enables users to visually design star/snowflake schemas, and build high-performance OLAP cubes. Data analysts can drag and drop facts, dimensions, and measures. The data engineering platform then automatically builds a fully pre-aggregated and optimized cube natively on the big data platform, providing sub-second response times to most user queries. An ODBC/JDBC interface is made available to industry standard analytics tools.
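The sub-second response times come from pre-aggregation: the cube materializes the measure totals for each combination of dimension values ahead of time, so a query becomes a lookup rather than a scan of the fact table. A minimal sketch of that idea (illustrative names, not the Infoworks engine):

```python
from collections import defaultdict

def build_cube(fact_rows, dimensions, measure):
    """Pre-aggregate a fact table: one summed measure per dimension tuple."""
    cube = defaultdict(float)
    for row in fact_rows:
        key = tuple(row[d] for d in dimensions)  # e.g. ("west", 2023)
        cube[key] += row[measure]
    return dict(cube)

sales = [  # a tiny fact table
    {"region": "west", "year": 2023, "amount": 100.0},
    {"region": "west", "year": 2023, "amount": 50.0},
    {"region": "east", "year": 2023, "amount": 75.0},
]
cube = build_cube(sales, ["region", "year"], "amount")
print(cube[("west", 2023)])  # 150.0 -- answered without rescanning the facts
```

A production cube engine builds these aggregates at many granularities, distributes the work across the big data platform, and exposes the result through ODBC/JDBC, but the query-time payoff is the same: the expensive aggregation has already happened.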
Big Data queries are dramatically accelerated by Infoworks through query interfaces that support a variety of analytics use-cases across business intelligence, data science, ad-hoc and batch. For BI analytics, reporting, and dashboard style use-cases, Infoworks cubes provide sub-second and interactive response times. For ad-hoc queries, the in-memory accelerated models provide fast access to granular data for a variety of use-cases, while batch use cases can benefit from the optimized data models. Infoworks provides these multiple layers of query acceleration to deliver the right performance and scalability characteristics for each use-case.
In-memory and data lake data models are automatically updated in response to upstream changes. Whether the change is the addition of a new source column, a change-data-capture update to content, or a modification to transformation logic, changes can be automatically propagated to the downstream data models.
A distributed orchestrator monitors production workloads and makes them fault tolerant, reducing the load on system and production administrators. Migrating from development to production is a simple single-click operation. Additionally, data lineage is tracked across the end-to-end workflow from ingestion to consumption.
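One ingredient of fault tolerance in any workflow orchestrator is automatic retry of failed tasks, so transient errors (a lost connection, a busy cluster) don't require operator intervention. The sketch below shows that pattern in general terms; it is an assumption-laden illustration, not a description of the Infoworks orchestrator's internals.

```python
import time

def run_with_retries(task, max_attempts=3, backoff_seconds=0.0):
    """Run a task, retrying on failure so transient errors
    don't fail the whole workflow."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # exhausted retries: surface the failure
            time.sleep(backoff_seconds * attempt)  # linear backoff

calls = {"n": 0}
def flaky():
    """Simulated task that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "done"

result = run_with_retries(flaky)
print(result, calls["n"])  # done 3 -- succeeded on the third attempt
```

A distributed orchestrator layers more on top (task state tracking, rescheduling on healthy nodes, lineage capture), but retry-with-backoff is the basic unit of self-healing.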
Data ingestion, transformation, cube generation and workflows built in the Infoworks designer can run in any supported execution environment without recoding. Pipelines are not only portable, but also performance-optimized across execution environments, on premises and in the cloud.
The Infoworks platform provides security integration for user authentication and data security policies. It supports single sign-on/LDAP integration and Kerberos-based authorization, as well as encryption for data in motion and at rest.
Our agile data engineering platform is compatible with a wide variety of big data platforms, both on premises and in the cloud, and provides portability of your data workflows across all of these environments. Infoworks Autonomous Data Engine runs natively on all of these platforms, so it scales naturally as your environment grows.
Infoworks supports relational, cloud, flat files, cluster and streaming data sources and can quickly add new data sources through our plug-in architecture. If a data source you care about is missing from the list below, don’t hesitate to ask. We can add it quickly!