Optimization is a broad notion. Let's take a look at what it means in Infoworks.
Infoworks both provides and integrates a broad range of features that keep a unified data platform fast and flexible enough to meet changing service level agreements and response time requirements.
One easily described optimization area is aligning data access speeds with the needs they serve, as well as with the related compute costs. For example, batch data processing involves operations that may be lower priority and take minutes or more to complete, while regularly scheduled reports should complete more quickly. Fast ad hoc reports should respond within a few seconds, while dashboards should respond virtually instantaneously. As you know, though, there are costs to manage when increasing speed and computation requirements.
DataFoundry gives you this choice. You can target models in storage, for ad hoc queries, batch reports, and other processes where speed is less critical. Or you can accelerate models in memory, using a variety of ecosystem-specific approaches, for analytical queries, dashboards, and similar targets needing response times of seconds or less.
These targets can be further optimized through hierarchical partitioning and bucketing, aligned to your primary query loads, to maximize parallel processing. The partitions include pre-calculated min/max values to speed range scanning, and Bloom filters to help optimize query plans. File reformatting is automated as needed, and automatic incremental export to external engines can be configured as well.
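To make the min/max and Bloom filter ideas concrete, here is a minimal Python sketch of how per-partition statistics let a query skip partitions that cannot contain matching rows. It is an illustration only, not DataFoundry's implementation: the `Partition` and `BloomFilter` classes, the `order_id` and `customer` columns, and the filter sizes are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, value):
        # Derive several bit positions from salted SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


@dataclass
class Partition:
    """Hypothetical partition holding (order_id, customer) rows plus stats."""

    name: str
    rows: list
    min_id: int = field(init=False)
    max_id: int = field(init=False)
    customers: BloomFilter = field(init=False)

    def __post_init__(self):
        # Pre-calculate the statistics once, when the partition is written.
        ids = [order_id for order_id, _ in self.rows]
        self.min_id, self.max_id = min(ids), max(ids)
        self.customers = BloomFilter()
        for _, customer in self.rows:
            self.customers.add(customer)


def partitions_for_range(partitions, low, high):
    """Keep only partitions whose [min_id, max_id] overlaps [low, high]."""
    return [p.name for p in partitions if p.max_id >= low and p.min_id <= high]


def partitions_for_customer(partitions, customer):
    """Keep only partitions whose Bloom filter may contain the customer."""
    return [p.name for p in partitions if p.customers.might_contain(customer)]
```

A range query over `order_id` only needs to read the partitions returned by `partitions_for_range`. An equality lookup can skip any partition whose Bloom filter rules the value out, at the cost of an occasional false positive that is read unnecessarily.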
If you are already working in the cloud, the ingestion, transformation, optimization, and governance features in DataFoundry can export directly to your chosen cloud data warehouse.
Technologies to accelerate access to data models evolve rapidly. This is where the value of a data platform becomes clear, as it allows you to avoid technological lock-in by abstracting away underlying change and complexity. DataFoundry can support in-memory pre-computed data cubes, as well as a whole range of in-memory and external analytical engines, including BigQuery, Snowflake, PrestoDB, AthenaDB, or Impala, as appropriate for your particular ecosystem. We also support data lake models, including Hive and SparkSQL. And we support a broad range of data warehouse and RDBMS tooling, such as Azure SQL Data Warehouse, CosmosDB, and, of course, Oracle and Teradata, amongst others.
As you may be noticing by now, one of the greatest values of a unified data platform is the abstraction layer it provides, future-proofing you against the impacts of inevitable technical change. One such vital abstraction is our support for swappable execution engines. Whether you choose Hive, Hive with Tez, Spark, Cloudera Impala, or Databricks, swap among these, or swap to something even better that has yet to be invented, your DataFoundry pipelines and models remain available and your staff remains productive.
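The idea of a swappable execution engine can be sketched in a few lines of Python. This is a conceptual illustration under stated assumptions, not the DataFoundry API: `ExecutionEngine`, `HiveEngine`, `SparkEngine`, and `Pipeline` are hypothetical names, and the engines here only label the statement they would run rather than connecting to a real cluster.

```python
from typing import List, Protocol


class ExecutionEngine(Protocol):
    """Hypothetical interface that every engine adapter must satisfy."""

    name: str

    def run(self, statement: str) -> str: ...


class HiveEngine:
    name = "hive"

    def run(self, statement: str) -> str:
        # A real adapter would submit the statement to Hive here.
        return f"[{self.name}] {statement}"


class SparkEngine:
    name = "spark"

    def run(self, statement: str) -> str:
        # A real adapter would submit the statement to Spark here.
        return f"[{self.name}] {statement}"


class Pipeline:
    """Pipeline steps are defined once, independently of the engine."""

    def __init__(self, steps: List[str], engine: ExecutionEngine):
        self.steps = steps
        self.engine = engine

    def execute(self) -> List[str]:
        return [self.engine.run(step) for step in self.steps]
```

Because the pipeline depends only on the interface, swapping engines is a one-line change (`pipeline.engine = SparkEngine()`), and the step definitions stay untouched.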
So, what have you learned? Infoworks lets you match your query speeds to end user needs by enabling models in storage, in memory, or in pre-calculated cubes. Your targets can be hierarchically partitioned and bucketed, aligned with your primary query loads, to maximize parallel processing. These partitions include min/max values to speed range scanning, and Bloom filters to aid in query planning. If you are pushing data to the cloud, you can take advantage of all the automation and governance features of DataFoundry while exporting the resulting targets to your chosen cloud data warehouse. And the fact that Infoworks abstracts not only your storage and clustering but also your execution engine future-proofs your analytics, reporting, and AI/ML processes against the inevitable technical evolution ahead.
There are a lot of moving parts in the expanding data ecosystem. Infoworks can help you stay in control.