The problem with traditional data pipelines, built on extract, transform, and load (ETL) tools that populate data warehouses and data marts, is that power users quickly bump up against their dimensional boundaries. To answer urgent questions, they are forced to download data from the warehouse into Excel or another desktop tool and combine it with data acquired elsewhere. The result is a suboptimal spreadmart: an unmanaged, spreadsheet-based data mart.
Today, organizations build modern data pipelines to support a variety of use cases. Besides data warehouses, modern pipelines populate data marts, data science sandboxes, data extracts, data science applications, and various operational systems. These pipelines often support both analytical and operational applications, structured and unstructured data, and batch and real-time ingestion and delivery. This webcast will debate the construction and components of a modern data pipeline.
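To make the batch side of such a pipeline concrete, here is a minimal sketch of one ETL step using only the Python standard library: extract rows from a CSV "source system," transform them, and load them into a SQLite table standing in for a warehouse. All file, table, and column names here are illustrative assumptions, not part of any specific product discussed in the webcast.

```python
# A minimal batch ETL sketch: extract -> transform -> load.
# Names (SOURCE_CSV, fact_orders, etc.) are hypothetical examples.
import csv
import io
import sqlite3

# -- Extract: in practice this would read from an operational source;
#    here an in-memory CSV stands in for it.
SOURCE_CSV = """order_id,customer,amount
1001,acme,250.00
1002,globex,99.50
"""

def extract(raw: str):
    """Parse the raw CSV feed into dict rows."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows):
    """Normalize types and derive warehouse-friendly fields."""
    for row in rows:
        yield {
            "order_id": int(row["order_id"]),
            "customer": row["customer"].upper(),
            "amount_cents": int(float(row["amount"]) * 100),
        }

def load(rows, conn):
    """Append transformed rows into a warehouse-style fact table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER)"
    )
    conn.executemany(
        "INSERT INTO fact_orders VALUES (:order_id, :customer, :amount_cents)",
        rows,
    )
    conn.commit()

if __name__ == "__main__":
    with sqlite3.connect(":memory:") as conn:
        load(transform(extract(SOURCE_CSV)), conn)
        for row in conn.execute("SELECT * FROM fact_orders"):
            print(row)
```

A real pipeline would swap the in-memory CSV for a source connector, the SQLite table for a warehouse or data mart, and add scheduling and error handling, but the extract, transform, and load stages keep the same shape.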