There are many challenges to designing an enterprise data lake that is scalable, sustainable, and governable while still maintaining its flexibility and agility. Data continues to increase in both volume and variety, complicating data ingestion and integration processes. Externally imposed constraints on data use (such as the plethora of global privacy regulations), along with a fundamental need for good data hygiene, have forced organizations to build data governance and security processes and controls into their data lakes. As a combination of static and real-time data sources is fed into the data lake, its management and operationalization have become even more complex.
All of this is happening while the underlying distributed computing and file systems upon which data lakes are built continue to grow in variety and complexity. This constant evolution increases demand for personnel who can build and manage these environments, and that expertise is increasingly hard to find and even harder to hire.
This TDWI report examines these issues and provides guidance on how to overcome the challenges faced by those responsible for supporting the design and development of a scalable and sustainable enterprise data lake.