What would you think if I sang out of tune
Would you stand up and walk out on me?
The Beatles song everybody knows could also pass as the wilting rallying cry for Hadoop, whose “tune” does seem out of sorts these days as more people question the framework’s future. Here’s just a sample of what’s being said:
Then again, Mr. Anadiotis also said Hadoop has simply moved on from hype. “The fact that you may not hear about it as much is a good thing,” he later writes.
When Hadoop first came to town, it was serving the multi-machine processing needs of Yahoo!’s search engine. Marko Bonaci, in his excellent post that walks readers through the fascinating history of Hadoop, argues Hadoop essentially saved Yahoo!:
“Their data science and research teams, with Hadoop at their fingertips, were basically given freedom to play and explore the world’s data… New ideas sprung to life… invigorating the whole company.”
Today, Hadoop is powering not only Yahoo!, but Facebook, Twitter and many of the other big names in tech we all know and love or hate or prefer not to say. Many other organizations (and their generous venture capitalists) shared in the dream of mimicking Yahoo!’s success with Hadoop, which soon became the compute and storage platform of choice. The problem? These companies weren’t Yahoo! (or Facebook or Twitter…) with its army of engineers, many of whom were eventually spun off to form Hortonworks. For these jaded dreamers, Hadoop simply proved to be too damn expensive and too damn hard.
As I wrote last month, even Cloudera revealed during their quarterly earnings call that they were spending too much money and effort on getting their customers deployed. Projects that do deploy to production take months and are extremely inefficient. Meanwhile, to properly support a Hadoop project, two data engineers are needed for every data scientist. But such talent is scarce, and the overall problem is getting worse. Not only is stored data growing in volume, it’s becoming increasingly complex, with more variety, sources, environments and users of data being added every day.
But while Hadoop itself will probably never break out of the “for developers only” mold, there is a growing group of vendors building on top of it and other big data technologies that it can call friends. Together, this next generation of value-added providers is helping enterprises overcome Hadoop’s complexity, as well as the complexity of Hadoop follow-on technologies like Spark and upcoming serverless distributed cloud solutions. All of these next-generation big data enablement providers are building on top of multiple distributed “operating systems” and allow for faster and more successful production implementation of big data. And yes, these “friends” are well funded. They include companies like, yes, Infoworks, which automates the entire data engineering process so organizations can make their big data dreams a reality by launching projects into production within days instead of months.
But there are plenty of others bringing automation, machine learning, artificial intelligence, and so much more into the fold. Waterline Data, a company out of Mountain View, is now gaining some attention and praise for its automated data cataloging capabilities, which make it a whole lot easier for organizations to quickly discover, govern and actually use their data for informed decision making. Unravel Data combines machine learning and AI to automatically analyze, troubleshoot and optimize performance and utilization of big data apps. These are just a few of the value-added vendors that are making big data easier to use through automation.
The big data world is automating and evolving. O’Reilly renamed the Strata+Hadoop conference to Strata Data to acknowledge that the space has progressed beyond Hadoop to a broader big data theme. In the same way, Hadoop itself will remain one star among a growing number operating in the big data universe. Some of the brightest of those stars will be the ones extending that universe through automation that simplifies and hides the underlying complexity of the technology that makes big data so powerful. As a result, the big data space will continue to attract thoughtful innovators and venture capital. We may not hear about “Hadoop” as much, as Mr. Anadiotis wrote. But if we continue to lend big data our ears, it will keep singing its song.