Big Data Will Get By (but only with a little help from its friends)

Written by Todd Goldman | Category: Big Data

What would you think if I sang out of tune

Would you stand up and walk out on me?

The Beatles song everybody knows could also pass as the wilting rallying cry for Hadoop, whose “tune” does seem out of sorts these days as more people question the framework’s future. Here’s just a sample of what’s being said:

  • “We are more consistently hearing that core Hadoop software modules are dated/commoditized.” StreetInsider
  • “Is it even necessary anymore?” Alex Woodie of Datanami
  • “Hadoop is not what it used to be.” George Anadiotis of ZDNet

Then again, Mr. Anadiotis also said Hadoop has simply moved on from hype. “The fact that you may not hear about it as much is a good thing,” he later wrote.

When Hadoop first came to town, it was serving the multi-machine processing needs of Yahoo!’s search engine. Marko Bonaci, in his excellent post that walks readers through the fascinating history of Hadoop, argues Hadoop essentially saved Yahoo!:

“Their data science and research teams, with Hadoop at their fingertips, were basically given freedom to play and explore the world’s data… New ideas sprung to life… invigorating the whole company.”

Today, Hadoop is powering not only Yahoo!, but Facebook, Twitter and many of the other big names in tech we all know and love or hate or prefer not to say. Many other organizations (and their generous venture capitalists) shared in the dream of mimicking Yahoo!’s success with Hadoop, which soon became the compute and storage platform of choice. The problem? These companies weren’t Yahoo! (or Facebook or Twitter…) with its army of engineers, many of whom were eventually spun off to form Hortonworks. For these jaded dreamers, Hadoop simply proved to be too damn expensive and too damn hard.

As I wrote last month, even Cloudera revealed during its quarterly earnings call that it was spending too much money and effort on getting its customers deployed. Projects that do deploy to production take months and are extremely inefficient. Meanwhile, to properly support a Hadoop project, two data engineers are needed for every data scientist. But such talent is scarce, and the overall problem is getting worse. Not only is stored data growing in volume, it’s becoming increasingly complex, with more variety, sources, environments and users of data being added every day.

But while Hadoop itself will probably never break out of the “for developers only” mold, there is a growing group of vendors building on top of it and other big data technologies that it can call friends. Together, this next generation of value-added providers is helping enterprises overcome Hadoop’s complexity, as well as the complexity of Hadoop follow-on technologies like Spark and upcoming “server-less” distributed cloud solutions. All of these next-generation big data enablement providers build on top of multiple distributed “operating systems” and allow for faster and more successful production implementation of big data. And yes, these “friends” are well funded. They include companies like, yes, Infoworks, which automates the entire data engineering process so organizations can make their big data dreams a reality by launching projects into production within days instead of months.

But there are plenty of others bringing automation, machine learning, artificial intelligence, and so much more into the fold. Waterline Data, a company out of Mountain View, is now gaining some attention and praise for its automated data cataloging capabilities, which make it a whole lot easier for organizations to quickly discover, govern and actually use their data for informed decision making. Unravel Data combines machine learning and AI to automatically analyze, troubleshoot and optimize performance and utilization of big data apps. These are just a few of the value-added vendors that are making big data easier to use through automation.

The big data world is automating and evolving. Just as O’Reilly renamed the Strata+Hadoop conference to Strata Data to acknowledge the progression of the space beyond Hadoop toward a broader big data theme, Hadoop itself will remain one star among a growing number of constellations in the big data universe. And some of the brightest stars will be those extending that universe through automation that simplifies and hides the underlying complexity that makes big data so powerful. As a result, the big data space will continue to attract thoughtful innovators and venture capital. We may not hear about “Hadoop” as much, as Mr. Anadiotis wrote. But if we continue to lend big data our ears, it will keep singing its song.


About this Author
Todd Goldman
Todd is the VP of Marketing at Infoworks and a Silicon Valley veteran with more than 20 years of experience in marketing and general management. Prior to Infoworks, Todd was the CMO of Waterline Data and COO at Bina Technologies (acquired by Roche Sequencing). Before Bina, Todd was Vice President and General Manager for Enterprise Data Integration at Informatica, where he was responsible for its $200MM PowerCenter software product line. Todd has also held marketing and leadership roles at both start-ups and large organizations including Nlyte, Exeros (acquired by IBM), ScaleMP, Netscape/AOL and HP.
