Cloudera and Hortonworks are Merging. What Should You Do Next?

Written by Todd Goldman | Category: Big Data

On Wednesday, October 3rd, Cloudera and Hortonworks announced, to the surprise of many, that they were merging. While the timing may have been a surprise, industry consolidation around an open source technology isn’t all that shocking. After all, how many commercial Linux vendors of note are there today? It is tough for a market to sustain multiple commercial vendors of the same open source platform. So it was never a matter of if the Hadoop/Spark vendor market would consolidate; it was a matter of when.

Whether or not you like the idea of this merger, barring regulatory interference, it is going to happen. The more relevant questions at this point are:

  • What is the impact if I chose the Hortonworks stack and it is replaced with Cloudera, or vice versa? Am I at risk?
  • How does this make things better or easier for organizations that want to implement a modern data analytics platform?

The good news is that Cloudera/Hortonworks has already announced that it will support the current HDP and CDH platforms for three more years. At the current pace of innovation, that is a couple of lifetimes in the data analytics space. The companies have also stated that they will provide a converged roadmap that will ultimately bring the two platforms together. I would also expect them to offer professional services to migrate customers between overlapping technologies, such as from Atlas to Navigator or from Navigator to Atlas, over that time period.

In addition, the combined organization now has more than $700 million in revenue and will ultimately have a much larger development organization. That is potentially good news for the user base: instead of two different technologies for metadata repositories, security, and so on, the merger should bring more focus to a single, consistent set of base capabilities. From our perspective as the developer of an automated agile data engineering platform that runs on top of both Hortonworks and Cloudera, consolidation of the underlying technologies will certainly make things easier for application developers like us.

That said, you don’t actually need to worry about how long it will take Cloudera/Hortonworks to consolidate their roadmaps or the underlying technologies. If you implement your data engineering pipelines with an agile data engineering platform like Infoworks, which automates the creation and management of data pipelines from ingestion to consumption, you get portability across both Cloudera and Hortonworks today and in the future. Infoworks is already partnered with both companies, and data pipelines and workflows developed with our platform run on both HDP and CDH today without recoding any pipeline logic. In fact, we have customers that developed data pipelines and OLAP cubes to run on Cloudera on premises, migrated them to run on Hortonworks in the cloud, and even migrated them to run on Google Cloud Platform (GCP), all without rewriting a line of code. Better still, our joint customers with Cloudera and Hortonworks put their data analytics pipelines into production 10x faster than they can with the basic Hadoop/Spark distributions alone. We make both underlying platforms, CDH and HDP, equally better.

So don’t stop your move to big data because of this merger. In the end, industry consolidation is the natural course of market evolution, and with the right approach, your company’s move to re-platform its data management technology onto a modern data architecture can be fast, automated, and future-proofed against ongoing industry changes like the Cloudera/Hortonworks merger.



P.S. For those of you confused by the image with this blog post, it is all in good fun. The new combined Cloudera + Hortonworks is going to be called “Cloudera”.


About this Author
Todd Goldman
Todd is the VP of Marketing and a Silicon Valley veteran with more than 20 years of experience in marketing and general management. Prior to Infoworks, Todd was the CMO of Waterline Data and COO at Bina Technologies (acquired by Roche Sequencing). Before Bina, Todd was Vice President and General Manager for Enterprise Data Integration at Informatica, where he was responsible for its $200MM PowerCenter software product line. Todd has also held marketing and leadership roles at both start-ups and large organizations, including Nlyte, Exeros (acquired by IBM), ScaleMP, Netscape/AOL, and HP.
