Big data comes with more than its fair share of water or water-ish metaphors:
Data lakes. Data swamps. Data oceans. Data flow. Drowning in data. Load balancing.
Okay, perhaps I was thinking too much about my laundry with that last one, but you get the idea, and if you’ll allow me, I have one last one to add:
Yes, this whole time big data has been coming at us in waves. The first wave, I'd say, was born out of the internet, when organizations began digitizing their channels and realizing how much data there was to be had. Around that time, computer scientist Michael Lesk predicted that advances in digital storage would allow us to save every bit of information in the world. Soon, organizations began to look a lot like kids chasing fireflies on a summer night. I was at Netscape/AOL in the early 2000s when we were catching data as fast as we could without really knowing what we were going to do with it, and we were not alone. Data was practically impossible to manage with the BI tools available at the time: data quality was always a concern, you couldn't get an enterprise view, and it would take months to move on anything.
And then, like a knight in shining armor (riding a yellow elephant in this case), there was Hadoop. Heralded for its flexibility and its ability to process all kinds of data from all kinds of digital sources, Hadoop brought us into big data's first wave. All of a sudden, there was this technology that a few very tech-savvy companies like Yahoo!, Google, Facebook and Amazon, with their superior engineering talent, were able to take advantage of. Then the venture capital (VC) community got into the mix and funded companies like Hortonworks, MapR and Cloudera, which received a heady $4 billion valuation in its last private round.
Unfortunately, that first wave delivered relatively raw technology that most companies were unable to take advantage of. It was simply too complicated and required too much engineering talent to make it work. The VCs viewed this as another opportunity, however, and invested in a second wave of promising software vendors that provided point solutions to automate specific big data implementation issues and help fill the complexity gap. After a while, they started to sound like street cart vendors:
“Data ingestion, get your data ingestion software here!!”
“Red hot ad hoc data prep! Get ‘em while they last!”
“High performance BI on big data! Get your big data OLAP cubes while you still can!”
And while this was much better than the hand coding required in the first wave of big data, it still required end users to stitch together multiple tool sets that weren't designed to work together. Meanwhile, more data continued to pour in at an exponential rate and with increasing complexity. Integration issues among all those point solutions began to rear their ugly heads. Organizations couldn't keep up. They just stood and watched as their data lakes turned into data swamps.
Thankfully, with the third wave, we've come to a point where I believe big data will finally hit its full stride. What got us to this point was a surrender of sorts. Big data, it turned out, is hard, whether it is based on Hadoop, Spark, or some cloud "serverless" infrastructure. And so what we're seeing now is billions of dollars of VC investment going into companies that can make big data easier by rethinking the entire process and automating the end-to-end data pipeline: from data ingestion, to transformation, to high-speed consumption by a BI tool, to productionization and management of operational data flows. With more big data automation (like Infoworks brings to the entire data pipeline process, enabling agile data engineering) combined with the "faster, cheaper, more flexible" mantra of the cloud, data-driven organizations are going to start churning out new innovations at dizzying speed. And because automation hides big data's complexity, organizations with less sophisticated IT groups can now use big data for competitive advantage too.
In fact, the biggest defining factor of big data's third wave will be substance over hype. Organizations will move more projects into production and have much more to say about what they've accomplished instead of what they want to accomplish. How big will big data's third wave be? If I may tap into one last water metaphor here, it's going to make one heck of a splash.