2019 was a big year across the big data landscape. After starting the year with the Cloudera and Hortonworks merger, we’ve seen a massive uptick in big data use around the globe, as companies recognize how important data operations and orchestration are to their business success. The big data industry is now worth $189 billion, an increase of $20 billion over 2018, and is set to continue its rapid growth to reach $247 billion by 2022.
As quickly as the year began, it’s nearly over, which means it’s time for us to once again put on our thinking caps and make our predictions for 2020. But before we do, let’s take a look back at the trends we predicted, and then saw come true, in 2019.
We started seeing the operationalization of big data take hold at the end of 2018, but in 2019 it became much more achievable across the board. Previously, companies could see initial success with data operationalization, but scaling data operations and orchestration proved time-consuming and difficult to maintain.
The introduction of automation frameworks designed specifically to operationalize big data workflows has made it much simpler to take a new analytic use case from initial development into production. In addition, with more chief data officers (CDOs) now in place to drive change (helped in no small part by GDPR, which requires many organizations to appoint a data protection officer and has pushed data governance onto the executive agenda), we have seen a growing number of organizations rally behind a single data vision and move from ad hoc analytics to full operationalization of enterprise-wide big data platforms and analytics at scale.
The consolidation of big data vendors, as measured by the dwindling number of vendors at trade shows like Strata, leveled off in 2019. The number of companies getting acquired or simply disappearing now roughly matches the number of new entrants. While fewer vendors are getting funded, those that do are delivering a greater level of innovation and value.
The advanced analytics market is no longer in the mood to tolerate crowds of vendors with little differentiation, or providers that aren’t delivering real value or revenue. Let’s face it. The first wave of big data vendors included many organizations that weren’t building businesses; they were building features. As more of them get rolled up into integrated stacks, it will be up to the next generation of players to transform big data into something that’s truly big.
Another key data analytics trend in 2019 was the closer alignment between traditional analytics and machine learning (ML) and artificial intelligence (AI). More and more organizations are using ML and AI to augment their day-to-day operational analytics pipelines and normal line-of-business activities.
In the past, ML and AI were largely restricted to what data scientists could evaluate and test before a data engineering team deployed them into production. In fact, most organizations had a traditional BI/analytics team, a separate team of data scientists, and yet another team for data engineering. Those groups and skill sets have now begun to overlap, or at least to work together in more thoughtful ways.
As data sources become more complicated and AI applications expand, 2020 is set to be another year of innovation and evolution for big data. Read on for the thoughts of big data and data engineering industry veteran Ramesh Menon, who shares his top predictions for big data technologies in 2020.
As cloud-based technologies continue to mature, more and more businesses want a place in the cloud. However, moving data integration and preparation from an on-premises solution to the cloud is more complicated and time-consuming than most care to admit. In addition to migrating massive amounts of existing data, companies have to keep their on-premises and cloud data sources and platforms in sync for several weeks to months before the shift is complete.
This isn’t to say that moving to the cloud isn’t worth it, but the prevailing trend we see emerging is the use of hybrid deployments. Early adopters of the cloud are discovering how difficult it is to move over completely, and are instead using the cloud for dynamic workloads while keeping on-premises platforms, which remain highly useful, for stable workloads. Another complexity is that most enterprises already have a multi-cloud footprint. In 2020, we expect later adopters to reach the same conclusion, bringing the hybrid and multi-cloud approach to the forefront of data ecosystem strategies.
Since arriving on the market, Hadoop has been criticized by many in the community for its complexity. Spark and managed Spark solutions like Databricks are the “new and shiny” players and have been gaining traction as data scientists see them as the answer to everything they dislike about Hadoop. Spark and Databricks will be especially lauded for their interactive processing capabilities, their in-memory computation, and their user-friendly interfaces, which let data scientists process stored data with high-level operators.
While Spark and Databricks resolve some of the issues presented by Hadoop’s data management environment, we expect that those rushing to Spark or Databricks will quickly find that these platforms have their own set of challenges. You still need to code data pipelines, and you still need to operationalize and harden your data workflows and make them fully manageable and governable.
In addition, much as with Hadoop, running a Spark or Databricks job in a data science sandbox and then promoting it into full production will continue to be fraught with challenges. Data engineers will still need more fit and finish from Spark when it comes to enterprise-class data operations and orchestration.
The bottom line is that organizations have a lot of options to weigh between Hadoop and Spark-based platforms, and they will exercise that choice based on preferred capabilities and economic value. For companies using Infoworks, switching between the two will be even easier, as our framework supports a multitude of data environments.
People have been talking about digital transformation for years without ever really knowing what it meant. Very often digital transformation was used to describe finding ways to sell the data that was being generated to create new revenue streams. These were generic ideas not specific to any particular business, which is why they went nowhere for most organizations.
What organizations are now coming to realize is that digital transformation is really about taking a data-driven approach to every aspect of their business to create a competitive advantage. If you’re a retailer, it might mean providing real-time “next best offer” promotions while customers are in your physical stores, or getting more out of your inventory to provide a better online and in-store experience. If you are an oil exploration and production company, it means using data to make wellhead drilling adjustments hourly instead of daily to maximize the yield from an oil field.
These discussions cut to the core of even very traditional businesses, which are now beginning to correctly see digital transformation as investing in a data platform that reflects the state of the business and can pivot to support new business models as quickly as they emerge. Just as you wouldn’t start a company without an ERP or CRM system, you now wouldn’t run one without a data platform. To see evidence of this evolution in 2020, look for mentions of data in annual and quarterly reports, and for discussion of data and analytics in very business-specific use cases during earnings calls.
Two trends are accelerating the use of ML and AI in data-driven organizations. The first is the continued rise of the “citizen data scientist,” who can use basic ML and AI algorithms within data pipelines as those capabilities begin to show up in more traditional BI and data integration platforms. The second is data scientists’ growing ability to use automated tools to put advanced ML and AI algorithms into production.
In 2020, automation frameworks will allow data scientists to create their own data pipelines that are close to production-ready. This combination of bringing data engineering to data scientists and data science to data analysts will increase the number of ML and AI models that actually make it into enterprise-level production.
As we enter the next phase of big data evolution, keep an eye on these big data analytics trends and see how your organization handles the big data landscape. We’ll be back next year to see which predictions we got right, and get you prepared for 2021.
Read more about Infoworks’ EDO2 System and its impact on Enterprise Data Operations and Orchestration!