Can Data Agility Save a Retailer’s Fragility?

Written by Todd Goldman | Category: Data Engineering

One of the big stories in retail right now is the rise of agile retailing. All kinds of retail outlets are starting to use big data to predict trends, improve production efficiency, and respond faster to customer needs and fluctuations in the market. In case you haven’t noticed, the speed at which business models are changing is accelerating. This leaves retail organizations with two options: maintain their competitive edge by modernizing their data operations, or die. Forrester Research recently called slow and inaccessible data the “kiss of death.” Regardless of how much data retail organizations have on hand, they must also have real-time, always-on access to that data. The problem is that most legacy systems are rife with bottlenecks, complexities, inefficiencies, and other impediments.

One of the companies we work with started as an “old-fashioned” brick-and-mortar retailer. But they were smart and realized years ago that they faced an existential threat from the emergence of online retailers like Amazon. They invested early in online initiatives, which now account for over $1B per year in revenue. But in 2018, they realized that to keep up with the competition, they needed to expand their customer loyalty program more aggressively and improve its value. And while they could already look at what a customer purchased in their physical stores, analyze it, and make a complementary offer, that offer wouldn’t arrive in the customer’s email until the next day.

Their goal was to present the customer with an enticing offer on the spot, before they even left the store, and to provide much the same experience they already offered customers online. This meant tapping decades of customer purchase history, store inventory data, and click-stream data from the retailer’s website, all within the few seconds after the customer completed a purchase, and then sending a text message with that offer before the customer had walked out of the store. Since the retailer’s environment was a hybrid mix of on-premises Hortonworks, Hadoop, and various cloud implementations, implementing an end-to-end data pipeline for any analytics use case was a complex task. Loyalty data resided in Microsoft Azure while inventory and transaction data came in from various on-premises systems, with the final analytics performed on Google BigQuery. The company was simply not set up to turn on a dime in support of this kind of analytics use case.

To shorten time to deployment for the business’s continuously evolving analytics demands, the retailer invested in writing their own automation layer on top of their various big data fabrics to provide capabilities that were not available out of the box. The capabilities they had to build included managing change data capture, parallelizing data ingestion, merging and synchronizing changed data after it was ingested, creating data transformation logic, and managing the data workflows that had to run across multiple hybrid data environments. And while hand coding this framework helped fill many of the big data gaps, the implementation was brittle, and they soon discovered they were spending more time troubleshooting their framework than actually using it. The framework worked only when a new analytics use case closely resembled previously built ones; it broke when new use cases or datasets deviated from them. In a nutshell, their infrastructure scaled up to support large volumes and velocity of data, but it could not support adding and maintaining a large number and variety of big data analytics use cases and pipelines.
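To make one of those hand-built capabilities concrete, here is a minimal sketch of the “merge and sync” step: applying captured changes to a previously ingested snapshot. It is an illustration only, not the retailer’s framework or any Infoworks API. The `ChangeRecord` shape, the `merge_changes` function, and the sample inventory data are all hypothetical, and a real pipeline would run this logic against tables in a data platform rather than in-memory dictionaries.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeRecord:
    """One captured change from a source system (hypothetical shape)."""
    key: str                    # primary key of the changed row
    op: str                     # "upsert" or "delete"
    sequence: int               # increasing change number from the source
    row: Optional[dict] = None  # new row contents for an upsert


def merge_changes(target: Dict[str, dict],
                  changes: Iterable[ChangeRecord]) -> Dict[str, dict]:
    """Return a new snapshot with the changes applied in sequence order."""
    merged = dict(target)  # leave the caller's snapshot untouched
    for change in sorted(changes, key=lambda c: c.sequence):
        if change.op == "upsert":
            merged[change.key] = dict(change.row or {})
        elif change.op == "delete":
            merged.pop(change.key, None)
        else:
            raise ValueError(f"unknown operation: {change.op!r}")
    return merged


if __name__ == "__main__":
    # Hypothetical store inventory snapshot and out-of-order change feed.
    inventory = {"sku-1": {"on_hand": 5}, "sku-2": {"on_hand": 0}}
    changes = [
        ChangeRecord("sku-2", "delete", sequence=3),
        ChangeRecord("sku-1", "upsert", sequence=1, row={"on_hand": 4}),
        ChangeRecord("sku-3", "upsert", sequence=2, row={"on_hand": 9}),
    ]
    print(merge_changes(inventory, changes))
```

Even this toy version hints at why such frameworks become brittle: every new source brings its own key structure, ordering guarantees, and delete semantics, and each one is another special case to code and maintain by hand.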

Driving Agility Into Data Engineering Processes

To drive more agility into their data engineering processes, the company turned to Infoworks’ Agile Data Engineering Platform to provide a single software development and deployment platform that could work across all of their big data fabric environments without requiring them to recode. In fact, with Infoworks they no longer have to hand code anything at all, because they now use a no-code GUI environment to build their data pipelines. Ultimately, Infoworks was used to quickly build cross-platform data pipelines that consume data from Azure, on-premises systems, relational databases, and website data sources, transform and prepare the data, and move it all onto the Google Cloud Platform, where loyalty program analytics views are being created on top of Google BigQuery.

Now, with the agility delivered by the Infoworks-enabled automation layer, the retailer is redirecting the engineering effort they once spent building and maintaining their own framework toward delivering new data pipelines in support of new analytics use cases. They have accelerated the creation of manageable end-to-end data pipelines, are driving further innovations in their customer loyalty capabilities, and are continuing on their digital transformation journey. It’s important to note that digital transformation isn’t merely about achieving a certain level of modernization and then declaring “mission accomplished.” There is no finish line. The point of digital transformation is achieving the agility to change continuously, and even to change direction completely, whenever the business needs to. So the use of an agile data engineering platform to enable the rapid addition or change of data pipelines in support of new analytics use cases is critical for their continued evolution.

It’s a transformation that all retail organizations must face if they are to survive and prevail given the rapid pace of change in today’s world.


About this Author
Todd Goldman
Todd is the VP of Marketing and a Silicon Valley veteran with over 20 years of experience in marketing and general management. Prior to Infoworks, Todd was the CMO of Waterline Data and COO at Bina Technologies (acquired by Roche Sequencing). Before Bina, Todd was Vice President and General Manager for Enterprise Data Integration at Informatica, where he was responsible for their $200MM PowerCenter software product line. Todd has also held marketing and leadership roles at both start-ups and large organizations including Nlyte, Exeros (acquired by IBM), ScaleMP, Netscape/AOL and HP.
