Q&A with Caleb Benningfield, Head of Lakehouse Strategy at Amperity – Lakehouse architecture and open sharing are shaking up how brands handle customer data. By eliminating data silos, strengthening data integration and analytics, and improving data quality and governance, lakehouse systems ease data sharing and integration across tools and platforms and streamline data engineering work.
We speak with Caleb Benningfield, Head of Lakehouse Strategy at Amperity, to uncover why ‘data lakehouses’ are becoming the next big thing for big brands across APAC.
“What is lakehouse architecture?”
Lakehouse architecture is the perfect blend of a data lake and a data warehouse, offering a unified platform that supports a wide range of data processing and analytics needs. It’s versatile enough to handle both structured and unstructured data. Unlike the old-school setups, lakehouses use open formats like Delta Lake and Apache Iceberg, allowing different tools and platforms to easily access data wherever it is stored. This setup tackles the big headaches of data management, like the hassle and cost of juggling multiple environments and migrating data between them.
Imagine being able to store data in various locations and access it seamlessly through applications via shares—that’s the magic of lakehouse innovations like Delta Lake and Apache Iceberg. Brands can also create the best tech stacks without compromising on data quality, leading to more efficient data management and top-notch customer experiences.
But the real magic? Lakehouse architecture ensures your data remains accessible and usable across many tools without constant copying and moving. And in today’s fast-paced digital world, this seamless access to timely and accurate data insights is a game-changer for business decisions and customer satisfaction. You might say, everyone’s jumping in the lake and the water is fine!
“How does lakehouse architecture solve data silos?”
Traditional data setups often create data silos because they rely on multiple big data environments. Each environment stores its data separately, and copying data between them takes work, which leads to inefficiencies and complicates data management. For instance, a brand might use different platforms for various purposes: one for database workloads, another for artificial intelligence (AI) and machine learning (ML), a separate one for marketing data, and another for financial data. This results in significant data fragmentation. Managing and consolidating these environments typically requires complex migration processes, which can drain IT resources and compromise the effectiveness of each tool.
Lakehouse architecture addresses these issues by enabling live data sharing without the need for complex ETL processes. Data stored in open formats like Delta Tables, Iceberg Tables, and Parquet files can be shared across platforms without copying, allowing for seamless cross-platform workflows and reducing time lost in data transfers. By using open formats and standardized protocols, lakehouse architecture allows for consistent data sharing across different platforms and tools, helping to maintain high data quality and reducing discrepancies that can arise from using multiple, isolated data systems.
Consider a retail company using separate systems for inventory management, CRM, and marketing analytics. Each system holds valuable data, but accessing and integrating this data across platforms can be a logistical nightmare. With a lakehouse architecture, data from all these systems can be stored in a unified, accessible format, allowing seamless data integration and real-time analytics.
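To make the retail example concrete, here is a minimal sketch of the idea, not a real lakehouse implementation. It assumes a single shared file stands in for a lakehouse table, and it uses CSV only so the code runs with the Python standard library (a real deployment would use Parquet, Delta, or Iceberg tables). The sample rows, file name, and function names are all hypothetical.

```python
import csv
import os
import tempfile

# Hypothetical sample data; a stand-in for one shared lakehouse table.
ROWS = [
    {"customer_id": "c1", "sku": "shoe", "qty": "2", "channel": "web"},
    {"customer_id": "c2", "sku": "hat", "qty": "1", "channel": "store"},
    {"customer_id": "c1", "sku": "hat", "qty": "3", "channel": "web"},
]


def write_shared_table(path):
    """Write the single shared copy of the data."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(ROWS[0].keys()))
        writer.writeheader()
        writer.writerows(ROWS)


def read_table(path):
    """Read the shared table; every consumer uses this same file."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def units_by_sku(path):
    """Inventory view: total units sold per SKU."""
    totals = {}
    for row in read_table(path):
        totals[row["sku"]] = totals.get(row["sku"], 0) + int(row["qty"])
    return totals


def orders_by_customer(path):
    """CRM view: number of order lines per customer."""
    counts = {}
    for row in read_table(path):
        counts[row["customer_id"]] = counts.get(row["customer_id"], 0) + 1
    return counts


# One stored copy, two consumers reading it directly (no per-tool export).
shared_path = os.path.join(tempfile.mkdtemp(), "orders.csv")
write_shared_table(shared_path)
inventory_view = units_by_sku(shared_path)
crm_view = orders_by_customer(shared_path)
print(inventory_view)
print(crm_view)
```

Both views are computed from the same stored copy, so neither team maintains its own extract that could drift out of date.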
“Why do tech stacks need lakehouse architecture?”
Another key advantage of lakehouse architecture is its flexibility in building ideal tech stacks. Brands can pick the best tools for specific tasks without worrying about data replication or quality issues. IT teams can select the optimal tools for their needs, using different platforms’ strengths while maintaining data integrity and quality through open formats and protocols. This means that when data quality improves, every application consuming the data benefits at the same time, enhancing overall performance and efficiency.
Open formats and protocols in lakehouse environments ensure persistent data quality across the tech stack. Data can be shared through a lakehouse catalog and accessed by any tool using the same architecture. This eliminates the need to copy data from one tool to another, maintaining data integrity and reducing the cost and effort associated with data processing and storage.
For example, a marketing team might prefer a specific analytics platform for customer insights, while the finance team relies on another tool for financial reporting. With a lakehouse architecture, both teams can access the same underlying data without creating cumbersome data migrations or duplicates, ensuring that each team gets the accurate, real-time data they need to perform their tasks effectively. This unified approach streamlines operations and sets the stage for significantly improving data quality and governance. It’s like a lake vacation where some are fishing, some are jet-skiing, and some are just hanging out on the dock. Everyone gets their ideal activity, making for a harmonious, quality time for all.
“How is lakehouse architecture relevant in supporting data quality and governance?”
Lakehouse architecture significantly boosts data quality and governance by securely sharing data without replication. This reduces the risk of data breaches and ensures compliance with regulations like GDPR and CCPA. Data managers have clear visibility into where data is stored, making it easier to handle “right to be forgotten” requests and other compliance requirements.
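As a rough illustration of why a single shared copy simplifies erasure requests, the sketch below compares a replicated setup with a shared one. It uses in-memory lists as hypothetical stand-ins for stored tables; the table names, customer IDs, and the `erase_customer` helper are assumptions made for this example, not part of any lakehouse product.

```python
def erase_customer(table, customer_id):
    """Remove all rows for customer_id in place; return how many were removed."""
    before = len(table)
    table[:] = [row for row in table if row["customer_id"] != customer_id]
    return before - len(table)


# Replicated setup: each tool keeps its own copy, so each copy must be
# located and cleaned separately.
crm_copy = [{"customer_id": "c1"}, {"customer_id": "c2"}]
marketing_copy = [{"customer_id": "c1"}, {"customer_id": "c3"}]
removed_replicated = sum(
    erase_customer(table, "c1") for table in (crm_copy, marketing_copy)
)

# Shared setup: both tools reference the same table, so one erase
# is visible to every consumer.
shared = [{"customer_id": "c1"}, {"customer_id": "c2"}, {"customer_id": "c3"}]
crm_view = shared
marketing_view = shared
removed_shared = erase_customer(shared, "c1")

print(removed_replicated, removed_shared)
print(marketing_view)
```

In the replicated case the request is only complete once every copy has been found and cleaned. In the shared case there is one place to look.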
And there’s more to the magic. Lakehouse architecture also integrates well with advanced AI applications and personalization. Teams can work with massive data sets across multiple tools and platforms without delays. Traditional ETL processes add latency and slow things down, but in a lakehouse setup, data is accessed and processed in real time. This means faster and more accurate personalization and AI-driven insights.
In this context, personalization means tailoring customer experiences based on individual preferences and behaviors. For instance, a retail company could analyze a customer’s browsing and purchase history in real time to offer personalized product recommendations. Advanced AI applications can leverage the unified data platform provided by a lakehouse to generate insights that drive this level of personalization. Machine learning models trained on comprehensive data sets can predict customer behavior, optimize marketing campaigns, and enhance overall customer engagement. The ability to quickly process and analyze large volumes of data leads to more precise and impactful personalization efforts.
With data stored in open formats like Delta Tables, Iceberg Tables, and Parquet files, lakehouse architecture enables seamless data sharing across platforms. This interoperability reduces the time and effort needed to move data between systems, leading to more efficient workflows and better use of IT resources.
“Can you explain a real-life example of lakehouse architecture in action?”
Now, let’s look at a real-life business case. Virgin Atlantic, a leading transatlantic airline network, with connections to over 200 cities around the world, is one example of a brand using lakehouse architecture to transform its customer data approach. The innovative airline combines Databricks with a Lakehouse CDP to unify and enrich its vast customer data. This powerful combination democratizes data access, allowing non-technical users to make data-driven decisions quickly and efficiently, maximizing the value of customer data for delivering exceptional travel experiences. Virgin Atlantic has improved its data management efficiency and enhanced customer experiences by leveraging real-time insights and personalized interactions made possible through a lakehouse environment.
As more brands adopt this architecture, the potential to enhance data-driven decision-making and improve customer experiences will grow. Brands looking to stay ahead in the data-driven landscape should consider the benefits of lakehouse and open sharing to enhance customer data strategies.
“Why is lakehouse architecture so transformative?”
Lakehouse architecture is revolutionizing customer data management by eliminating data silos, maximizing flexibility, and improving data quality and governance. Major industry players are already embracing this trend. Snowflake’s launch of Iceberg tables, along with similar initiatives by Azure, Google, Salesforce, and Adobe, underscores the transformative potential of lakehouse architecture.
It sets a new standard for how brands handle customer data, enabling advanced AI applications, personalization, and overall data management efficiency. As the adoption of this architecture continues to grow, the ability to access and integrate data seamlessly across various platforms will become a critical advantage for brands looking to dive ahead in the competitive market.
You don’t have to believe in magic to agree that this transformation, with its potential for enhanced efficiency, better decision-making, and improved customer experiences, is pretty magical.
About Caleb Benningfield, Head of Lakehouse Strategy at Amperity:
Caleb Benningfield joined Amperity as employee number four, helping to build the company from a stealth-stage concept to a unicorn valued at $1B and a leader in the Customer Data Platform industry. At Amperity, he has held a number of technical and customer-facing leadership roles. After serving as the founding member of the Client Services organization, Caleb transitioned to the Product team, serving as Principal Solution Architect and running a dedicated practice building experimental and partner solutions. He leads the Lakehouse strategy to expand on Amperity’s connectivity with cloud data warehouses.
Caleb is passionate about solving the complex challenges inherent to the customer data domain and developing solutions that leverage best-in-breed technology in the most efficient and effective way possible. He is a frequent speaker and blogger on all things relating to identity resolution, data engineering, data architecture, and data operations.
About Amperity
Amperity, the first Lakehouse CDP, delivers the data confidence brands need to unlock growth by truly knowing their customers. With Amperity, brands can build a first-party data foundation to fuel customer acquisition and retention, personalize experiences that build loyalty, and manage privacy compliance. Using patented AI and ML methods, Amperity stitches together all customer interactions to build a unified view that seamlessly connects to marketing and technology tools. More than 400 brands worldwide rely on Amperity to turn data into business value, including Alaska Airlines, DICK’S Sporting Goods, Endeavour Drinks, Planet Fitness, Seattle Sounders FC, Under Armour and Wyndham Hotels & Resorts. For more information, visit amperity.com or follow us on LinkedIn, X, Facebook and Instagram.