3rd Early Release: O’Reilly Media, Inc., 2025. — 206 p. — ISBN 978-1-098-16576-5.
Data projects are an intrinsic part of an organization's technical ecosystem, but data engineers in many companies are still trying to solve problems that others have already solved. This hands-on guide shows you how to provide valuable data by focusing on various aspects of data engineering, including data ingestion, data quality, idempotency, and more.
Author Bartosz Konieczny guides you through the process of building reliable end-to-end data engineering projects, from data ingestion to data observability, focusing on data engineering design patterns that solve common business problems in a secure and storage-optimized manner. Each pattern includes a user-facing description of the problem, solutions, and consequences that place the pattern into the context of real-life scenarios.
If you write software, you’ve heard about the Gang of Four’s design patterns and may even consider them one of the pillars of clean code. And now you’re probably asking yourself: are they not enough for data engineering projects? Unfortunately, no. Software design patterns are recipes that you can use to keep a code base easily maintainable. Since the patterns are a standardized way to represent a given concept, they’re quickly understandable by any new person on the project.
Throughout this journey, you'll use open source data tools and public cloud services to see how to put each pattern into practice.
You'll learn:
Challenges data engineers face and their impact on data systems
How these challenges relate to data system components
What data engineering patterns are for
How to identify and fix issues with your current data components
Technology-agnostic solutions to new and existing data projects
How to implement patterns with Apache Airflow, Apache Spark, Apache Flink, and Delta Lake
Introducing Data Engineering Design Patterns (available)
Data Ingestion Design Patterns (available)
Error Management Design Patterns (available)
Idempotency Design Patterns (available)
Data Value Design Patterns (unavailable)
Data Flow Design Patterns (unavailable)
Data Security Design Patterns (unavailable)
Data Storage Design Patterns (unavailable)
Data Quality Design Patterns (unavailable)
Data Observability Design Patterns (unavailable)
Patterns Summary (unavailable)