Implementing Layered Data Architecture in Dagster

George T. C., Lai

A data practitioner with a background in data analysis who has spent 12 years building a career in Big Data and DevOps on the cloud-native ecosystem. For the past 7 years, I have focused on data architecture, team management, and DevOps. My technical experience includes 6 years with the Hadoop ecosystem (especially Hortonworks HDP), 7 years with Kubernetes, and 4 years with AWS/GCP. My personal vision is to give every data practitioner a better life. I pursue it by exploring new tools, discovering best practices, and delivering well-designed data architectures and technical solutions that relieve the pain points and frustrations data practitioners face when working with data.

Abstract

A layered data design pattern is a modern data architecture for building ETL/ELT data pipelines composed of multiple stages, where each stage processes the data and progressively improves its quality. Compared with the imperative way data engineers built ETL/ELT pipelines over the last decade, a layered data architecture can help improve data quality steadily and reduce data silos while project-specific teams autonomously produce various data products. In this talk, we will introduce a technical solution based on layered data architecture. The solution is implemented with Dagster, a cloud-native data orchestrator with integrated lineage, observability, and a declarative programming model. A simple example will demonstrate the concepts, principles, and data stack of the solution. Finally, we will share the benefits we have gained from implementing it.
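To make the idea of progressive stages concrete, here is a minimal sketch in plain Python. It is not the example from the talk: the three layer names (bronze, silver, gold), the sample records, and every function name are assumptions chosen for illustration. In Dagster, each of these functions would typically be declared as an asset with the `@asset` decorator, so that the orchestrator tracks the dependencies between layers.

```python
from collections import defaultdict

# Hypothetical raw records; the fields and values are made up for illustration.
RAW_ORDERS = [
    {"order_id": "1", "customer": " alice ", "amount": "10.50"},
    {"order_id": "2", "customer": "Bob", "amount": "n/a"},      # invalid amount
    {"order_id": "3", "customer": "alice", "amount": "4.50"},
    {"order_id": "3", "customer": "alice", "amount": "4.50"},   # duplicate row
]


def bronze_orders(raw):
    """Bronze layer (assumed name): land the source data unchanged."""
    return [dict(row) for row in raw]


def silver_orders(bronze):
    """Silver layer (assumed name): cast types, normalize names,
    and drop invalid or duplicate rows."""
    seen_ids = set()
    cleaned = []
    for row in bronze:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        order_id = int(row["order_id"])
        if order_id in seen_ids:
            continue  # skip duplicate orders
        seen_ids.add(order_id)
        cleaned.append(
            {
                "order_id": order_id,
                "customer": row["customer"].strip().lower(),
                "amount": amount,
            }
        )
    return cleaned


def gold_revenue_by_customer(silver):
    """Gold layer (assumed name): aggregate cleaned rows into a data product."""
    totals = defaultdict(float)
    for row in silver:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


bronze = bronze_orders(RAW_ORDERS)
silver = silver_orders(bronze)
gold = gold_revenue_by_customer(silver)
print(gold)
```

Each layer consumes only the output of the layer before it, so quality checks and cleaning rules live in one place and downstream teams can build their own gold-level products from the same silver data instead of re-cleaning raw sources.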

Description

Slides

Location

Date

Day 1 • 16:00-16:45 (GMT+8)

Language

Chinese talk w. English slides

Level

Intermediate