Ensuring Data Integrity with Validation and Pipeline Testing

Shuhsi Lin

Experienced professional with a proven track record in designing scalable and robust data architectures and fostering a strong engineering culture. Skilled in leading high-performance teams to deliver effective data solutions using DataOps principles. Currently focused on enhancing developer experience in a smart manufacturing and AI department.

    Abstract

    In the dynamic world of artificial intelligence (AI), data serves as the backbone for decision-making and operational excellence. Ensuring the accuracy and reliability of data through effective validation and ETL (Extract, Transform, Load) pipeline testing is paramount. I will introduce the essentials of data validation and ETL pipeline testing, highlighting processes, types, and best practices. We'll explore frameworks like Great Expectations and dbt to automate and enhance data testing. This talk is designed to equip data engineers and scientists with the knowledge to implement robust data validation strategies, understand the nuances of ETL testing, maintain good data quality, and ensure data pipelines are error-free and efficient.

    Location

    R0

    Date

    Day 2 • 05:00-05:30 (UTC)

    Language

    Chinese talk w. English slides

    Level

    Intermediate

    Category

    Testing