Ensuring Data Integrity with Validation and Pipeline Testing

Shuhsi Lin

Experienced professional with a proven track record in designing scalable and robust data architectures and fostering a strong engineering culture. Skilled in leading high-performance teams to deliver effective data solutions using DataOps principles. Currently focused on enhancing developer experience in a smart manufacturing and AI department.

    Abstract

    In the dynamic world of artificial intelligence (AI), data serves as the backbone for decision-making and operational excellence. Ensuring the accuracy and reliability of data through effective validation and ETL (Extract, Transform, Load) data pipeline testing is paramount. I will introduce the essentials of data validation and ETL data pipeline testing, highlighting processes, types, and best practices. We'll explore frameworks like Great Expectations and dbt to automate and enhance data testing in both single-machine and distributed computing environments. This talk is designed to equip data engineers and scientists with the knowledge to implement robust data validation strategies, understand the nuances of ETL testing, maintain good data quality, and keep data pipelines error-free and efficient.

    Description

    Location

    R0

    Time

    Day 2 • 05:00-05:30 (UTC)

    Language

    Talk in Chinese / slides in English

    Level

    Intermediate

    Category

    Testing