Integrate Schema Registry & Kafka in Python to Build Streaming Processing

蘇揮原 Mars Su

A Senior ML/Data Engineer at Gogolook. I am currently in charge of implementing streaming ETL infrastructure and NLP-related ML models and applications. I have 4+ years of experience in data science and data engineering, including NLP and streaming (micro-batch) ETL design. My research interests include NLP-related algorithms, models, and papers; streaming data pipelines; and cloud services. I hope I can contribute something to the data world.

    Abstract

    In today's data-driven world, we are often faced with how to process and analyze data effectively and in real time, and streaming processing is an important application for this. In addition, data can have different schemas for different applications and needs. To ensure data correctness and availability in a streaming application, schema verification needs to be integrated into the streaming process. To that end, I will start by introducing the concept and use cases of streaming processing, along with two services: Apache Kafka and Schema Registry. Kafka is a message queue system that can handle large amounts of streaming data. Schema Registry is a service built on top of Kafka that helps us verify schemas while producing data to Kafka or consuming data from it. Lastly, I will share how to use Python to integrate these two services to implement a reliable streaming process.
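    As a taste of the integration the talk describes, below is a minimal sketch (not the speaker's actual code) of producing an Avro-encoded message through Schema Registry using the confluent-kafka Python package. The broker address localhost:9092, registry address localhost:8081, topic name, and record schema are all illustrative assumptions.

```python
# Minimal sketch: produce an Avro-encoded message to Kafka with schema
# verification via Schema Registry. Assumes a broker at localhost:9092,
# a Schema Registry at localhost:8081, and the confluent-kafka package.
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import SerializationContext, MessageField

# Illustrative Avro schema; a payload that does not match it fails to serialize.
schema_str = """
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"}
  ]
}
"""

registry = SchemaRegistryClient({"url": "http://localhost:8081"})
serializer = AvroSerializer(registry, schema_str)

producer = Producer({"bootstrap.servers": "localhost:9092"})
topic = "users"  # hypothetical topic name

# The serializer registers the schema (if new) and encodes the record;
# a record that violates the schema raises an error before it is sent.
value = serializer(
    {"name": "Mars", "age": 30},
    SerializationContext(topic, MessageField.VALUE),
)
producer.produce(topic=topic, value=value)
producer.flush()
```

    On the consuming side, an AvroDeserializer plays the symmetric role, decoding each message against the schema registered for the topic.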


    Location

    R0

    Date

    Day 1 • 13:30-14:00 (GMT+8)

    Language

    Chinese talk with English slides

    Level

    Intermediate

    Category

    Application