A Senior ML/Data Engineer at Gogolook, currently in charge of implementing streaming ETL infrastructure and NLP-related ML models and applications. I have 4+ years of experience in data science and data engineering, including NLP and streaming (micro-batch) ETL design. My research interests include NLP-related algorithms, models, and papers, streaming data pipelines, and cloud services. I hope I can contribute something to the data world.
Abstract
In today's data-driven world, we are often faced with the question of how to process and analyze data effectively and in real time, and stream processing is an important technique for doing so. In addition, data carries different schemas for different applications and needs. To ensure data correctness and availability in a streaming application, it is necessary to integrate schema verification into the streaming process. To that end, I will start by introducing the concepts and use cases of stream processing and two services: Apache Kafka and Schema Registry. Kafka is a message queue system that can handle large volumes of streaming data, and Schema Registry is a service built on Kafka that helps us perform schema verification when producing data to or consuming data from Kafka. Lastly, I will share how to use Python to integrate these two services and implement a reliable streaming process.
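As a taste of the Python integration the talk covers, below is a minimal sketch of producing Avro-encoded records with schema verification using the confluent-kafka client. The broker address, Schema Registry URL, topic name, and schema are placeholder assumptions, not values from the talk.

```python
# Minimal sketch: produce Avro records validated against Schema Registry.
# Assumes a local Kafka broker (localhost:9092), a Schema Registry at
# http://localhost:8081, and a hypothetical "users" topic.
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import SerializationContext, MessageField

# Avro schema that every record on the topic must satisfy.
schema_str = """
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age",  "type": "int"}
  ]
}
"""

schema_registry = SchemaRegistryClient({"url": "http://localhost:8081"})
avro_serializer = AvroSerializer(schema_registry, schema_str)

producer = Producer({"bootstrap.servers": "localhost:9092"})

record = {"name": "Alice", "age": 30}
# Serialization checks the record against the registered schema; a record
# that violates the schema fails here, before it ever reaches Kafka.
value = avro_serializer(record, SerializationContext("users", MessageField.VALUE))
producer.produce(topic="users", value=value)
producer.flush()
```

On the consumer side the same idea applies in reverse: an AvroDeserializer backed by the same Schema Registry decodes and validates each message, so both ends of the pipeline agree on the schema.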
Description
Video
Location
R0
Time
Day 1 • 13:30-14:00 (GMT+8)
Language
Talk in Chinese / Slides in English
Level
Intermediate
Category
Application