- Day 1, 16:10‑16:55
- Chinese talk w. English slides
- Python Libraries
Connect "K" of SMACK：pykafka, kafka-python or ?
Apache Kafka is a distributed streaming platform for building real-time data pipelines and streaming apps. You can also treat Kafka as a commit log service that works much like a publish/subscribe messaging system, but with better throughput and with built-in partitioning, replication, and fault tolerance. It runs in production at thousands of companies. Recently, Kafka has been widely adopted as one component of the SMACK stack because of its role connecting with Apache Hadoop, Apache Storm, and Spark Streaming in the data pipeline.
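To make the "commit log with partitioning" idea concrete, here is a minimal in-memory sketch. It is not Kafka and not part of the talk material: the class `ToyLog`, its method names, and the CRC32-based partitioning are all assumptions chosen for illustration. It only shows three ideas: records are appended to a partition chosen from the key, each record gets a sequential offset within its partition, and each consumer group tracks its own read position.

```python
import zlib
from collections import defaultdict


class ToyLog:
    """Hypothetical in-memory stand-in for a partitioned, append-only log."""

    def __init__(self, num_partitions=3):
        # One append-only list per partition.
        self.partitions = [[] for _ in range(num_partitions)]
        # (group, partition) -> next offset that group will read.
        self.positions = defaultdict(int)

    def partition_for(self, key):
        # Illustrative hash only; chosen here because it is deterministic.
        return zlib.crc32(key) % len(self.partitions)

    def append(self, key, value):
        """Append a record and return its (partition, offset)."""
        partition = self.partition_for(key)
        self.partitions[partition].append((key, value))
        return partition, len(self.partitions[partition]) - 1

    def poll(self, group, partition):
        """Return records this group has not yet read from one partition."""
        start = self.positions[(group, partition)]
        records = self.partitions[partition][start:]
        self.positions[(group, partition)] = start + len(records)
        return records
```

Because records are never removed on read, two consumer groups polling the same partition each see every record, which is the publish/subscribe behaviour the abstract refers to. Replication and fault tolerance are outside the scope of this sketch.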
In this talk, I will start by introducing data stream processing and the general concepts of Kafka's architecture and components through several use cases. Then, Kafka's API will be introduced through Python clients, with a demo. Finally, the benchmarks, comparison, and limitations of different Python clients will be discussed.
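This talk introduces the basic architecture and component concepts of Apache Kafka through use cases, explains how to use its API by way of Python client packages, and finally compares the differences and limitations of the different Python clients.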
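As a rough idea of what using one of these clients looks like, the following is a minimal sketch with kafka-python. It is not the speaker's demo. The broker address, the topic name `demo-events`, the group name `demo-group`, and the helper functions are all assumptions, and the `kafka` import is deferred into the functions so the module loads even where kafka-python is not installed.

```python
import json

BOOTSTRAP = "localhost:9092"  # assumed broker address
TOPIC = "demo-events"         # hypothetical topic name


def encode(event):
    """Serialize a dict to UTF-8 JSON bytes for the message value."""
    return json.dumps(event).encode("utf-8")


def decode(raw):
    """Inverse of encode(): JSON bytes back to a dict."""
    return json.loads(raw.decode("utf-8"))


def produce(events):
    """Send each event to TOPIC; requires kafka-python and a running broker."""
    from kafka import KafkaProducer  # lazy import

    producer = KafkaProducer(bootstrap_servers=BOOTSTRAP,
                             value_serializer=encode)
    for event in events:
        producer.send(TOPIC, value=event)  # asynchronous; returns a future
    producer.flush()  # block until buffered messages are sent
    producer.close()


def consume(timeout_ms=5000):
    """Print messages from TOPIC until none arrive for timeout_ms."""
    from kafka import KafkaConsumer  # lazy import

    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers=BOOTSTRAP,
        group_id="demo-group",            # hypothetical consumer group
        auto_offset_reset="earliest",     # start from the oldest record
        value_deserializer=decode,
        consumer_timeout_ms=timeout_ms,   # stop iterating when idle
    )
    for msg in consumer:
        print(msg.partition, msg.offset, msg.value)
    consumer.close()
```

With a broker listening on the assumed address, calling `produce([{"id": 1}])` and then `consume()` should print the partition, offset, and decoded value of each record.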
A data engineer and Python programmer, currently working on various data applications at a manufacturing company.
Research interests: IoT applications, data stream processing, data analysis, and data visualization.