Connect "K" of SMACK:pykafka, kafka-python or ?

  • R0
  • Day 1, 16:10‑16:55
  • Chinese talk w. English slides
  • Python Libraries
  • Intermediate

Apache Kafka is a distributed streaming platform for building real-time data pipelines and streaming applications. You can also think of Kafka as a commit log service that functions much like a publish/subscribe messaging system, but with better throughput, built-in partitioning, replication, and fault tolerance, and it runs in production at thousands of companies. Recently, Kafka has been widely adopted as a component of the SMACK stack because of its role connecting Apache Hadoop, Apache Storm, and Spark Streaming in the data pipeline.
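
As a rough illustration of the publish/subscribe model described above, here is a minimal sketch using the kafka-python client; the broker address, topic name, and timeout are assumptions for the example, not part of the talk material.

```python
# Minimal publish/subscribe sketch with kafka-python.
# Assumes a Kafka broker on localhost:9092 and a topic named "demo-topic".
from kafka import KafkaProducer, KafkaConsumer

# Producer: publish a few messages to the topic.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
for i in range(3):
    producer.send("demo-topic", value=f"message {i}".encode("utf-8"))
producer.flush()

# Consumer: subscribe to the same topic and read the messages back.
consumer = KafkaConsumer(
    "demo-topic",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # stop iterating after 5 s without new messages
)
for record in consumer:
    print(record.partition, record.offset, record.value)
```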

In this talk, I will start by introducing data stream processing and the general concepts of Kafka's architecture and components through several use cases. Then Kafka's API will be introduced via Python clients with a demo. Finally, benchmarks, comparisons, and limitations of the different Python clients will be discussed.

This talk will introduce the basic architecture and core concepts of Apache Kafka through use cases, explain how to use its API via Python client packages, and finally compare the differences and limitations of the different Python clients.

Talk Detail

This talk will introduce Kafka's features through Python client libraries and briefly compare them.

About Apache Kafka: [Apache Kafka](https://kafka.apache.org/)

The Python client packages to be introduced:

1. [confluent-kafka-python](http://docs.confluent.io/current/clients/index.html) maintained by Confluent
2. [kafka-python](http://github.com/dpkp/kafka-python) maintained by Dana Powers
3. [pykafka](https://github.com/Parsely/pykafka) maintained by Parse.ly

Demo source codes and jupyter notebooks: https://github.com/sucitw/benchmark-python-client-for-kafka
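
For orientation before the talk, here is a hedged sketch of how the same "send one message" step looks in each of the three clients; the broker address and topic name are assumptions for illustration, and the repository above holds the actual demo code and benchmarks.

```python
# The same "send one message" step in each of the three clients compared in the talk.
# A broker at localhost:9092 and a topic "demo-topic" are assumed for illustration.

# 1. confluent-kafka-python (librdkafka-based, configured with a dict)
from confluent_kafka import Producer
p = Producer({"bootstrap.servers": "localhost:9092"})
p.produce("demo-topic", b"hello from confluent-kafka-python")
p.flush()

# 2. kafka-python (pure Python, API modeled after the Java client)
from kafka import KafkaProducer
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("demo-topic", b"hello from kafka-python")
producer.flush()

# 3. pykafka (topic objects with sync/async producers and balanced consumers)
from pykafka import KafkaClient
client = KafkaClient(hosts="localhost:9092")
topic = client.topics[b"demo-topic"]
with topic.get_sync_producer() as sync_producer:
    sync_producer.produce(b"hello from pykafka")
```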

Slides Link

Speaker Information

Shuhsi Lin

A data engineer and Python programmer, currently working on various data applications in a manufacturing company.

Research interests: IoT applications, data stream processing, data analysis, and data visualization.