Building Distributed System with Celery on Docker Swarm


Chinese talk w. English slides




In the Python world, Celery is a well-known distributed task queue framework, and its Canvas primitives are a powerful tool for composing complex processing workflows in a distributed system. Because Celery runs in distributed environments, it pairs well with Docker Swarm: Docker Swarm builds the cluster platform and provides the worker containers that Celery needs, while Celery scales the number of workers and processes on Docker Swarm as required and completes the designated computation in parallel. I therefore set up Celery on top of Docker Swarm and verified it with "Word Count", the entry-level program of the Hadoop/MapReduce world. In addition, IoT is also a distributed concept, so in the Celery + Docker Swarm environment I used Docker containers to simulate devices on different machines, performing distributed communication and computation.

In this talk, the basic mechanisms of Celery and Docker Swarm will be explained. With Docker Swarm, a cluster will be built on two Raspberry Pi machines. The Hadoop entry-level "Word Count" program will be rewritten in Python and executed in parallel via Celery on the cluster. A distributed system modeling a neural network will also be explained.


I hope this talk prompts some thoughts:

1. In the big-data era, how should Python users cope with massive amounts of data? Must they rely on heterogeneous and complex tool stacks such as Hadoop and Spark?
2. Container technologies such as Docker are hot. What does Docker actually mean to us, and what help can Docker Swarm offer Python users in processing big data? Given 50000 containers, what kind of environment and applications could be built together with Celery?
3. IoT, microservices, and serverless computing are also popular topics, and their common trait is a distributed computing model. How can a distributed system be architected on Celery + Docker Swarm?

References: Celery on Docker Swarm, IoT as Brain, One for all, All for one

Wei Lin

I have mostly worked in the fields of marketing, strategy planning, and project management. I am fascinated by the elegance of Python, and very interested in Machine Learning and Data Science.

Strategy planning is my specialty; programming is my hobby.
Machine Learning and Data Science are my favorites.
The thirst for knowledge and the joy of insight are my eternal motivation.