Speed Up Your Data Processing: Parallel and Asynchronous Programming in Data Science

Abstract

Constantly waiting for your data processing code to finish executing? Through real-life stories, we will explore how to leverage parallel and asynchronous programming in Python to speed up your data processing pipelines, so that you can focus more on getting value out of your data. While this talk assumes a basic understanding of the processes in data pipelines and data science workflows, anyone with a basic understanding of the Python language will be able to follow the concepts and use cases, which are illustrated with analogies.

Description

In any data-intensive application, one of the biggest bottlenecks (in terms of time) is the constant wait for the data processing code to finish executing. Slow code and connectivity issues affect every step of a typical data science workflow, whether the work is I/O operations or computation-driven workloads. In this talk, Chin Hwee will share common bottlenecks in data processing within a typical data science workflow, and explore the use of parallel and asynchronous programming in the Python Standard Library to speed up your data processing pipelines so that you can focus more on getting value out of your data. Through real-life analogies based on her experience in a young data science team getting started with real-world data, you will learn about:

1. Sequential vs parallel processing
2. Synchronous vs asynchronous execution
3. I/O operations vs computation-driven workloads in a data science workflow
4. When parallelism and asynchronous programming are a good idea
5. How to implement asynchronous programming using concurrent.futures to speed up your data processing pipelines
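
As a rough illustration of the last point, here is a minimal sketch of overlapping I/O-bound work with concurrent.futures. It is not taken from the talk or its slides: fetch_record, fetch_sequential and fetch_concurrent are hypothetical names, and time.sleep stands in for a real network or disk wait.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_record(record_id, delay=0.05):
    """Hypothetical stand-in for an I/O-bound call (e.g. an HTTP request)."""
    time.sleep(delay)  # simulates waiting on the network or disk
    return record_id, record_id * 2


def fetch_sequential(record_ids):
    # Each call starts only after the previous one has finished.
    return dict(fetch_record(i) for i in record_ids)


def fetch_concurrent(record_ids, max_workers=8):
    # Submit every call up front so the waits overlap across worker threads.
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(fetch_record, i) for i in record_ids]
        # as_completed yields each future as soon as its result is ready.
        for future in as_completed(futures):
            record_id, value = future.result()
            results[record_id] = value
    return results


if __name__ == "__main__":
    ids = list(range(16))

    start = time.perf_counter()
    fetch_sequential(ids)
    print(f"sequential: {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    fetch_concurrent(ids)
    print(f"concurrent: {time.perf_counter() - start:.2f}s")
```

Threads suit this case because the work is mostly waiting. For computation-driven workloads, concurrent.futures also provides ProcessPoolExecutor, which exposes the same submit and map interface but runs tasks in separate processes.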

Slides

https://speakerdeck.com/ongchinhwee/speed-up-your-data-processing-parallel-and-asynchronous-programming-in-python

Speaker

Chin Hwee Ong

Ong Chin Hwee is a data engineer, aspiring polymath and Industry 4.0 enthusiast who happens to be interested in things that fly (and stuff that burns to keep things flying). Hailing from a background in aerospace engineering and computational modelling, Chin Hwee has experience working on innovative projects in collaboration with academia and industry partners. Chin Hwee is a contributor to the documentation for pandas 1.0 and enjoys sharing her experiences at meetups and conferences.