I currently work as a data engineer in the financial industry. Previously, I was a one-stop shop for data science in manufacturing, covering data engineering, ETL, modeling, and deployment. I am dedicated to finding the most suitable tool for each need, and I keep contributing to open source projects. LIFE IS SHORT. USE PYTHON.
Abstract
Are you using pandas to process data? Do you want to handle a large dataset using pandas? Do you want to develop Python code on your laptop and run it in the cloud or on Kubernetes effortlessly? In this talk, I assume you are familiar with pandas, and I will share how to distribute your pandas ETL job by changing a few lines of code (even just one).
Description
If you work in the data science field, pandas is a fantastic tool for Python users. According to the official documentation, pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language. However, every tool has limitations. pandas manipulates small data efficiently because it handles the data in memory, which also means it is difficult to process datasets that do not fit in memory.
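To make the in-memory point concrete, here is a minimal sketch that measures how much RAM a DataFrame occupies. The column names and row count are made up for illustration; the only pandas call it relies on is `DataFrame.memory_usage`.

```python
import pandas as pd

# Hypothetical data: one million rows, two numeric columns.
n_rows = 1_000_000
df = pd.DataFrame({"id": range(n_rows), "amount": [1.5] * n_rows})

# memory_usage(deep=True) reports bytes per column (plus the index);
# summing gives the total footprint the DataFrame holds in RAM.
bytes_in_memory = int(df.memory_usage(deep=True).sum())
print(f"{bytes_in_memory / 1e6:.1f} MB held in memory for {n_rows:,} rows")
```

The footprint grows roughly linearly with the number of rows, so a dataset that is many times larger than this one can exceed the memory of a single machine.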
In this talk, I will share two common cases describing how to distribute your pandas ETL job by changing a few lines of code (even just one).
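As an illustration of the "one line" idea, and not one of the talk's two cases, the sketch below shows a small pandas ETL job whose only library-specific line is the import. The listing does not name the tool used in the talk; the comment mentions Modin's `modin.pandas` purely as an assumed example of a pandas-compatible drop-in. The CSV content and the `run_etl` helper are hypothetical.

```python
import io

# The single line that would change. Assumption: a pandas-compatible
# distributed library could be swapped in here, for example
#   import modin.pandas as pd
# The rest of the job would stay the same.
import pandas as pd

# Hypothetical input data, inlined so the example is self-contained.
RAW_CSV = """customer,amount
alice,10.0
bob,5.5
alice,2.5
"""


def run_etl(source):
    """Extract a CSV, drop non-positive amounts, and total per customer."""
    df = pd.read_csv(source)                      # extract
    df = df[df["amount"] > 0]                     # transform: filter rows
    totals = df.groupby("customer", as_index=False)["amount"].sum()
    return totals                                 # load step omitted


totals = run_etl(io.StringIO(RAW_CSV))
print(totals)
```

Because the job only uses the common DataFrame API (`read_csv`, boolean filtering, `groupby`), it is the kind of code that a drop-in replacement can run unchanged, which is what makes a one-line switch plausible.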
References:
Video