Practical Data Transformation and Analysis with Pandas

  • R1
  • Day 3, 11:15‑12:00
  • English talk
  • Python Libraries
  • Intermediate

In most large companies, data transformation and analysis are done with SQL. However, most SQL environments lack a rich ecosystem like Python's, and using SQL for complex data aggregation complicates the SQL code and makes it hard to maintain. By exploiting the power of Pandas and Python, data transformation and analysis can become simple and enjoyable work.

This talk focuses on performing data transformation and analysis with Pandas. It starts by introducing the basic Pandas components and how to work with text data. After that, the talk covers how to use the split-apply-combine strategy to transform and aggregate data. The final part of the talk is a demo and Q&A.
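As a rough illustration of the topics listed above, the following minimal sketch cleans a text column and then applies split-apply-combine with `groupby`. The `orders` table, its column names, and its values are hypothetical and are not taken from the talk's demo.

```python
import pandas as pd

# Hypothetical order data; column names and values are made up for illustration.
orders = pd.DataFrame(
    {
        "customer": [" alice", "Bob ", "ALICE", "bob", "carol"],
        "amount": [120.0, 80.0, 30.0, 20.0, 50.0],
    }
)

# Text handling: normalize names with the vectorized .str accessor.
orders["customer"] = orders["customer"].str.strip().str.lower()

# Split-apply-combine: split rows by customer, apply aggregations,
# and combine the results into one row per customer.
summary = orders.groupby("customer")["amount"].agg(["sum", "mean", "count"])
print(summary)
```

Normalizing the text first matters here: without it, " alice" and "ALICE" would land in different groups.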

Talk Detail

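At present, data analysis at most Taiwanese companies still relies heavily on SQL. However, writing a complex data transformation process entirely in SQL makes the SQL code very long, and because SQL is not as flexible as Python, overly long SQL code becomes hard to understand and maintain. It is therefore better to put the complex transformation logic in Python code and let the SQL code only query the untransformed data from the database. This talk starts by introducing the basic components, such as Series and DataFrame, then moves on to how to use merge, groupby, agg, and transform in Pandas to transform data, and explains the characteristics of these Pandas methods. Finally, a dataset is used for a demo so that the audience can see how to transform data with Pandas.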
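A minimal sketch of how merge, groupby, agg, and transform differ, assuming two small hypothetical tables (`sales` and `stores`) that stand in for the results of simple SQL queries. The table names, columns, and values are illustrative only and are not from the talk's dataset.

```python
import pandas as pd

# Hypothetical tables standing in for the results of two simple SQL queries.
sales = pd.DataFrame(
    {
        "store_id": [1, 1, 2, 2, 3],
        "revenue": [100.0, 300.0, 50.0, 150.0, 200.0],
    }
)
stores = pd.DataFrame(
    {"store_id": [1, 2, 3], "region": ["north", "north", "south"]}
)

# merge: attach the region to each sale, like a SQL LEFT JOIN on store_id.
merged = sales.merge(stores, on="store_id", how="left")

# transform: returns a result aligned with the original rows, so it can be
# assigned as a new column (here, each sale's share of its region's revenue).
region_total = merged.groupby("region")["revenue"].transform("sum")
merged["share_of_region"] = merged["revenue"] / region_total

# agg: reduces each group to a single row, using named aggregation.
per_region = merged.groupby("region").agg(
    total_revenue=("revenue", "sum"),
    n_sales=("revenue", "count"),
)
print(merged)
print(per_region)
```

The key contrast is the output shape: `transform` keeps one value per original row, while `agg` returns one row per group.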

Speaker Information

Zong-han, Xie

Working with various big data solutions and fighting with the MS environment at the company