Abstract
In this session, we rewind to the pre‑Parquet era, when schemaless text, row‑oriented I/O, and ineffective compression crippled analytics, and then methodically uncover each design decision that made Parquet the de facto columnar format. We illustrate how column‑oriented storage, row‑group statistics, page‑level encodings, and Bloom filters cut the data scanned by two orders of magnitude. We conclude with a practical comparison of PyArrow, Polars, and DuckDB, demonstrating that modern Python tooling turns these optimizations into one‑liners.