To diagnose performance issues in our Python application, we started with cProfile to measure per-function CPU cost, but found it cannot distinguish call stacks that share the same function calls. Asyncio makes this worse: gathered coroutines all appear to have the event loop as their caller. So we ended up building our own function-call profilers:
* CPU cost profiler: counts CPU instructions per function call, attributed to the complete call stack.
* Latency profiler with asyncio support: records a timestamp and latency per function call, including each yield from an `await`.
* lru_cache opportunity profiler: flags functions with high cost and a high repeat rate of arguments and return values.
We will share how we implemented these profilers. After this talk, you will know how to:
* Build profilers by registering a callback that fires on every function call.
* Handle call stacks in the asyncio world.
* Use different timers and traverse the call stack.
* Implement a CPU profiler, a latency profiler, and an lru_cache opportunity profiler.