ConnectorX: Accelerating Data Loading from Databases to Dataframes [pdf]

gruuya · on Oct 28, 2022

What struck me at first was how little of Pandas read_sql's time goes to actual query execution and data transfer, while client-side processing takes up the majority (~85%) of it.

It makes more sense, though, once you realise they're talking about unsaturated networks, where the network isn't the bottleneck, so they can focus on relatively simple optimisation techniques (e.g. query partitioning and zero-copy) to get a significant speedup in data loading.
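
To make "query partitioning" a bit more concrete, here's a rough sketch of the general idea using only the Python standard library. It is not ConnectorX's implementation: the `items` table, the helper names (`make_demo_db`, `partition_bounds`, `fetch_range`, `load_partitioned`) and the choice of SQLite and threads are all assumptions made for illustration. One query is split into range queries on a numeric column, and each range is fetched over its own connection.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def make_demo_db(path, n_rows=1000):
    """Create a small demo table (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value REAL)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?)",
        ((i, i * 0.5) for i in range(n_rows)),
    )
    conn.commit()
    conn.close()


def partition_bounds(lo, hi, parts):
    """Split the inclusive range [lo, hi] into contiguous half-open ranges."""
    step = -(-(hi - lo + 1) // parts)  # ceiling division
    return [(start, min(start + step, hi + 1)) for start in range(lo, hi + 1, step)]


def fetch_range(path, bounds):
    """Run one partition's query on its own connection."""
    start, stop = bounds
    conn = sqlite3.connect(path)
    try:
        return conn.execute(
            "SELECT id, value FROM items WHERE id >= ? AND id < ?",
            (start, stop),
        ).fetchall()
    finally:
        conn.close()


def load_partitioned(path, parts=4):
    """Fetch the whole table as `parts` range queries issued concurrently."""
    conn = sqlite3.connect(path)
    lo, hi = conn.execute("SELECT MIN(id), MAX(id) FROM items").fetchone()
    conn.close()
    # Assumes a non-empty table; MIN/MAX return None otherwise.
    ranges = partition_bounds(lo, hi, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        chunks = list(pool.map(lambda b: fetch_range(path, b), ranges))
    # Concatenate the per-partition results into one list of rows.
    return [row for chunk in chunks for row in chunk]
```

Note that this only shows the shape of the technique. In pure Python the client-side row conversion is still largely serialised by the GIL, so I wouldn't expect this sketch by itself to show the kind of speedup reported in the paper.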