Rust visitor pattern and efficient DataFusion query federation

zX41ZdbW · on Dec 20, 2022

For comparison, the implementation of the same feature in ClickHouse:

https://github.com/ClickHouse/ClickHouse/blob/master/src/Sto...

ClickHouse allows federated queries with MySQL, Postgres, ODBC and JDBC data sources.

gruuya · on Dec 20, 2022

Nice, looks familiar! Any plans for supporting aggregation pushdowns (we have had some experience with that in Postgres/Multicorn[1][2])?

Though, I imagine there's a region in the data size/network throughput/latency space where simply fetching the data and then doing analytics in ClickHouse is more performant than actually going for the pushdown.[3]

[1] https://www.splitgraph.com/blog/postgresql-fdw-aggregation-p...

[2] https://www.splitgraph.com/blog/postgresql-fdw-aggregation-p...

[3] https://duckdb.org/2022/09/30/postgres-scanner.html
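To make the "region in the data size/network throughput/latency space" idea concrete, here is a rough back-of-the-envelope cost model in Python. It is only a sketch: the `Scenario` fields, the pipelining assumption, and every number in it are hypothetical illustrations, not measurements from ClickHouse, Postgres, or anything else mentioned in this thread.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Scenario:
    """Hypothetical inputs for comparing the two strategies."""
    rows: int                      # rows in the remote table
    row_bytes: int                 # assumed average serialized row size
    groups: int                    # number of groups in the aggregate result
    network_bytes_per_s: float     # usable network throughput
    latency_s: float               # one round trip to the remote database
    remote_scan_rows_per_s: float  # remote engine streaming raw rows
    remote_agg_rows_per_s: float   # remote engine aggregating rows
    local_agg_rows_per_s: float    # local engine aggregating fetched rows


def fetch_then_aggregate_s(s: Scenario) -> float:
    """Estimated time to pull raw rows and aggregate them locally.

    Assumes remote scan, transfer and local aggregation are pipelined,
    so the slowest of the three stages dominates.
    """
    scan = s.rows / s.remote_scan_rows_per_s
    transfer = s.rows * s.row_bytes / s.network_bytes_per_s
    local = s.rows / s.local_agg_rows_per_s
    return s.latency_s + max(scan, transfer, local)


def pushdown_s(s: Scenario) -> float:
    """Estimated time to aggregate remotely and ship only the groups.

    Assumes each result group is about as large as one input row.
    """
    remote = s.rows / s.remote_agg_rows_per_s
    transfer = s.groups * s.row_bytes / s.network_bytes_per_s
    return s.latency_s + remote + transfer


# Made-up example: 100M rows, 100 groups, 10 Gbit/s link, and a remote
# engine that aggregates much more slowly than it scans.
fast_network = Scenario(
    rows=100_000_000,
    row_bytes=16,
    groups=100,
    network_bytes_per_s=1.25e9,
    latency_s=0.001,
    remote_scan_rows_per_s=50e6,
    remote_agg_rows_per_s=5e6,
    local_agg_rows_per_s=200e6,
)
# Same workload over a 100 Mbit/s link.
slow_network = replace(fast_network, network_bytes_per_s=12.5e6)

for name, s in [("fast", fast_network), ("slow", slow_network)]:
    print(f"{name}: fetch={fetch_then_aggregate_s(s):.1f}s "
          f"pushdown={pushdown_s(s):.1f}s")
```

With these invented numbers, fetching raw data wins on the fast link and the pushdown wins on the slow one, which is the kind of crossover described above. Where the boundary actually sits depends entirely on the real engines and the real network.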

zX41ZdbW · on Dec 20, 2022

We have tested it on a few queries with aggregation (with low-cardinality results). For MySQL, fetching the raw data and doing the aggregation in ClickHouse appeared to be faster. For PostgreSQL the two were identical (at least, Postgres did not do significantly more work for aggregation than for reading the data). It also depends on the network, but at least it was not limited by a 10 Gbit network.

Automatic pushdown of aggregations is not currently being considered, but we are considering a syntax that would allow explicitly pushing down a whole subquery.