Announcing support for GROUP BY, SUM, and other aggregation queries in R2 SQL

2025-12-18

11 min read

by Jérôme Schneider

Tags:

Data

Edge Computing

Rust

Serverless

SQL


Cloudflare announces support for GROUP BY, SUM, and other aggregation queries in R2 SQL, its serverless analytics query engine over R2 Data Catalog.

• Aggregations run in two phases: worker nodes compute pre-aggregates, then the coordinator merges them into the final result (scatter-gather).
• Pre-aggregates allow horizontal scaling. For example, the pre-aggregate for count(*) is a partial row count, while avg(value) stores the sum and the count separately.
• Scatter-gather fails for ORDER BY or HAVING on aggregates when grouping by high-cardinality columns, because local top-N results can miss the global leaders.
• Shuffling solves this with deterministic hash partitioning: every worker routes each row to a destination worker chosen by the hash of the GROUP BY key, so all rows for a given key land on the same worker.
• A synchronization barrier ensures all workers finish sending data before any worker computes final aggregates.
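
The points above can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not R2 SQL's implementation: the function names (`pre_aggregate`, `merge_partials`, `destination_worker`, `shuffle`) are hypothetical, CRC32 stands in for whatever hash function the engine actually uses, and "workers" are plain Python lists. It shows why avg(value) is carried as a (sum, count) pair (averaging per-worker averages would weight workers incorrectly) and how hashing the GROUP BY key sends every row for a key to one destination.

```python
import zlib
from collections import defaultdict


def pre_aggregate(rows):
    """Worker phase: fold local (key, value) rows into partial (sum, count) state."""
    partial = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        partial[key][0] += value  # running sum
        partial[key][1] += 1      # running row count, i.e. a partial count(*)
    return {key: (s, c) for key, (s, c) in partial.items()}


def merge_partials(partials):
    """Merge phase: add up partial states, then finalize avg = sum / count."""
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (s, c) in partial.items():
            merged[key][0] += s
            merged[key][1] += c
    return {key: {"count": c, "sum": s, "avg": s / c}
            for key, (s, c) in merged.items()}


def destination_worker(key, num_workers):
    """Deterministic hash partitioning: the same key always maps to the same worker.

    CRC32 is an assumption for illustration; it is stable across processes,
    unlike Python's built-in hash() for strings.
    """
    return zlib.crc32(key.encode("utf-8")) % num_workers


def shuffle(worker_rows, num_workers):
    """Route every row to the inbox of the worker that owns its GROUP BY key."""
    inboxes = [[] for _ in range(num_workers)]
    for rows in worker_rows:
        for key, value in rows:
            inboxes[destination_worker(key, num_workers)].append((key, value))
    return inboxes


# Two workers, each holding a slice of the table as (group_key, value) rows.
worker_rows = [
    [("a", 1.0), ("b", 4.0)],
    [("a", 3.0), ("c", 10.0)],
]

# Scatter-gather: each worker pre-aggregates, one merge step combines the results.
scatter_gather = merge_partials(pre_aggregate(rows) for rows in worker_rows)
print(scatter_gather["a"])  # {'count': 2, 'sum': 4.0, 'avg': 2.0}

# Shuffle: shuffle() returns only after every row has been routed, which plays
# the role of the synchronization barrier in this sequential simulation.
inboxes = shuffle(worker_rows, num_workers=2)
per_worker_finals = [pre_aggregate(inbox) for inbox in inboxes]
```

After the shuffle, each key appears in exactly one inbox, so a worker's aggregate for that key is already the global value. That is what lets ORDER BY or HAVING on an aggregate be evaluated locally without missing a group whose rows were spread thinly across many workers.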
