sqlpostgresqlcubegrouping-sets

SQL CUBE: Every Grouping Combination in One Pass for Cross-Tab Reports

How GROUP BY CUBE computes every column combination at once, per dimension and grand total, and how to read NULL subtotals with GROUPING() and tell CUBE from ROLLUP.

3 min readReferencesql · postgresql · cube · grouping-sets · aggregation · reporting

CUBE is an extension of GROUP BY that, in a single pass, computes aggregates over every combination of the listed columns: each dimension on its own, their intersections, and the whole dataset. It is a ready-made engine for cross-tab reports like "sum by region, by product, by region+product, plus a grand total, in one table."

What CUBE computes

The canonical example: sales broken down by region and product.

SELECT region, product, SUM(amount) AS total
FROM sales
GROUP BY CUBE (region, product);

GROUP BY CUBE (region, product) builds four groupings at once:

  • (region, product) — the detail cells of the cross-tab;
  • (region) — a total per region (any product);
  • (product) — a total per product (any region);
  • () — the empty set, i.e. the grand total over all rows.

For n columns, CUBE produces 2^n groupings. Two columns give 4 sets, three give 8, four give 16. That is the key difference from ROLLUP, which collapses columns right to left only (a hierarchy, n + 1 levels), while CUBE takes all subsets. CUBE (a, b) yields both (a) and (b); ROLLUP (a, b) yields (a) but not (b).

Reading NULL subtotals with GROUPING()

The main gotcha: subtotal rows are marked with NULL in the collapsed columns. If your data already contains NULL (say, a sale with no region), you cannot tell a real NULL from a total marker. The GROUPING() function removes the ambiguity: it returns 1 when a column was rolled up in that row, and 0 when it holds a real value.

SELECT
  GROUPING(country) AS g_country,
  GROUPING(status)  AS g_status,
  country,
  status,
  SUM(amount)       AS revenue
FROM orders o
JOIN users u ON u.id = o.user_id
GROUP BY CUBE (country, status)
ORDER BY g_country, country, g_status, status;

It is cleaner to swap the flags for readable labels right away, so the report needs no footnotes:

SELECT
  CASE WHEN GROUPING(country) = 1 THEN 'All countries' ELSE country END AS country,
  CASE WHEN GROUPING(status)  = 1 THEN 'All statuses'  ELSE status  END AS status,
  SUM(amount) AS revenue
FROM orders o
JOIN users u ON u.id = o.user_id
GROUP BY CUBE (country, status)
ORDER BY country NULLS LAST, status NULLS LAST;

To pick a specific slice of the cube, filter on the GROUPING() bits in HAVING. For example, only the per-country totals (status collapsed, country not):

SELECT country, SUM(amount) AS revenue
FROM orders o
JOIN users u ON u.id = o.user_id
GROUP BY CUBE (country, status)
HAVING GROUPING(country) = 0 AND GROUPING(status) = 1;

CUBE on a real HR example

CUBE fits matrix reports beautifully. Take employee salaries broken down by department and whether they report to a manager:

SELECT
  COALESCE(dept, 'ALL DEPTS')                       AS dept,
  CASE WHEN GROUPING(manager_id IS NULL) = 1
       THEN 'ALL'
       WHEN manager_id IS NULL THEN 'top-level'
       ELSE 'has manager' END                       AS reports_to,
  COUNT(*)                                           AS headcount,
  ROUND(AVG(salary), 2)                              AS avg_salary
FROM employees
GROUP BY CUBE (dept, (manager_id IS NULL))
ORDER BY dept NULLS LAST;

A single query returns at once: the average salary for each "department + level" pair, for each department as a whole, for each level company-wide, and the grand total. Without CUBE you would stitch four SELECTs together with UNION ALL.

CUBE vs ROLLUP vs GROUPING SETS

All three are syntactic sugar over GROUPING SETS. Knowing the equivalents gives you full control:

  • CUBE (a, b) = GROUPING SETS ((a, b), (a), (b), ()) — every combination;
  • ROLLUP (a, b) = GROUPING SETS ((a, b), (a), ()) — only the right-to-left hierarchy;
  • need a non-standard set? Spell it out by hand: GROUPING SETS ((a, b), (b), ()).

Reach for ROLLUP when the columns form a natural hierarchy (year then month then day) and you do not need side subtotals. Reach for CUBE when the dimensions are independent and you want the full matrix. Gotcha: CUBE over many columns explodes exponentially — CUBE over 5 columns is 32 groupings; on large tables that is expensive, so cap the number of dimensions or use explicit GROUPING SETS.

On dialects: CUBE, ROLLUP and GROUPING SETS are supported in PostgreSQL 9.5+. MySQL offers only GROUP BY ... WITH ROLLUP — no CUBE, no explicit GROUPING SETS; you build the full matrix there with UNION ALL. ClickHouse has a WITH CUBE modifier, but row ordering and NULL handling differ from the standard, so when porting a report check the total rows specifically.

Practice on real tasks

Solve tasks in the SQL trainer with instant grading and hints.

Open trainer