WIDTH_BUCKET in SQL: Equal-Width Histogram Buckets and Distributions

WIDTH_BUCKET answers a simple question: which histogram bin does a number land in if you slice a range into equal-width pieces? It is the right tool whenever you need to group a continuous value (order amount, salary, score) into uniform buckets and build a distribution.

Signature and logic

The canonical PostgreSQL call takes four arguments:

SELECT WIDTH_BUCKET(score, 0, 100, 10) AS bucket;  -- bucket 1..10

This says: take the range from 0 to 100, cut it into 10 equal pieces of width 10 each, and return the bucket number for score. A value in 0..10 lands in bucket 1, 10..20 in bucket 2, and so on. The lower bound is inclusive, the upper bound is exclusive.

The key feature is two extra buckets:

if a value is below the lower bound, you get 0 (underflow);
if a value is at or above the upper bound, you get n + 1 (overflow).

You never silently drop rows: everything outside the range is neatly piled up at the edges.

Distributions with GROUP BY

The most common job is a histogram of order amounts. Slice amount from 0 to 1000 into 10 buckets and count each one:

SELECT
  WIDTH_BUCKET(amount, 0, 1000, 10) AS bucket,
  COUNT(*) AS orders,
  ROUND(MIN(amount), 2) AS lo,
  ROUND(MAX(amount), 2) AS hi
FROM orders
WHERE status = 'paid'
GROUP BY bucket
ORDER BY bucket;

Bucket 0 collects refunds and zero amounts, bucket 11 collects whales above 1000. To turn the number into a readable label, compute the edges right in the query:

SELECT
  bucket,
  (bucket - 1) * 100 AS range_lo,
  bucket * 100       AS range_hi,
  COUNT(*)           AS orders
FROM (
  SELECT WIDTH_BUCKET(amount, 0, 1000, 10) AS bucket
  FROM orders
  WHERE status = 'paid'
) t
GROUP BY bucket
ORDER BY bucket;

The same pattern works for salaries: bucket employees.salary into bands and immediately see where most of the headcount sits.

SELECT
  WIDTH_BUCKET(salary, 30000, 150000, 6) AS band,
  dept,
  COUNT(*) AS headcount
FROM employees
GROUP BY band, dept
ORDER BY band, dept;

WIDTH_BUCKET vs NTILE

These two are easy to confuse, but they split data along different axes:

WIDTH_BUCKET cuts the value axis into equal-width segments. Buckets may be empty or overflowing, and that is fine, because the shape of the distribution is exactly what you want to see.
NTILE(n) cuts the row set into n groups of roughly equal size (quantiles). Interval widths float, but each group holds about the same number of rows.

Reach for WIDTH_BUCKET when fixed thresholds matter ("orders 0-100, 100-200..."). Reach for NTILE when you need percentiles ("top 10% of customers by spend"):

SELECT
  user_id,
  SUM(amount) AS lifetime,
  NTILE(4) OVER (ORDER BY SUM(amount)) AS quartile
FROM orders
GROUP BY user_id;

Gotcha: bounds and NULLs

A few traps that burn people:

The upper bound is exclusive. Exactly WIDTH_BUCKET(100, 0, 100, 10) returns 11, not 10. To pull the maximum into the last real bucket, widen the upper limit or use LEAST(amount, 999.99).
A NULL input yields a NULL output, so those rows drop out of the distribution. Filter them explicitly or fold them in with COALESCE.
You can pass the bounds reversed (upper < lower), which flips the numbering so buckets count downward. That is rarely what you want, so double-check the order.

Differences across engines

PostgreSQL and Oracle support the four-argument WIDTH_BUCKET directly.
MySQL has no WIDTH_BUCKET. Emulate it with arithmetic: FLOOR((amount - 0) / 100) + 1, and handle underflow and overflow by hand.
ClickHouse offers roundDown(amount, [0, 100, 200, ...]) and plain floor(amount / 100) for the same effect.

The golden rule: pin min, max, and the bucket count as deliberate business thresholds rather than fitting them to the current data, otherwise two neighboring reports stop being comparable.