SQL NTILE: Quartiles, Deciles and Cohorts in Equal Buckets

Q: How the remainder is distributed

When the row count does not divide evenly by n, the extra rows go into the lower-numbered buckets. For 10 rows and NTILE(4): the remainder is 10 mod 4 = 2, so the first two buckets each get one extra row. - 10 rows, NTILE(4) → 3, 3, 2, 2 - 11 rows, NTILE(4) → 3, 3, 3, 2 - 7 rows, NTILE(3) → 3, 2, 2 Gotcha: when n is larger than the row count, some buckets never appear. Seven rows with NTILE(10) produce numbers 1 through 7, and buckets 8, 9, 10 stay empty. Do not assume every bucket exists. A second trap: equal values (ties) can still land in different buckets, because rows are counted, not values. If a cohort boundary must align with a value boundary, NTILE is the wrong tool — reach for WIDT

Q: NTILE vs WIDTH_BUCKET

They are easy to confuse, but the logic differs: - NTILE(n) equalizes the row count across buckets. Every bucket is about the same size, but the value ranges float. - WIDTH BUCKET(val, lo, hi, n) equalizes the range width . Buckets span equal value widths, but their row counts can vary wildly. Rule of thumb: "equal number of customers per group" → NTILE; "fixed price bands" → WIDTH BUCKET.

NTILE(n) is a window function that splits an ordered set of rows into n buckets as equal in size as possible and tags each row with a bucket number from 1 to n. It is the workhorse behind quartiles, deciles, and any "equal number of customers per group" cohort.

The basic case: quartiles

Split users into four equal groups by total spend. ORDER BY ... DESC means bucket 1 is the "top".

SELECT
  u.id,
  u.email,
  SUM(o.amount) AS spent,
  NTILE(4) OVER (ORDER BY SUM(o.amount) DESC) AS quartile
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE o.status = 'paid'
GROUP BY u.id, u.email;

The key difference from RANK/ROW_NUMBER: NTILE looks at the row count, not at specific values. Ten rows under NTILE(4) produce buckets of size 3, 3, 2, 2 — the function lays the rows out for you.

How the remainder is distributed

When the row count does not divide evenly by n, the extra rows go into the lower-numbered buckets. For 10 rows and NTILE(4): the remainder is 10 mod 4 = 2, so the first two buckets each get one extra row.

10 rows, NTILE(4) → 3, 3, 2, 2
11 rows, NTILE(4) → 3, 3, 3, 2
7 rows, NTILE(3) → 3, 2, 2

Gotcha: when n is larger than the row count, some buckets never appear. Seven rows with NTILE(10) produce numbers 1 through 7, and buckets 8, 9, 10 stay empty. Do not assume every bucket exists.

A second trap: equal values (ties) can still land in different buckets, because rows are counted, not values. If a cohort boundary must align with a value boundary, NTILE is the wrong tool — reach for WIDTH_BUCKET or a manual CASE.

PARTITION BY: deciles within each country

PARTITION BY recomputes the buckets independently inside each group. Below are spend deciles computed per country.

SELECT
  u.country,
  u.email,
  SUM(o.amount) AS spent,
  NTILE(10) OVER (
    PARTITION BY u.country
    ORDER BY SUM(o.amount) DESC
  ) AS decile
FROM users u
JOIN orders o ON o.user_id = u.id
GROUP BY u.country, u.email;

Each country gets its own ladder from 1 to 10. The same idea gives salary deciles per department in one query:

SELECT name, dept, salary,
  NTILE(4) OVER (PARTITION BY dept ORDER BY salary DESC) AS pay_quartile
FROM employees;

NTILE vs WIDTH_BUCKET

They are easy to confuse, but the logic differs:

NTILE(n) equalizes the row count across buckets. Every bucket is about the same size, but the value ranges float.
WIDTH_BUCKET(val, lo, hi, n) equalizes the range width. Buckets span equal value widths, but their row counts can vary wildly.

-- Equal-WIDTH bands: 0-250, 250-500, 500-750, 750-1000
SELECT id, amount,
  WIDTH_BUCKET(amount, 0, 1000, 4) AS price_band
FROM orders;

Rule of thumb: "equal number of customers per group" → NTILE; "fixed price bands" → WIDTH_BUCKET.

Compatibility

PostgreSQL — both NTILE and WIDTH_BUCKET are fully supported.
MySQL — NTILE exists since 8.0; there is no WIDTH_BUCKET, so people emulate it with a FLOOR(...) expression.
ClickHouse — no native NTILE; use intDiv(rowNumberInAllBlocks(), ...) or arithmetic over row_number().

The thing to remember: NTILE slices by rows, and the remainder always tilts toward the lower-numbered buckets — that is what makes honest quartile cohorts work.