PERCENT_RANK and CUME_DIST answer the question "how high does this row stand relative to all the others" and return a number between 0 and 1. They are the right tool for leaderboard percentiles, where the absolute rank (1st, 2nd, 3rd) matters less than the relative share.
PERCENT_RANK is defined as (rank - 1) / (rows - 1), where rank is the value of the RANK() window function and rows is the number of rows in the window. The best row gets 0, the worst gets 1.
SELECT
u.id,
u.name,
SUM(o.amount) AS total_spent,
PERCENT_RANK() OVER (ORDER BY SUM(o.amount) DESC) AS pr,
CUME_DIST() OVER (ORDER BY SUM(o.amount) DESC) AS cd
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE o.status = 'paid'
GROUP BY u.id, u.name
ORDER BY total_spent DESC;
The difference between the two is fundamental:
PERCENT_RANK = (rank - 1) / (rows - 1). The top row is always exactly 0.
CUME_DIST = (rows <= current) / (total rows). The top row is almost never 0; the bottom row is always 1.
CUME_DIST literally answers: "what fraction of participants scored no better than me." A customer with CUME_DIST = 0.1 under a descending sort by spend is in the top 10% of spenders.
Leaderboard percentiles
The common presentation is "you are better than N% of players." That is 1 - PERCENT_RANK, scaled to 100.
WITH board AS (
SELECT
e.id,
e.name,
e.dept,
e.salary,
PERCENT_RANK() OVER (ORDER BY e.salary) AS pr
FROM employees e
)
SELECT
name,
dept,
salary,
ROUND((pr * 100)::numeric, 1) AS percentile,
ROUND(((1 - pr) * 100)::numeric, 1) AS better_than_pct
FROM board
ORDER BY salary DESC;
You can compute the percentile within a group by adding PARTITION BY. The salary is then compared only against peers in the same department:
SELECT
name,
dept,
salary,
PERCENT_RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS dept_pr
FROM employees;
Tie handling
PERCENT_RANK uses RANK() under the hood, so equal values get the same result. That is often what you want: two players with the same score share a percentile. But it also creates gaps in the distribution and non-determinism in the sort order.
Gotcha: when ORDER BY values tie, the row order is undefined. PERCENT_RANK itself is stable for the tied group, but any neighboring functions (ROW_NUMBER, LAG) will return different results from run to run. Always add a unique tiebreaker column.
SELECT
u.id,
u.name,
COUNT(o.id) AS orders_cnt,
PERCENT_RANK() OVER (
ORDER BY COUNT(o.id) DESC, u.id
) AS pr
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name
ORDER BY orders_cnt DESC, u.id;
Here u.id makes the order fully deterministic. But note: with a tiebreaker there are no more ties, and every row gets a unique rank, which is different behavior from a "true" tie. Choose deliberately.
Edge cases and engine differences
- A single row in the window:
rows = 1, so the formula divides by zero conceptually, but the standard defines the result as 0, and PostgreSQL returns 0.
NULL in ORDER BY participates in the sort; control it with NULLS FIRST / NULLS LAST.
- PostgreSQL: both functions follow the SQL standard.
- MySQL: supported since 8.0; the syntax is identical.
- ClickHouse: historically there is no
PERCENT_RANK / CUME_DIST as window functions, so emulate with rank() and count() over a window: (rank() OVER w - 1) / (count() OVER w - 1).
Rule of thumb: for "rank as a percentage" use PERCENT_RANK (the top is always 0); for "share no better than me" use CUME_DIST. And never ship percentiles without a tiebreaker, or neighboring rows will jump between query runs.
PERCENT_RANKandCUME_DISTanswer the question "how high does this row stand relative to all the others" and return a number between 0 and 1. They are the right tool for leaderboard percentiles, where the absolute rank (1st, 2nd, 3rd) matters less than the relative share.The formula and a basic query
PERCENT_RANKis defined as(rank - 1) / (rows - 1), whererankis the value of theRANK()window function androwsis the number of rows in the window. The best row gets 0, the worst gets 1.SELECT u.id, u.name, SUM(o.amount) AS total_spent, PERCENT_RANK() OVER (ORDER BY SUM(o.amount) DESC) AS pr, CUME_DIST() OVER (ORDER BY SUM(o.amount) DESC) AS cd FROM users u JOIN orders o ON o.user_id = u.id WHERE o.status = 'paid' GROUP BY u.id, u.name ORDER BY total_spent DESC;The difference between the two is fundamental:
PERCENT_RANK=(rank - 1) / (rows - 1). The top row is always exactly 0.CUME_DIST=(rows <= current) / (total rows). The top row is almost never 0; the bottom row is always 1.CUME_DISTliterally answers: "what fraction of participants scored no better than me." A customer withCUME_DIST = 0.1under a descending sort by spend is in the top 10% of spenders.Leaderboard percentiles
The common presentation is "you are better than N% of players." That is
1 - PERCENT_RANK, scaled to 100.WITH board AS ( SELECT e.id, e.name, e.dept, e.salary, PERCENT_RANK() OVER (ORDER BY e.salary) AS pr FROM employees e ) SELECT name, dept, salary, ROUND((pr * 100)::numeric, 1) AS percentile, ROUND(((1 - pr) * 100)::numeric, 1) AS better_than_pct FROM board ORDER BY salary DESC;You can compute the percentile within a group by adding
PARTITION BY. The salary is then compared only against peers in the same department:SELECT name, dept, salary, PERCENT_RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS dept_pr FROM employees;Tie handling
PERCENT_RANKusesRANK()under the hood, so equal values get the same result. That is often what you want: two players with the same score share a percentile. But it also creates gaps in the distribution and non-determinism in the sort order.SELECT u.id, u.name, COUNT(o.id) AS orders_cnt, PERCENT_RANK() OVER ( ORDER BY COUNT(o.id) DESC, u.id ) AS pr FROM users u LEFT JOIN orders o ON o.user_id = u.id GROUP BY u.id, u.name ORDER BY orders_cnt DESC, u.id;Here
u.idmakes the order fully deterministic. But note: with a tiebreaker there are no more ties, and every row gets a unique rank, which is different behavior from a "true" tie. Choose deliberately.Edge cases and engine differences
rows = 1, so the formula divides by zero conceptually, but the standard defines the result as 0, and PostgreSQL returns 0.NULLinORDER BYparticipates in the sort; control it withNULLS FIRST/NULLS LAST.PERCENT_RANK/CUME_DISTas window functions, so emulate withrank()andcount()over a window:(rank() OVER w - 1) / (count() OVER w - 1).Rule of thumb: for "rank as a percentage" use
PERCENT_RANK(the top is always 0); for "share no better than me" useCUME_DIST. And never ship percentiles without a tiebreaker, or neighboring rows will jump between query runs.