sqlpostgresqlwindow-functionspercent-rank

PERCENT_RANK and CUME_DIST: Leaderboard Percentiles in SQL

How to compute a row's relative standing in 0..1 with PERCENT_RANK and CUME_DIST, and how to make the result deterministic.

2 min readReferencesql · postgresql · window-functions · percent-rank · analytics

PERCENT_RANK and CUME_DIST answer the question "how high does this row stand relative to all the others" and return a number between 0 and 1. They are the right tool for leaderboard percentiles, where the absolute rank (1st, 2nd, 3rd) matters less than the relative share.

The formula and a basic query

PERCENT_RANK is defined as (rank - 1) / (rows - 1), where rank is the value of the RANK() window function and rows is the number of rows in the window. The best row gets 0, the worst gets 1.

SELECT
  u.id,
  u.name,
  SUM(o.amount) AS total_spent,
  PERCENT_RANK() OVER (ORDER BY SUM(o.amount) DESC) AS pr,
  CUME_DIST()    OVER (ORDER BY SUM(o.amount) DESC) AS cd
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE o.status = 'paid'
GROUP BY u.id, u.name
ORDER BY total_spent DESC;

The difference between the two is fundamental:

  • PERCENT_RANK = (rank - 1) / (rows - 1). The top row is always exactly 0.
  • CUME_DIST = (rows <= current) / (total rows). The top row is almost never 0; the bottom row is always 1.

CUME_DIST literally answers: "what fraction of participants scored no better than me." A customer with CUME_DIST = 0.1 under a descending sort by spend is in the top 10% of spenders.

Leaderboard percentiles

The common presentation is "you are better than N% of players." That is 1 - PERCENT_RANK, scaled to 100.

WITH board AS (
  SELECT
    e.id,
    e.name,
    e.dept,
    e.salary,
    PERCENT_RANK() OVER (ORDER BY e.salary) AS pr
  FROM employees e
)
SELECT
  name,
  dept,
  salary,
  ROUND((pr * 100)::numeric, 1) AS percentile,
  ROUND(((1 - pr) * 100)::numeric, 1) AS better_than_pct
FROM board
ORDER BY salary DESC;

You can compute the percentile within a group by adding PARTITION BY. The salary is then compared only against peers in the same department:

SELECT
  name,
  dept,
  salary,
  PERCENT_RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS dept_pr
FROM employees;

Tie handling

PERCENT_RANK uses RANK() under the hood, so equal values get the same result. That is often what you want: two players with the same score share a percentile. But it also creates gaps in the distribution and non-determinism in the sort order.

Gotcha: when ORDER BY values tie, the row order is undefined. PERCENT_RANK itself is stable for the tied group, but any neighboring functions (ROW_NUMBER, LAG) will return different results from run to run. Always add a unique tiebreaker column.

SELECT
  u.id,
  u.name,
  COUNT(o.id) AS orders_cnt,
  PERCENT_RANK() OVER (
    ORDER BY COUNT(o.id) DESC, u.id
  ) AS pr
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name
ORDER BY orders_cnt DESC, u.id;

Here u.id makes the order fully deterministic. But note: with a tiebreaker there are no more ties, and every row gets a unique rank, which is different behavior from a "true" tie. Choose deliberately.

Edge cases and engine differences

  • A single row in the window: rows = 1, so the formula divides by zero conceptually, but the standard defines the result as 0, and PostgreSQL returns 0.
  • NULL in ORDER BY participates in the sort; control it with NULLS FIRST / NULLS LAST.
  • PostgreSQL: both functions follow the SQL standard.
  • MySQL: supported since 8.0; the syntax is identical.
  • ClickHouse: historically there is no PERCENT_RANK / CUME_DIST as window functions, so emulate with rank() and count() over a window: (rank() OVER w - 1) / (count() OVER w - 1).

Rule of thumb: for "rank as a percentage" use PERCENT_RANK (the top is always 0); for "share no better than me" use CUME_DIST. And never ship percentiles without a tiebreaker, or neighboring rows will jump between query runs.

Practice on real tasks

Solve tasks in the SQL trainer with instant grading and hints.

Open trainer