sqlpostgresqlaggregatepercentile

PERCENTILE_CONT in PostgreSQL: Median and Percentiles with WITHIN GROUP

Compute the median and p95 with a single PERCENTILE_CONT aggregate, interpolation included, compare it to PERCENTILE_DISC, and see why it beats AVG.

3 min readReferencesql · postgresql · aggregate · percentile · analytics · clickhouse

PERCENTILE_CONT is an ordered-set aggregate that returns the value at a given fraction of a sorted set, interpolating between neighbouring points when needed. It is the direct route to the median, to p95 latency, and to any percentile, all within a single SELECT.

The WITHIN GROUP syntax

PERCENTILE_CONT is an ordered-set aggregate, so the sort lives not inside the parentheses but in a separate WITHIN GROUP (ORDER BY ...) clause. The argument in the parentheses is a fraction between 0 and 1.

SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS median_amount
FROM orders;

Key details:

  • The 0.5 argument is the percentile as a fraction: 0.5 is the median, 0.95 is p95, 0.99 is p99.
  • The ORDER BY is mandatory and decides which column the distribution is built on.
  • The column must be numeric or an interval; for text use PERCENTILE_DISC.
  • NULL values are ignored, just like in ordinary aggregates.

Interpolation: how CONT differs from DISC

The word "continuous" means that when the percentile position falls between two rows, the function returns their linear interpolation rather than one of the actual values. For an even row count, the PERCENTILE_CONT median is the average of the two central values.

-- amounts: 10, 20, 30, 40
-- CONT median interpolates between 20 and 30 -> 25
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS cont_median
FROM orders;

-- DISC returns an actual row value -> 20
SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY amount) AS disc_median
FROM orders;

When to pick which:

  • PERCENTILE_CONT for sums, durations, latency: interpolation gives a smooth value.
  • PERCENTILE_DISC when you need a value that actually exists in the set (a real order amount, or a text category).

Multiple percentiles with an array

Pass an array of fractions and the function returns an array of values in one pass, cheaper than three separate calls and ideal for a latency dashboard.

SELECT
    PERCENTILE_CONT(ARRAY[0.5, 0.95, 0.99])
        WITHIN GROUP (ORDER BY amount) AS p50_p95_p99
FROM orders
WHERE status = 'paid';

The result is a numeric[]; unpack it by index:

SELECT
    pct[1] AS p50,
    pct[2] AS p95,
    pct[3] AS p99
FROM (
    SELECT PERCENTILE_CONT(ARRAY[0.5, 0.95, 0.99])
        WITHIN GROUP (ORDER BY amount) AS pct
    FROM orders
    WHERE status = 'paid'
) s;

p95 by group, and why it beats AVG

PERCENTILE_CONT pairs naturally with GROUP BY: compute the median salary per department or the p95 order value per country in one query.

SELECT
    dept,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary,
    AVG(salary)                                         AS avg_salary
FROM employees
GROUP BY dept
ORDER BY median_salary DESC;

The mean is sensitive to outliers: one whale contract or one 30-second timeout drags AVG upward, and the metric stops describing the "typical" user. The median and p95 are far more robust.

-- p95 latency by country: the SLO-relevant number, not the average
SELECT
    u.country,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY o.amount) AS p95_amount
FROM orders o
JOIN users u ON u.id = o.user_id
GROUP BY u.country
ORDER BY p95_amount DESC;

Gotcha: PERCENTILE_CONT cannot be written as a plain window function with OVER (...). The sort lives in WITHIN GROUP, and OVER does not attach to it. If you need a percentile as a windowed attribute, wrap the aggregate in a GROUP BY subquery and join it back, or reach for percent_rank().

Differences in other databases

  • MySQL has no PERCENTILE_CONT. Medians and percentiles are emulated with the window PERCENT_RANK() or ROW_NUMBER() filtered by position, which is verbose and lacks honest interpolation out of the box.
  • ClickHouse offers quantile(0.95)(amount) (approximate, fast) and quantileExact(0.95)(amount) (exact). For the median there is a median(amount) alias. The syntax drops WITHIN GROUP: the fraction goes into the function's parameters. One caveat: quantile and quantileTDigest interpolate between neighbouring values, but quantileExact does not interpolate. It returns an actual element from the data, much like PERCENTILE_DISC, so do not reach for it when you specifically want an interpolated p95.
-- ClickHouse: quantileExact returns an actual element (no interpolation)
SELECT quantileExact(0.95)(amount) AS p95_amount
FROM orders
WHERE status = 'paid';

If you do want the interpolated tail value, use quantile(0.95)(amount) or quantileTDigest(0.95)(amount) instead.

PERCENTILE_CONT is the standard, readable, accurate way to get the median and tail percentiles in PostgreSQL. Reach for it on latency metrics and monetary distributions, and keep AVG for cases where outliers genuinely do not matter.

Practice on real tasks

Solve tasks in the SQL trainer with instant grading and hints.

Open trainer