sqlpostgresqlaggregationanalytics

PERCENTILE_DISC in SQL: Discrete Percentiles Without Interpolation

How PERCENTILE_DISC returns an actual value from your data without interpolation, how it differs from PERCENTILE_CONT, and when to pick the discrete variant.

2 min readReferencesql · postgresql · aggregation · analytics · statistics

PERCENTILE_DISC returns a percentile as one of the values that actually exists in your data, with no interpolation between neighbors. That makes it the right choice whenever "the average of two rows" is meaningless: for categories, discrete prices, and any case where the result must be a genuine value from the table.

Basic syntax

PERCENTILE_DISC is an ordered-set aggregate, so it uses the special WITHIN GROUP (ORDER BY ...) syntax. The argument in parentheses is a fraction between 0 and 1.

-- Median order amount as a real value from the data
SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY amount) AS median_amount
FROM orders;

-- 90th percentile of salaries
SELECT PERCENTILE_DISC(0.9) WITHIN GROUP (ORDER BY salary) AS p90_salary
FROM employees;

How the value is picked: rows are sorted by the ORDER BY clause, then the first value whose cumulative row fraction reaches the target is returned. The result is always equal to one of the column's values.

Discrete vs continuous

PERCENTILE_CONT interpolates: with an even number of rows the median is the average of the two middle ones. PERCENTILE_DISC does not interpolate; it hands back a real value. Compare them in one query:

-- For amounts 100, 200, 300, 400:
-- CONT median = 250 (interpolated), DISC median = 200 (real value)
SELECT
  PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY amount) AS disc_median,
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS cont_median
FROM orders;

Key differences:

  • DISC always returns a value present in the data; CONT may return a number that is not in the table.
  • CONT requires a numeric type (or interval). DISC also works with text, dates, and enums, any sortable type.
  • On large samples their results converge, but on small ones they diverge noticeably.

When to choose discrete

Reach for PERCENTILE_DISC when an in-between value is meaningless or invalid:

  • Categorical and text data: a "median" status or country must be a real string, not an invention.
  • Discrete amounts and prices where "250.5" cannot occur.
  • SLA and reporting where you must show a concrete, observed value.
-- Median country code per status: must be a real value, not interpolated
SELECT o.status,
       PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY u.country) AS median_country
FROM orders o
JOIN users u ON u.id = o.user_id
GROUP BY o.status;

Here PERCENTILE_CONT simply will not run: you cannot interpolate country strings.

Several percentiles in one pass

In PostgreSQL you can pass an array of fractions and get an array of results from a single call, which is cheaper than three separate queries because the sort happens once.

-- p50, p90, p99 in a single ordered pass
SELECT PERCENTILE_DISC(ARRAY[0.5, 0.9, 0.99])
         WITHIN GROUP (ORDER BY amount) AS p
FROM orders;
-- p -> {200, 980, 1500}

It is handy to spread the array into columns right away:

SELECT p[1] AS p50, p[2] AS p90, p[3] AS p99
FROM (
  SELECT PERCENTILE_DISC(ARRAY[0.5, 0.9, 0.99])
           WITHIN GROUP (ORDER BY amount) AS p
  FROM orders
) t;

It pairs nicely with GROUP BY when you need percentiles per segment:

SELECT dept,
       PERCENTILE_DISC(ARRAY[0.25, 0.5, 0.75])
         WITHIN GROUP (ORDER BY salary) AS quartiles
FROM employees
GROUP BY dept;

Gotchas and engine differences

  • NULL in the ORDER BY is ignored when computing the position, just like in other aggregates. That is usually what you want, but check if NULLs are meaningful.
  • You cannot replace WITHIN GROUP with a plain ORDER BY inside the parentheses; that is a syntax error.
  • MySQL has no PERCENTILE_DISC at all; percentiles are emulated with window functions and NTILE/ROW_NUMBER.
  • In ClickHouse the equivalent is quantileExact(0.5)(amount): it also returns a real value from the data, whereas quantile() interpolates.
-- ClickHouse: exact (discrete) vs interpolated
SELECT
  quantileExact(0.5)(amount) AS disc_median,
  quantile(0.5)(amount)      AS cont_median
FROM orders;

The deciding rule: if the result must read as "a real row from the table," use PERCENTILE_DISC; if you want a smooth statistical estimate of the center or a tail, use PERCENTILE_CONT.

Practice on real tasks

Solve tasks in the SQL trainer with instant grading and hints.

Open trainer