sqlpostgresqlstatisticsaggregate

SQL STDDEV: STDDEV_SAMP vs STDDEV_POP and Outlier Detection

How STDDEV_SAMP differs from STDDEV_POP, why bare STDDEV is the sample version, computing mean +/- sd, and flagging outliers.

3 min readReferencesql · postgresql · statistics · aggregate · mysql · clickhouse

Standard deviation measures how spread out values are around the mean: small means the data clusters near AVG, large means it scatters. In SQL this is not one function but a pair: STDDEV_SAMP for a sample and STDDEV_POP for a whole population, and the difference matters more than it looks.

Two functions: sample and population

Both measure spread, but they divide the sum of squared deviations by a different count. STDDEV_POP divides by n (the full population), STDDEV_SAMP divides by n - 1 (a sample, Bessel's correction). Bare STDDEV in PostgreSQL is a synonym for STDDEV_SAMP.

-- sample vs population spread of order amounts
SELECT STDDEV_SAMP(amount) AS sd_sample,
       STDDEV_POP(amount)  AS sd_population,
       STDDEV(amount)      AS sd_bare      -- equals STDDEV_SAMP
FROM orders;

On small sets, dividing by n - 1 noticeably inflates the result versus n; on large sets the gap shrinks toward zero.

  • Use STDDEV_SAMP (or just STDDEV) when the rows are a sample of something bigger: one month of orders as a stand-in for customer behavior.
  • Use STDDEV_POP when the rows are the entire thing you study: the salary spread across every current employee.

mean +/- sd: pairing with AVG

A deviation on its own is useless; you read it next to the mean. The classic report shows AVG plus a "mean plus-or-minus one deviation" band.

-- mean and a one-sigma band per country
SELECT u.country,
       COUNT(*)                          AS n,
       AVG(o.amount)                     AS mean,
       STDDEV_SAMP(o.amount)             AS sd,
       AVG(o.amount) - STDDEV_SAMP(o.amount) AS low,
       AVG(o.amount) + STDDEV_SAMP(o.amount) AS high
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE o.status = 'paid'
GROUP BY u.country
HAVING COUNT(*) >= 2;

HAVING COUNT(*) >= 2 is not cosmetic: on a single row STDDEV_SAMP returns NULL, because the n - 1 divisor collapses to zero. STDDEV_POP on a single row returns 0.

The coefficient of variation (sd / mean) is a dimensionless spread measure that lets you compare groups whose amounts live on different scales.

SELECT dept,
       AVG(salary) AS mean,
       STDDEV_POP(salary) / NULLIF(AVG(salary), 0) AS cv
FROM employees
GROUP BY dept
ORDER BY cv DESC;

Outlier detection via z-score

A z-score is "how many deviations a value sits from the mean": (x - mean) / sd. Rows with |z| > 3 are usually treated as anomalies. It is convenient to compute the statistics with a window function and compare each row against them in one pass.

-- flag order amounts more than 3 sd from the per-country mean
SELECT *
FROM (
    SELECT o.id,
           o.amount,
           u.country,
           (o.amount - AVG(o.amount) OVER w)
             / NULLIF(STDDEV_SAMP(o.amount) OVER w, 0) AS z
    FROM orders o
    JOIN users u ON u.id = o.user_id
    WINDOW w AS (PARTITION BY u.country)
) t
WHERE ABS(z) > 3;

NULLIF(..., 0) guards against division by zero: if every amount in a group is identical the deviation is 0, and without the guard the query errors out. The same trick flags overpaid salaries within a department.

NULLs, types, and engine differences

  • STDDEV_* ignores NULL, just like AVG: nulls enter neither the sum nor the count. So the n in Bessel's correction is the number of NON-NULL values, not the total row count.
  • If no rows survive your filters, every one of these functions returns NULL, not an error.
  • Gotcha: confusing STDDEV_SAMP and STDDEV_POP on small groups. Across 5 rows, dividing by 4 instead of 5 is about a 12% shift in the spread estimate, and your outlier threshold drifts with it. Pick one version and do not mix them in a single report.
  • The result type is double precision, i.e. a float with rounding error. That is usually fine for money, but never test such a result for exact equality.
  • PostgreSQL offers all three names. In MySQL, STDDEV and STD are synonyms for STDDEV_POP (population!), and you must spell STDDEV_SAMP explicitly — the opposite of Postgres. In ClickHouse the functions are stddevSamp and stddevPop.

In short: pick SAMP for a sample and POP for a full population, always read the deviation next to AVG, guard the division with NULLIF, and remember that bare STDDEV means different things in Postgres and in MySQL.

Practice on real tasks

Solve tasks in the SQL trainer with instant grading and hints.

Open trainer