SQL STDDEV: STDDEV_SAMP vs STDDEV_POP and Outlier Detection

Standard deviation measures how spread out values are around the mean: small means the data clusters near AVG, large means it scatters. In SQL this is not one function but a pair: STDDEV_SAMP for a sample and STDDEV_POP for a whole population, and the difference matters more than it looks.

Two functions: sample and population

Both measure spread, but they divide the sum of squared deviations by a different count. STDDEV_POP divides by n (the full population), STDDEV_SAMP divides by n - 1 (a sample, Bessel's correction). Bare STDDEV in PostgreSQL is a synonym for STDDEV_SAMP.

-- sample vs population spread of order amounts
SELECT STDDEV_SAMP(amount) AS sd_sample,
       STDDEV_POP(amount)  AS sd_population,
       STDDEV(amount)      AS sd_bare      -- equals STDDEV_SAMP
FROM orders;

On small sets, dividing by n - 1 noticeably inflates the result versus n; on large sets the gap shrinks toward zero.

Use STDDEV_SAMP (or just STDDEV) when the rows are a sample of something bigger: one month of orders as a stand-in for customer behavior.
Use STDDEV_POP when the rows are the entire thing you study: the salary spread across every current employee.

mean +/- sd: pairing with AVG

A deviation on its own is useless; you read it next to the mean. The classic report shows AVG plus a "mean plus-or-minus one deviation" band.

-- mean and a one-sigma band per country
SELECT u.country,
       COUNT(*)                          AS n,
       AVG(o.amount)                     AS mean,
       STDDEV_SAMP(o.amount)             AS sd,
       AVG(o.amount) - STDDEV_SAMP(o.amount) AS low,
       AVG(o.amount) + STDDEV_SAMP(o.amount) AS high
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE o.status = 'paid'
GROUP BY u.country
HAVING COUNT(*) >= 2;

HAVING COUNT(*) >= 2 is not cosmetic: on a single row STDDEV_SAMP returns NULL, because the n - 1 divisor collapses to zero. STDDEV_POP on a single row returns 0.

The coefficient of variation (sd / mean) is a dimensionless spread measure that lets you compare groups whose amounts live on different scales.

SELECT dept,
       AVG(salary) AS mean,
       STDDEV_POP(salary) / NULLIF(AVG(salary), 0) AS cv
FROM employees
GROUP BY dept
ORDER BY cv DESC;

Outlier detection via z-score

A z-score is "how many deviations a value sits from the mean": (x - mean) / sd. Rows with |z| > 3 are usually treated as anomalies. It is convenient to compute the statistics with a window function and compare each row against them in one pass.

-- flag order amounts more than 3 sd from the per-country mean
SELECT *
FROM (
    SELECT o.id,
           o.amount,
           u.country,
           (o.amount - AVG(o.amount) OVER w)
             / NULLIF(STDDEV_SAMP(o.amount) OVER w, 0) AS z
    FROM orders o
    JOIN users u ON u.id = o.user_id
    WINDOW w AS (PARTITION BY u.country)
) t
WHERE ABS(z) > 3;

NULLIF(..., 0) guards against division by zero: if every amount in a group is identical the deviation is 0, and without the guard the query errors out. The same trick flags overpaid salaries within a department.

NULLs, types, and engine differences

STDDEV_* ignores NULL, just like AVG: nulls enter neither the sum nor the count. So the n in Bessel's correction is the number of NON-NULL values, not the total row count.
If no rows survive your filters, every one of these functions returns NULL, not an error.
Gotcha: confusing STDDEV_SAMP and STDDEV_POP on small groups. Across 5 rows, dividing by 4 instead of 5 is about a 12% shift in the spread estimate, and your outlier threshold drifts with it. Pick one version and do not mix them in a single report.
The result type is double precision, i.e. a float with rounding error. That is usually fine for money, but never test such a result for exact equality.
PostgreSQL offers all three names. In MySQL, STDDEV and STD are synonyms for STDDEV_POP (population!), and you must spell STDDEV_SAMP explicitly — the opposite of Postgres. In ClickHouse the functions are stddevSamp and stddevPop.

In short: pick SAMP for a sample and POP for a full population, always read the deviation next to AVG, guard the division with NULLIF, and remember that bare STDDEV means different things in Postgres and in MySQL.

SQL STDDEV: STDDEV_SAMP vs STDDEV_POP and Outlier Detection

Two functions: sample and population

mean +/- sd: pairing with AVG

Outlier detection via z-score

NULLs, types, and engine differences

Related articles

Practice on real tasks