Standard deviation measures how spread out values are around the mean: small means the data clusters near AVG, large means it scatters. In SQL this is not one function but a pair: STDDEV_SAMP for a sample and STDDEV_POP for a whole population, and the difference matters more than it looks.
Two functions: sample and population
Both measure spread, but they divide the sum of squared deviations by a different count. STDDEV_POP divides by n (the full population), STDDEV_SAMP divides by n - 1 (a sample, Bessel's correction). Bare STDDEV in PostgreSQL is a synonym for STDDEV_SAMP.
SELECT STDDEV_SAMP(amount) AS sd_sample,
STDDEV_POP(amount) AS sd_population,
STDDEV(amount) AS sd_bare
FROM orders;
On small sets, dividing by n - 1 noticeably inflates the result versus n; on large sets the gap shrinks toward zero.
- Use
STDDEV_SAMP (or just STDDEV) when the rows are a sample of something bigger: one month of orders as a stand-in for customer behavior.
- Use
STDDEV_POP when the rows are the entire thing you study: the salary spread across every current employee.
mean +/- sd: pairing with AVG
A deviation on its own is useless; you read it next to the mean. The classic report shows AVG plus a "mean plus-or-minus one deviation" band.
SELECT u.country,
COUNT(*) AS n,
AVG(o.amount) AS mean,
STDDEV_SAMP(o.amount) AS sd,
AVG(o.amount) - STDDEV_SAMP(o.amount) AS low,
AVG(o.amount) + STDDEV_SAMP(o.amount) AS high
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE o.status = 'paid'
GROUP BY u.country
HAVING COUNT(*) >= 2;
HAVING COUNT(*) >= 2 is not cosmetic: on a single row STDDEV_SAMP returns NULL, because the n - 1 divisor collapses to zero. STDDEV_POP on a single row returns 0.
The coefficient of variation (sd / mean) is a dimensionless spread measure that lets you compare groups whose amounts live on different scales.
SELECT dept,
AVG(salary) AS mean,
STDDEV_POP(salary) / NULLIF(AVG(salary), 0) AS cv
FROM employees
GROUP BY dept
ORDER BY cv DESC;
Outlier detection via z-score
A z-score is "how many deviations a value sits from the mean": (x - mean) / sd. Rows with |z| > 3 are usually treated as anomalies. It is convenient to compute the statistics with a window function and compare each row against them in one pass.
SELECT *
FROM (
SELECT o.id,
o.amount,
u.country,
(o.amount - AVG(o.amount) OVER w)
/ NULLIF(STDDEV_SAMP(o.amount) OVER w, 0) AS z
FROM orders o
JOIN users u ON u.id = o.user_id
WINDOW w AS (PARTITION BY u.country)
) t
WHERE ABS(z) > 3;
NULLIF(..., 0) guards against division by zero: if every amount in a group is identical the deviation is 0, and without the guard the query errors out. The same trick flags overpaid salaries within a department.
NULLs, types, and engine differences
STDDEV_* ignores NULL, just like AVG: nulls enter neither the sum nor the count. So the n in Bessel's correction is the number of NON-NULL values, not the total row count.
- If no rows survive your filters, every one of these functions returns
NULL, not an error.
- Gotcha: confusing
STDDEV_SAMP and STDDEV_POP on small groups. Across 5 rows, dividing by 4 instead of 5 is about a 12% shift in the spread estimate, and your outlier threshold drifts with it. Pick one version and do not mix them in a single report.
- The result type is
double precision, i.e. a float with rounding error. That is usually fine for money, but never test such a result for exact equality.
- PostgreSQL offers all three names. In MySQL,
STDDEV and STD are synonyms for STDDEV_POP (population!), and you must spell STDDEV_SAMP explicitly — the opposite of Postgres. In ClickHouse the functions are stddevSamp and stddevPop.
In short: pick SAMP for a sample and POP for a full population, always read the deviation next to AVG, guard the division with NULLIF, and remember that bare STDDEV means different things in Postgres and in MySQL.
Standard deviation measures how spread out values are around the mean: small means the data clusters near
AVG, large means it scatters. In SQL this is not one function but a pair:STDDEV_SAMPfor a sample andSTDDEV_POPfor a whole population, and the difference matters more than it looks.Two functions: sample and population
Both measure spread, but they divide the sum of squared deviations by a different count.
STDDEV_POPdivides byn(the full population),STDDEV_SAMPdivides byn - 1(a sample, Bessel's correction). BareSTDDEVin PostgreSQL is a synonym forSTDDEV_SAMP.-- sample vs population spread of order amounts SELECT STDDEV_SAMP(amount) AS sd_sample, STDDEV_POP(amount) AS sd_population, STDDEV(amount) AS sd_bare -- equals STDDEV_SAMP FROM orders;On small sets, dividing by
n - 1noticeably inflates the result versusn; on large sets the gap shrinks toward zero.STDDEV_SAMP(or justSTDDEV) when the rows are a sample of something bigger: one month of orders as a stand-in for customer behavior.STDDEV_POPwhen the rows are the entire thing you study: the salary spread across every current employee.mean +/- sd: pairing with AVG
A deviation on its own is useless; you read it next to the mean. The classic report shows
AVGplus a "mean plus-or-minus one deviation" band.-- mean and a one-sigma band per country SELECT u.country, COUNT(*) AS n, AVG(o.amount) AS mean, STDDEV_SAMP(o.amount) AS sd, AVG(o.amount) - STDDEV_SAMP(o.amount) AS low, AVG(o.amount) + STDDEV_SAMP(o.amount) AS high FROM orders o JOIN users u ON u.id = o.user_id WHERE o.status = 'paid' GROUP BY u.country HAVING COUNT(*) >= 2;HAVING COUNT(*) >= 2is not cosmetic: on a single rowSTDDEV_SAMPreturnsNULL, because then - 1divisor collapses to zero.STDDEV_POPon a single row returns0.The coefficient of variation (
sd / mean) is a dimensionless spread measure that lets you compare groups whose amounts live on different scales.SELECT dept, AVG(salary) AS mean, STDDEV_POP(salary) / NULLIF(AVG(salary), 0) AS cv FROM employees GROUP BY dept ORDER BY cv DESC;Outlier detection via z-score
A z-score is "how many deviations a value sits from the mean":
(x - mean) / sd. Rows with|z| > 3are usually treated as anomalies. It is convenient to compute the statistics with a window function and compare each row against them in one pass.-- flag order amounts more than 3 sd from the per-country mean SELECT * FROM ( SELECT o.id, o.amount, u.country, (o.amount - AVG(o.amount) OVER w) / NULLIF(STDDEV_SAMP(o.amount) OVER w, 0) AS z FROM orders o JOIN users u ON u.id = o.user_id WINDOW w AS (PARTITION BY u.country) ) t WHERE ABS(z) > 3;NULLIF(..., 0)guards against division by zero: if every amount in a group is identical the deviation is0, and without the guard the query errors out. The same trick flags overpaid salaries within a department.NULLs, types, and engine differences
STDDEV_*ignoresNULL, just likeAVG: nulls enter neither the sum nor the count. So thenin Bessel's correction is the number of NON-NULLvalues, not the total row count.NULL, not an error.STDDEV_SAMPandSTDDEV_POPon small groups. Across 5 rows, dividing by 4 instead of 5 is about a 12% shift in the spread estimate, and your outlier threshold drifts with it. Pick one version and do not mix them in a single report.double precision, i.e. afloatwith rounding error. That is usually fine for money, but never test such a result for exact equality.STDDEVandSTDare synonyms forSTDDEV_POP(population!), and you must spellSTDDEV_SAMPexplicitly — the opposite of Postgres. In ClickHouse the functions arestddevSampandstddevPop.In short: pick
SAMPfor a sample andPOPfor a full population, always read the deviation next toAVG, guard the division withNULLIF, and remember that bareSTDDEVmeans different things in Postgres and in MySQL.