Variance measures how far values spread around their mean: the larger it is, the "wider" the data. It is the average squared deviation from the mean, expressed in the squared units of the input. SQL ships two flavors, sample variance VAR_SAMP and population variance VAR_POP, and the distinction matters more than it first appears.
Two functions, one shorthand
PostgreSQL exposes three names, but they hide just two formulas:
VAR_POP(x) is the population variance: it divides the sum of squared deviations by N.
VAR_SAMP(x) is the sample variance: it divides by N - 1 (Bessel's correction).
VARIANCE(x) is simply an alias for VAR_SAMP.
SELECT
var_samp(amount) AS sample_variance,
var_pop(amount) AS pop_variance,
variance(amount) AS bare_variance
FROM orders;
Key takeaway: in PostgreSQL, bare VARIANCE is NOT a "neutral" choice, it divides by N - 1. ClickHouse spells the explicit functions varSamp and varPop. MySQL is the trap: there VARIANCE is a synonym for VAR_POP (division by N), not VAR_SAMP. So a bare VARIANCE query moved between PostgreSQL and MySQL returns different numbers for the same data. For portable code, always write the form you mean explicitly.
Sample or population
Which one you want depends on what your rows represent.
- If the rows are ALL the objects you care about (say, every order on a specific day), use
VAR_POP.
- If the rows are a sample drawn from a larger, unobserved set (orders as a sample of overall customer behavior), use
VAR_SAMP.
SELECT
status,
count(*) AS n,
round(var_samp(amount)::numeric, 2) AS variance
FROM orders
GROUP BY status
ORDER BY variance DESC NULLS LAST;
For large N the gap between dividing by N and N - 1 vanishes. On small groups it is visible: at N = 2, the sample variance is exactly twice the population variance. So on sparse data or with many tiny groups, the choice of function directly moves the numbers in your report, and it is worth making explicit rather than leaning on a default. When you are unsure whether your rows are a full population or a sample, treating them as a sample and using VAR_SAMP is almost always the safer call: it gives an unbiased estimate of the spread, whereas VAR_POP on a sample systematically understates it.
Variance is standard deviation squared
Variance and standard deviation are two views of the same number: STDDEV is the square root of VARIANCE. STDDEV_SAMP pairs with VAR_SAMP and STDDEV_POP with VAR_POP. That identity is handy for sanity checks and for moving between units.
SELECT
var_samp(salary) AS variance_salary,
stddev_samp(salary) AS stddev_salary,
sqrt(var_samp(salary)) AS stddev_via_sqrt
FROM employees;
Standard deviation is in the same units as the data (dollars, euros), while variance is in those units squared. So reports almost always show STDDEV to humans and keep variance inside the math, where the square is convenient (for example, when adding the variances of independent variables).
Numerical details and gotchas
NULLs are silently ignored: VAR_SAMP counts only non-null values, and N is that count, not the total row count.
- On a single row,
VAR_SAMP returns NULL (you cannot divide by N - 1 = 0), whereas VAR_POP returns 0. This is the classic gotcha when grouping: singleton groups suddenly produce NULL.
SELECT
dept,
count(*) AS n,
var_samp(salary) AS vs,
var_pop(salary) AS vp
FROM employees
GROUP BY dept;
- The result type is
double precision for float inputs and numeric for integer and numeric inputs. On very large numbers the float path can lose precision; cast to numeric when you need stability.
- Do not compute variance over already-aggregated averages, it gives a wrong answer. Variance is taken over the raw, unaggregated values.
SELECT var_samp(amount::numeric) AS stable_variance
FROM orders
WHERE status = 'paid';
Short rule: in PostgreSQL use VAR_SAMP (or VARIANCE) for samples, VAR_POP for a full population, watch for NULL on single-row groups, and reach for STDDEV via the square root whenever you want human-readable units. And remember that VARIANCE means VAR_POP in MySQL.
Variance measures how far values spread around their mean: the larger it is, the "wider" the data. It is the average squared deviation from the mean, expressed in the squared units of the input. SQL ships two flavors, sample variance
VAR_SAMPand population varianceVAR_POP, and the distinction matters more than it first appears.Two functions, one shorthand
PostgreSQL exposes three names, but they hide just two formulas:
VAR_POP(x)is the population variance: it divides the sum of squared deviations byN.VAR_SAMP(x)is the sample variance: it divides byN - 1(Bessel's correction).VARIANCE(x)is simply an alias forVAR_SAMP.SELECT var_samp(amount) AS sample_variance, var_pop(amount) AS pop_variance, variance(amount) AS bare_variance -- equals var_samp FROM orders;Key takeaway: in PostgreSQL, bare
VARIANCEis NOT a "neutral" choice, it divides byN - 1. ClickHouse spells the explicit functionsvarSampandvarPop. MySQL is the trap: thereVARIANCEis a synonym forVAR_POP(division byN), notVAR_SAMP. So a bareVARIANCEquery moved between PostgreSQL and MySQL returns different numbers for the same data. For portable code, always write the form you mean explicitly.Sample or population
Which one you want depends on what your rows represent.
VAR_POP.VAR_SAMP.-- Spread of order amounts per status, treating each group as a sample SELECT status, count(*) AS n, round(var_samp(amount)::numeric, 2) AS variance FROM orders GROUP BY status ORDER BY variance DESC NULLS LAST;For large
Nthe gap between dividing byNandN - 1vanishes. On small groups it is visible: atN = 2, the sample variance is exactly twice the population variance. So on sparse data or with many tiny groups, the choice of function directly moves the numbers in your report, and it is worth making explicit rather than leaning on a default. When you are unsure whether your rows are a full population or a sample, treating them as a sample and usingVAR_SAMPis almost always the safer call: it gives an unbiased estimate of the spread, whereasVAR_POPon a sample systematically understates it.Variance is standard deviation squared
Variance and standard deviation are two views of the same number:
STDDEVis the square root ofVARIANCE.STDDEV_SAMPpairs withVAR_SAMPandSTDDEV_POPwithVAR_POP. That identity is handy for sanity checks and for moving between units.SELECT var_samp(salary) AS variance_salary, stddev_samp(salary) AS stddev_salary, sqrt(var_samp(salary)) AS stddev_via_sqrt -- matches stddev_samp FROM employees;Standard deviation is in the same units as the data (dollars, euros), while variance is in those units squared. So reports almost always show
STDDEVto humans and keep variance inside the math, where the square is convenient (for example, when adding the variances of independent variables).Numerical details and gotchas
NULLs are silently ignored:VAR_SAMPcounts only non-null values, andNis that count, not the total row count.VAR_SAMPreturnsNULL(you cannot divide byN - 1 = 0), whereasVAR_POPreturns0. This is the classic gotcha when grouping: singleton groups suddenly produceNULL.-- Single-row group: var_samp is NULL, var_pop is 0 SELECT dept, count(*) AS n, var_samp(salary) AS vs, -- NULL when n = 1 var_pop(salary) AS vp -- 0 when n = 1 FROM employees GROUP BY dept;double precisionfor float inputs andnumericfor integer andnumericinputs. On very large numbers the float path can lose precision; cast tonumericwhen you need stability.-- Cast to numeric for stable variance on large monetary values SELECT var_samp(amount::numeric) AS stable_variance FROM orders WHERE status = 'paid';Short rule: in PostgreSQL use
VAR_SAMP(orVARIANCE) for samples,VAR_POPfor a full population, watch forNULLon single-row groups, and reach forSTDDEVvia the square root whenever you want human-readable units. And remember thatVARIANCEmeansVAR_POPin MySQL.