sqlpostgresqlvariancestatistics

SQL VARIANCE: VAR_SAMP vs VAR_POP and the Link to STDDEV

How VAR_SAMP and VAR_POP work in PostgreSQL, why bare VARIANCE equals VAR_SAMP, and how variance ties to standard deviation.

3 min readReferencesql · postgresql · variance · statistics · aggregate

Variance measures how far values spread around their mean: the larger it is, the "wider" the data. It is the average squared deviation from the mean, expressed in the squared units of the input. SQL ships two flavors, sample variance VAR_SAMP and population variance VAR_POP, and the distinction matters more than it first appears.

Two functions, one shorthand

PostgreSQL exposes three names, but they hide just two formulas:

  • VAR_POP(x) is the population variance: it divides the sum of squared deviations by N.
  • VAR_SAMP(x) is the sample variance: it divides by N - 1 (Bessel's correction).
  • VARIANCE(x) is simply an alias for VAR_SAMP.
SELECT
  var_samp(amount) AS sample_variance,
  var_pop(amount)  AS pop_variance,
  variance(amount) AS bare_variance   -- equals var_samp
FROM orders;

Key takeaway: in PostgreSQL, bare VARIANCE is NOT a "neutral" choice, it divides by N - 1. ClickHouse spells the explicit functions varSamp and varPop. MySQL is the trap: there VARIANCE is a synonym for VAR_POP (division by N), not VAR_SAMP. So a bare VARIANCE query moved between PostgreSQL and MySQL returns different numbers for the same data. For portable code, always write the form you mean explicitly.

Sample or population

Which one you want depends on what your rows represent.

  • If the rows are ALL the objects you care about (say, every order on a specific day), use VAR_POP.
  • If the rows are a sample drawn from a larger, unobserved set (orders as a sample of overall customer behavior), use VAR_SAMP.
-- Spread of order amounts per status, treating each group as a sample
SELECT
  status,
  count(*)              AS n,
  round(var_samp(amount)::numeric, 2) AS variance
FROM orders
GROUP BY status
ORDER BY variance DESC NULLS LAST;

For large N the gap between dividing by N and N - 1 vanishes. On small groups it is visible: at N = 2, the sample variance is exactly twice the population variance. So on sparse data or with many tiny groups, the choice of function directly moves the numbers in your report, and it is worth making explicit rather than leaning on a default. When you are unsure whether your rows are a full population or a sample, treating them as a sample and using VAR_SAMP is almost always the safer call: it gives an unbiased estimate of the spread, whereas VAR_POP on a sample systematically understates it.

Variance is standard deviation squared

Variance and standard deviation are two views of the same number: STDDEV is the square root of VARIANCE. STDDEV_SAMP pairs with VAR_SAMP and STDDEV_POP with VAR_POP. That identity is handy for sanity checks and for moving between units.

SELECT
  var_samp(salary)             AS variance_salary,
  stddev_samp(salary)          AS stddev_salary,
  sqrt(var_samp(salary))       AS stddev_via_sqrt  -- matches stddev_samp
FROM employees;

Standard deviation is in the same units as the data (dollars, euros), while variance is in those units squared. So reports almost always show STDDEV to humans and keep variance inside the math, where the square is convenient (for example, when adding the variances of independent variables).

Numerical details and gotchas

  • NULLs are silently ignored: VAR_SAMP counts only non-null values, and N is that count, not the total row count.
  • On a single row, VAR_SAMP returns NULL (you cannot divide by N - 1 = 0), whereas VAR_POP returns 0. This is the classic gotcha when grouping: singleton groups suddenly produce NULL.
-- Single-row group: var_samp is NULL, var_pop is 0
SELECT
  dept,
  count(*)        AS n,
  var_samp(salary) AS vs,   -- NULL when n = 1
  var_pop(salary)  AS vp    -- 0 when n = 1
FROM employees
GROUP BY dept;
  • The result type is double precision for float inputs and numeric for integer and numeric inputs. On very large numbers the float path can lose precision; cast to numeric when you need stability.
  • Do not compute variance over already-aggregated averages, it gives a wrong answer. Variance is taken over the raw, unaggregated values.
-- Cast to numeric for stable variance on large monetary values
SELECT var_samp(amount::numeric) AS stable_variance
FROM orders
WHERE status = 'paid';

Short rule: in PostgreSQL use VAR_SAMP (or VARIANCE) for samples, VAR_POP for a full population, watch for NULL on single-row groups, and reach for STDDEV via the square root whenever you want human-readable units. And remember that VARIANCE means VAR_POP in MySQL.

Practice on real tasks

Solve tasks in the SQL trainer with instant grading and hints.

Open trainer