sqlpostgresqlstatisticsaggregation

REGR_SLOPE and REGR_INTERCEPT in PostgreSQL: A Trend Line and Forecast in One Query

Build a least-squares trend line right in SQL with REGR_SLOPE and REGR_INTERCEPT, watch the (y, x) argument order, check REGR_COUNT, and forecast without an external stats package.

3 min readReferencesql · postgresql · statistics · aggregation · analytics · forecasting

REGR_SLOPE and REGR_INTERCEPT are aggregate functions that fit a least-squares straight line right inside a query. Together they give you the equation y = slope * x + intercept -- a trend line and a ready-made forecast formula, with no need to ship data off to Python or R.

Syntax and argument order

Both functions take two arguments, and this is where the main trap hides. The order is the dependent variable y first (what you predict), then the independent variable x (what you predict from).

SELECT
    REGR_SLOPE(amount, extract(epoch from created_at)) AS slope,
    REGR_INTERCEPT(amount, extract(epoch from created_at)) AS intercept,
    REGR_COUNT(amount, extract(epoch from created_at)) AS n
FROM orders
WHERE status = 'paid';

What each piece does:

  • REGR_SLOPE(y, x) -- the slope: how much y changes on average for a one-unit rise in x.
  • REGR_INTERCEPT(y, x) -- where the line crosses the y axis (its value at x = 0).
  • REGR_COUNT(y, x) -- the number of pairs where both values are non-NULL. This is your sample size.

Gotcha: the order is (y, x), not (x, y). Swap them and you fit the inverse relationship, so your forecast drifts. Remember it by meaning: "what we predict" comes first.

A daily revenue trend

Convert the timestamp into a number (epoch seconds) and ask whether the average order grows over time. A slope scaled to days reads better than one per second:

SELECT
    REGR_SLOPE(amount, extract(epoch from created_at)) * 86400 AS amount_per_day,
    REGR_COUNT(amount, extract(epoch from created_at)) AS sample_size
FROM orders
WHERE status = 'paid';

A positive amount_per_day means order amounts trend upward day over day; a negative one means they shrink. Keep REGR_COUNT next to it: a slope over three points is noise, not a trend.

Forecasting without an external package

Since you already have slope and intercept, a forecast is just plugging an x into the line equation. Compute the coefficients once in a CTE and predict revenue at a future date:

WITH model AS (
    SELECT
        REGR_SLOPE(amount, extract(epoch from created_at)) AS slope,
        REGR_INTERCEPT(amount, extract(epoch from created_at)) AS intercept
    FROM orders
    WHERE status = 'paid'
)
SELECT
    slope * extract(epoch from TIMESTAMP '2026-12-31') + intercept AS forecast_amount
FROM model;

The same trick is not limited to time. Take salary as a function of a manager's team size -- a linear fit per department:

SELECT
    dept,
    REGR_SLOPE(salary, reports) AS salary_per_report,
    REGR_INTERCEPT(salary, reports) AS base_salary
FROM (
    SELECT
        e.dept,
        e.salary,
        count(r.id) AS reports
    FROM employees e
    LEFT JOIN employees r ON r.manager_id = e.id
    GROUP BY e.id, e.dept, e.salary
) s
GROUP BY dept;

Fit quality and pitfalls

The slope alone says nothing about how well the line describes the data. For that you have REGR_R2 (the coefficient of determination, 0 to 1) and CORR (correlation):

SELECT
    REGR_R2(amount, extract(epoch from created_at)) AS r_squared,
    CORR(amount, extract(epoch from created_at)) AS correlation
FROM orders
WHERE status = 'paid';

Things to keep in mind:

  • An R2 near zero means the line is nearly useless; the forecast deserves no trust.
  • If every x is identical (zero variance), the slope is undefined and the function returns NULL rather than erroring.
  • Rows where y or x is NULL are skipped -- which is exactly why you cross-check REGR_COUNT.
  • Linear regression only sees a straight line. It will flatten seasonality and exponential growth and mislead you.

Differences across engines

  • MySQL has no REGR_* functions. Compute the slope by hand: slope = (n*SUM(x*y) - SUM(x)*SUM(y)) / (n*SUM(x*x) - SUM(x)*SUM(x)), and intercept = (SUM(y) - slope*SUM(x)) / n.
  • ClickHouse offers simpleLinearRegression(x, y), and here the argument order is reversed -- x comes first. It returns a tuple (slope, intercept).
  • BigQuery has no built-in REGR_SLOPE or REGR_INTERCEPT -- you compute the slope and intercept by hand with SUM, AVG, and COVAR_POP/VAR_POP, or train a model with CREATE MODEL ... OPTIONS(model_type='linear_reg') in BigQuery ML. Snowflake, by contrast, does support the standard REGR_SLOPE(y, x) and REGR_INTERCEPT(y, x), just like PostgreSQL.

When you want a quick trend line or a cheap in-database forecast, REGR_SLOPE and REGR_INTERCEPT save you a whole round trip to an external analytics stack -- just keep the argument order straight and check the sample size.

Practice on real tasks

Solve tasks in the SQL trainer with instant grading and hints.

Open trainer