REGR_SLOPE and REGR_INTERCEPT are aggregate functions that fit a least-squares straight line right inside a query. Together they give you the equation y = slope * x + intercept -- a trend line and a ready-made forecast formula, with no need to ship data off to Python or R.
Syntax and argument order
Both functions take two arguments, and this is where the main trap hides. The order is the dependent variable y first (what you predict), then the independent variable x (what you predict from).
SELECT
REGR_SLOPE(amount, extract(epoch from created_at)) AS slope,
REGR_INTERCEPT(amount, extract(epoch from created_at)) AS intercept,
REGR_COUNT(amount, extract(epoch from created_at)) AS n
FROM orders
WHERE status = 'paid';
What each piece does:
REGR_SLOPE(y, x) -- the slope: how much y changes on average for a one-unit rise in x.
REGR_INTERCEPT(y, x) -- where the line crosses the y axis (its value at x = 0).
REGR_COUNT(y, x) -- the number of pairs where both values are non-NULL. This is your sample size.
Gotcha: the order is (y, x), not (x, y). Swap them and you fit the inverse relationship, so your forecast drifts. Remember it by meaning: "what we predict" comes first.
A daily revenue trend
Convert the timestamp into a number (epoch seconds) and ask whether the average order grows over time. A slope scaled to days reads better than one per second:
SELECT
REGR_SLOPE(amount, extract(epoch from created_at)) * 86400 AS amount_per_day,
REGR_COUNT(amount, extract(epoch from created_at)) AS sample_size
FROM orders
WHERE status = 'paid';
A positive amount_per_day means order amounts trend upward day over day; a negative one means they shrink. Keep REGR_COUNT next to it: a slope over three points is noise, not a trend.
Forecasting without an external package
Since you already have slope and intercept, a forecast is just plugging an x into the line equation. Compute the coefficients once in a CTE and predict revenue at a future date:
WITH model AS (
SELECT
REGR_SLOPE(amount, extract(epoch from created_at)) AS slope,
REGR_INTERCEPT(amount, extract(epoch from created_at)) AS intercept
FROM orders
WHERE status = 'paid'
)
SELECT
slope * extract(epoch from TIMESTAMP '2026-12-31') + intercept AS forecast_amount
FROM model;
The same trick is not limited to time. Take salary as a function of a manager's team size -- a linear fit per department:
SELECT
dept,
REGR_SLOPE(salary, reports) AS salary_per_report,
REGR_INTERCEPT(salary, reports) AS base_salary
FROM (
SELECT
e.dept,
e.salary,
count(r.id) AS reports
FROM employees e
LEFT JOIN employees r ON r.manager_id = e.id
GROUP BY e.id, e.dept, e.salary
) s
GROUP BY dept;
Fit quality and pitfalls
The slope alone says nothing about how well the line describes the data. For that you have REGR_R2 (the coefficient of determination, 0 to 1) and CORR (correlation):
SELECT
REGR_R2(amount, extract(epoch from created_at)) AS r_squared,
CORR(amount, extract(epoch from created_at)) AS correlation
FROM orders
WHERE status = 'paid';
Things to keep in mind:
- An
R2 near zero means the line is nearly useless; the forecast deserves no trust.
- If every
x is identical (zero variance), the slope is undefined and the function returns NULL rather than erroring.
- Rows where
y or x is NULL are skipped -- which is exactly why you cross-check REGR_COUNT.
- Linear regression only sees a straight line. It will flatten seasonality and exponential growth and mislead you.
Differences across engines
- MySQL has no
REGR_* functions. Compute the slope by hand: slope = (n*SUM(x*y) - SUM(x)*SUM(y)) / (n*SUM(x*x) - SUM(x)*SUM(x)), and intercept = (SUM(y) - slope*SUM(x)) / n.
- ClickHouse offers
simpleLinearRegression(x, y), and here the argument order is reversed -- x comes first. It returns a tuple (slope, intercept).
- BigQuery has no built-in
REGR_SLOPE or REGR_INTERCEPT -- you compute the slope and intercept by hand with SUM, AVG, and COVAR_POP/VAR_POP, or train a model with CREATE MODEL ... OPTIONS(model_type='linear_reg') in BigQuery ML. Snowflake, by contrast, does support the standard REGR_SLOPE(y, x) and REGR_INTERCEPT(y, x), just like PostgreSQL.
When you want a quick trend line or a cheap in-database forecast, REGR_SLOPE and REGR_INTERCEPT save you a whole round trip to an external analytics stack -- just keep the argument order straight and check the sample size.
REGR_SLOPEandREGR_INTERCEPTare aggregate functions that fit a least-squares straight line right inside a query. Together they give you the equationy = slope * x + intercept-- a trend line and a ready-made forecast formula, with no need to ship data off to Python or R.Syntax and argument order
Both functions take two arguments, and this is where the main trap hides. The order is the dependent variable
yfirst (what you predict), then the independent variablex(what you predict from).SELECT REGR_SLOPE(amount, extract(epoch from created_at)) AS slope, REGR_INTERCEPT(amount, extract(epoch from created_at)) AS intercept, REGR_COUNT(amount, extract(epoch from created_at)) AS n FROM orders WHERE status = 'paid';What each piece does:
REGR_SLOPE(y, x)-- the slope: how muchychanges on average for a one-unit rise inx.REGR_INTERCEPT(y, x)-- where the line crosses theyaxis (its value atx = 0).REGR_COUNT(y, x)-- the number of pairs where both values are non-NULL. This is your sample size.A daily revenue trend
Convert the timestamp into a number (epoch seconds) and ask whether the average order grows over time. A slope scaled to days reads better than one per second:
SELECT REGR_SLOPE(amount, extract(epoch from created_at)) * 86400 AS amount_per_day, REGR_COUNT(amount, extract(epoch from created_at)) AS sample_size FROM orders WHERE status = 'paid';A positive
amount_per_daymeans order amounts trend upward day over day; a negative one means they shrink. KeepREGR_COUNTnext to it: a slope over three points is noise, not a trend.Forecasting without an external package
Since you already have
slopeandintercept, a forecast is just plugging anxinto the line equation. Compute the coefficients once in a CTE and predict revenue at a future date:WITH model AS ( SELECT REGR_SLOPE(amount, extract(epoch from created_at)) AS slope, REGR_INTERCEPT(amount, extract(epoch from created_at)) AS intercept FROM orders WHERE status = 'paid' ) SELECT slope * extract(epoch from TIMESTAMP '2026-12-31') + intercept AS forecast_amount FROM model;The same trick is not limited to time. Take salary as a function of a manager's team size -- a linear fit per department:
SELECT dept, REGR_SLOPE(salary, reports) AS salary_per_report, REGR_INTERCEPT(salary, reports) AS base_salary FROM ( SELECT e.dept, e.salary, count(r.id) AS reports FROM employees e LEFT JOIN employees r ON r.manager_id = e.id GROUP BY e.id, e.dept, e.salary ) s GROUP BY dept;Fit quality and pitfalls
The slope alone says nothing about how well the line describes the data. For that you have
REGR_R2(the coefficient of determination, 0 to 1) andCORR(correlation):SELECT REGR_R2(amount, extract(epoch from created_at)) AS r_squared, CORR(amount, extract(epoch from created_at)) AS correlation FROM orders WHERE status = 'paid';Things to keep in mind:
R2near zero means the line is nearly useless; the forecast deserves no trust.xis identical (zero variance), the slope is undefined and the function returnsNULLrather than erroring.yorxisNULLare skipped -- which is exactly why you cross-checkREGR_COUNT.Differences across engines
REGR_*functions. Compute the slope by hand:slope = (n*SUM(x*y) - SUM(x)*SUM(y)) / (n*SUM(x*x) - SUM(x)*SUM(x)), andintercept = (SUM(y) - slope*SUM(x)) / n.simpleLinearRegression(x, y), and here the argument order is reversed --xcomes first. It returns a tuple(slope, intercept).REGR_SLOPEorREGR_INTERCEPT-- you compute the slope and intercept by hand withSUM,AVG, andCOVAR_POP/VAR_POP, or train a model withCREATE MODEL ... OPTIONS(model_type='linear_reg')in BigQuery ML. Snowflake, by contrast, does support the standardREGR_SLOPE(y, x)andREGR_INTERCEPT(y, x), just like PostgreSQL.When you want a quick trend line or a cheap in-database forecast,
REGR_SLOPEandREGR_INTERCEPTsave you a whole round trip to an external analytics stack -- just keep the argument order straight and check the sample size.