UNION vs UNION ALL in SQL: Combining Query Results

UNION and UNION ALL stack the results of two or more SELECT statements vertically: one query's rows are appended below another's. The difference is a single keyword, but it drives both correctness and performance.

UNION vs UNION ALL

UNION ALL simply concatenates the row sets and checks nothing for repeats. UNION additionally removes duplicates, and to find them the engine has to sort or hash the entire result.

-- Keeps every row, including duplicates. Fast.
SELECT country FROM users
UNION ALL
SELECT country FROM users;

-- Collapses identical rows into one. Extra dedup pass.
SELECT country FROM users
UNION
SELECT country FROM users;

Rule of thumb: if you already know the inputs don't overlap (say, you're stacking orders from different months or rows from separate shards), reach for UNION ALL. The extra dedup over millions of rows costs real CPU seconds and memory for the sort, sometimes spilling to disk. Reach for UNION only when repeats are genuinely possible and they bother you.

UNION ALL — no dedup, row order isn't guaranteed, but it's usually cheaper and steadier on memory.
UNION — dedup across the whole column set, not just the first column, so the cost grows with row width.
A duplicate is a row that matches entirely; here NULL equals NULL (unlike a normal = comparison).
Parentheses let you combine branches: (a UNION ALL b) UNION c first concatenates the two parts, then dedups everything together.

Column compatibility rules

You can only combine queries with the same number of columns and compatible types. Column names come from the first SELECT; the rest are ignored.

-- OK: same arity, compatible types
SELECT id, email FROM users
UNION ALL
SELECT user_id, status FROM orders;

-- Fails: 2 columns vs 3 columns
SELECT id, email FROM users
UNION ALL
SELECT id, user_id, amount FROM orders;

If the types differ, line them up explicitly with CAST rather than relying on implicit coercion. To make a heterogeneous stack meaningful, add a source column:

SELECT id, name, CAST(NULL AS numeric) AS amount, 'user' AS source
FROM users
UNION ALL
SELECT id, CAST(NULL AS text), amount, 'order'
FROM orders;

ORDER BY and LIMIT over the union

ORDER BY applies to the whole result and is written once, at the very end. Sorting individual branches is pointless: UNION may reshuffle the rows anyway.

SELECT id, name, salary FROM employees WHERE dept = 'eng'
UNION ALL
SELECT id, name, salary FROM employees WHERE dept = 'sales'
ORDER BY salary DESC
LIMIT 10;

If you need to sort or limit a specific branch, wrap it in parentheses and a subquery:

SELECT * FROM (
  SELECT id, amount FROM orders WHERE status = 'paid'
  ORDER BY amount DESC LIMIT 5
) AS top_paid
UNION ALL
SELECT id, amount FROM orders WHERE status = 'refunded';

Gotcha: the final ORDER BY sorts by column position or by the name from the first SELECT. If branches name columns differently, refer to the first branch's name or to the position number (ORDER BY 3).

Engine differences

PostgreSQL — strict type checking; UNION uses a Sort or Hash Aggregate, visible in EXPLAIN.
MySQL — allows more implicit coercion and may silently truncate values; put a branch's ORDER BY/LIMIT in a subquery or it gets ignored.
ClickHouse — by default UNION without ALL may require explicit configuration; here you almost always write UNION ALL and dedup via DISTINCT or aggregates.

Bottom line: UNION ALL is the default choice for speed, and you reach for UNION deliberately, only when repeats genuinely need to collapse.

UNION vs UNION ALL in SQL: Combining Query Results