You can find rows with no match in another table three ways: NOT IN, NOT EXISTS, and LEFT JOIN ... IS NULL. They look interchangeable, but NOT IN hides a nasty NULL trap that quietly wipes out your result one day. Let's compare them over a users, orders, employees schema.
The task: users with no orders
The classic anti-join: who never placed an order. The intuitive NOT IN version:
SELECT u.id, u.email
FROM users u
WHERE u.id NOT IN (SELECT user_id FROM orders);
As long as orders.user_id has no NULL, this is fine. But user_id may be nullable (guest checkout, soft deletes), and then the query breaks -- no error, no warning.
The NULL trap in NOT IN
NOT IN (...) expands to a chain id <> v1 AND id <> v2 AND .... If any value is NULL, the comparison id <> NULL is neither TRUE nor FALSE but UNKNOWN. In three-valued logic TRUE AND UNKNOWN = UNKNOWN, so the row is dropped. The result: a single NULL in the subquery collapses the whole result to empty.
SELECT u.id, u.email
FROM users u
WHERE u.id NOT IN (SELECT user_id FROM orders);
- This is standard SQL behavior, not a quirk of one engine.
NULL IN (...) and NULL NOT IN (...) never evaluate to TRUE either.
- PostgreSQL, MySQL, and ClickHouse all behave the same way.
The band-aid is to filter NULL out: WHERE u.id NOT IN (SELECT user_id FROM orders WHERE user_id IS NOT NULL). It works, but it is easy to forget -- and even easier to forget on a column inside a composite key.
NOT EXISTS: safe by default
NOT EXISTS checks whether at least one row exists and uses plain two-valued existence logic, so NULL cannot derail it:
SELECT u.id, u.email
FROM users u
WHERE NOT EXISTS (
SELECT 1
FROM orders o
WHERE o.user_id = u.id
);
A row with o.user_id IS NULL simply fails the o.user_id = u.id predicate and is quietly skipped -- no collapse. The SELECT 1 inside is an idiom: the select list in EXISTS is never evaluated, only the presence of a row matters.
Correlated subqueries also compose nicely with multi-column conditions:
SELECT e.id, e.name, e.dept
FROM employees e
WHERE NOT EXISTS (
SELECT 1
FROM employees s
WHERE s.manager_id = e.id
);
LEFT JOIN ... IS NULL
The third option is an outer join filtered on the non-match. It is also NULL-safe as an anti-join condition:
SELECT u.id, u.email
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.user_id IS NULL;
- Test
IS NULL on the join key itself (o.user_id), not on some arbitrary nullable column of orders.
- In a one-to-many relationship the join multiplies rows, so you often need
DISTINCT or GROUP BY afterwards -- extra work. NOT EXISTS avoids that entirely.
Modern PostgreSQL can run both NOT EXISTS and LEFT JOIN ... IS NULL as a real anti join (Hash Anti Join / Merge Anti Join in EXPLAIN), stopping at the first match. With a nullable column, the planner is not allowed to turn NOT IN into an anti-join because of NULL semantics, and you often get a heavier plan.
- On PostgreSQL prefer
NOT EXISTS for both NULL-safety and speed.
- MySQL 8 also optimizes
NOT EXISTS well; historically NOT IN could be markedly slower.
- ClickHouse offers a clean
LEFT ANTI JOIN -- an explicit anti-join with no subquery.
- Back the correlated
NOT EXISTS predicate with an index (for example on orders(user_id)).
Bottom line: make NOT EXISTS your default for "rows with no match." Keep NOT IN for static, NULL-free lists (status NOT IN ('paid','shipped')), and reach for LEFT JOIN ... IS NULL when you also need columns from the unmatched side.
You can find rows with no match in another table three ways:
NOT IN,NOT EXISTS, andLEFT JOIN ... IS NULL. They look interchangeable, butNOT INhides a nastyNULLtrap that quietly wipes out your result one day. Let's compare them over ausers,orders,employeesschema.The task: users with no orders
The classic anti-join: who never placed an order. The intuitive
NOT INversion:SELECT u.id, u.email FROM users u WHERE u.id NOT IN (SELECT user_id FROM orders);As long as
orders.user_idhas noNULL, this is fine. Butuser_idmay be nullable (guest checkout, soft deletes), and then the query breaks -- no error, no warning.The NULL trap in NOT IN
NOT IN (...)expands to a chainid <> v1 AND id <> v2 AND .... If any value isNULL, the comparisonid <> NULLis neitherTRUEnorFALSEbutUNKNOWN. In three-valued logicTRUE AND UNKNOWN = UNKNOWN, so the row is dropped. The result: a singleNULLin the subquery collapses the whole result to empty.-- orders contains one row with user_id IS NULL. -- This returns NOTHING, even though some users have no orders: SELECT u.id, u.email FROM users u WHERE u.id NOT IN (SELECT user_id FROM orders);NULL IN (...)andNULL NOT IN (...)never evaluate toTRUEeither.The band-aid is to filter
NULLout:WHERE u.id NOT IN (SELECT user_id FROM orders WHERE user_id IS NOT NULL). It works, but it is easy to forget -- and even easier to forget on a column inside a composite key.NOT EXISTS: safe by default
NOT EXISTSchecks whether at least one row exists and uses plain two-valued existence logic, soNULLcannot derail it:SELECT u.id, u.email FROM users u WHERE NOT EXISTS ( SELECT 1 FROM orders o WHERE o.user_id = u.id );A row with
o.user_id IS NULLsimply fails theo.user_id = u.idpredicate and is quietly skipped -- no collapse. TheSELECT 1inside is an idiom: the select list inEXISTSis never evaluated, only the presence of a row matters.Correlated subqueries also compose nicely with multi-column conditions:
-- Employees with no reports (nobody lists them as manager_id): SELECT e.id, e.name, e.dept FROM employees e WHERE NOT EXISTS ( SELECT 1 FROM employees s WHERE s.manager_id = e.id );LEFT JOIN ... IS NULL
The third option is an outer join filtered on the non-match. It is also NULL-safe as an anti-join condition:
SELECT u.id, u.email FROM users u LEFT JOIN orders o ON o.user_id = u.id WHERE o.user_id IS NULL;IS NULLon the join key itself (o.user_id), not on some arbitrary nullable column oforders.DISTINCTorGROUP BYafterwards -- extra work.NOT EXISTSavoids that entirely.Performance and anti-join plans
Modern PostgreSQL can run both
NOT EXISTSandLEFT JOIN ... IS NULLas a real anti join (Hash Anti Join/Merge Anti JoininEXPLAIN), stopping at the first match. With a nullable column, the planner is not allowed to turnNOT INinto an anti-join because ofNULLsemantics, and you often get a heavier plan.NOT EXISTSfor both NULL-safety and speed.NOT EXISTSwell; historicallyNOT INcould be markedly slower.LEFT ANTI JOIN-- an explicit anti-join with no subquery.NOT EXISTSpredicate with an index (for example onorders(user_id)).Bottom line: make
NOT EXISTSyour default for "rows with no match." KeepNOT INfor static, NULL-free lists (status NOT IN ('paid','shipped')), and reach forLEFT JOIN ... IS NULLwhen you also need columns from the unmatched side.