An anti-join answers one question: "which rows in table A have no match in table B?" Users with no orders, orders with no payment, employees with no manager. SQL has no ANTI JOIN keyword, so you express it in one of three classic ways. All three return the "right" answer on clean data — but they diverge sharply once NULL enters the picture. Let's work them through on a single schema and figure out which one to reach for.
We'll use these tables (PostgreSQL):
CREATE TABLE users (
id bigint PRIMARY KEY,
email text NOT NULL
);
CREATE TABLE orders (
id bigint PRIMARY KEY,
user_id bigint,
amount numeric NOT NULL
);
Running task: find users who have placed no orders at all.
LEFT JOIN ... WHERE b.id IS NULL
The most visual approach. Do a LEFT JOIN that keeps every users row, then throw away the ones that found a match.
SELECT u.id, u.email
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.id IS NULL;
How to read it: the LEFT JOIN fills every orders column with NULL when there's no match. The WHERE o.id IS NULL filter keeps exactly those non-matching rows. The key detail: test IS NULL on a column that is never NULL in a real row — the primary key o.id is perfect. If you test o.user_id (which is nullable here), you might accidentally catch guest orders and get garbage.
- Pro: intuitive, and it's easy to pull extra
orders columns in for debugging.
- Con: the "delete what matched" semantics aren't obvious to whoever reads the query later.
- Gotcha: multi-column matching turns the
IS NULL condition verbose and fragile.
NOT EXISTS — the recommended way
NOT EXISTS with a correlated subquery states the intent literally: "there is no order for this user."
SELECT u.id, u.email
FROM users u
WHERE NOT EXISTS (
SELECT 1
FROM orders o
WHERE o.user_id = u.id
);
Why this is usually the best choice:
- Reads like the requirement. "No such order exists." Done.
- NULL-safe.
EXISTS only checks whether rows exist; it doesn't compare values back to the outer query. A NULL in o.user_id simply fails o.user_id = u.id and counts as no match — no surprises.
- Performance. The PostgreSQL planner executes both
NOT EXISTS and LEFT JOIN / IS NULL as the same physical anti join (you'll see Hash Anti Join in EXPLAIN). In practice they're equivalent in speed; NOT EXISTS pulls ahead on composite keys.
A composite key stays clean — no scattered IS NULL checks:
WHERE NOT EXISTS (
SELECT 1 FROM orders o
WHERE o.user_id = u.id AND o.region = u.region
)
NOT IN and the NULL trap
NOT IN looks compact and tempting — and it's the most treacherous of the three.
SELECT u.id, u.email
FROM users u
WHERE u.id NOT IN (SELECT user_id FROM orders);
If the subquery yields even a single NULL (and user_id is nullable here — guest orders!), the query returns zero rows. That's not a database bug; it's SQL's three-valued logic. x NOT IN (1, 2, NULL) expands to x <> 1 AND x <> 2 AND x <> NULL. That last comparison evaluates to UNKNOWN, so the whole expression can never become TRUE — the row is silently dropped.
- Gotcha: the failure is silent. The query doesn't error; it just hands back an empty (or truncated) result.
- If you're sure there are no
NULLs, add an explicit filter — but it's easy to forget when the data shifts:
WHERE u.id NOT IN (
SELECT user_id FROM orders WHERE user_id IS NOT NULL
);
Cross-engine notes:
- MySQL behaves identically — same three-valued logic, same
NULL blow-up.
- ClickHouse is also affected; on top of that its
JOIN has its own quirks, so NOT IN over a guaranteed non-NULL column (or LEFT ANTI JOIN) is often cleaner. Notably, ClickHouse is one of the few engines with an explicit LEFT ANTI JOIN syntax.
Which one to pick
The short rule:
- Default to
NOT EXISTS. Expressive, NULL-safe, and the optimizer turns it into an anti join. Make it your go-to.
LEFT JOIN ... IS NULL — when you also want columns from the right table, or it simply reads clearer to you. Always test IS NULL on a non-nullable key.
NOT IN — only when the subquery column is guaranteed NOT NULL (e.g. a primary key). Otherwise filter out NULL explicitly, or just don't.
Final checklist: can the subquery return NULL? If yes, no NOT IN. Need fields from the right side? Use LEFT JOIN. In every other case, write NOT EXISTS and forget three-valued logic exists.
An anti-join answers one question: "which rows in table A have no match in table B?" Users with no orders, orders with no payment, employees with no manager. SQL has no
ANTI JOINkeyword, so you express it in one of three classic ways. All three return the "right" answer on clean data — but they diverge sharply onceNULLenters the picture. Let's work them through on a single schema and figure out which one to reach for.We'll use these tables (PostgreSQL):
CREATE TABLE users ( id bigint PRIMARY KEY, email text NOT NULL ); CREATE TABLE orders ( id bigint PRIMARY KEY, user_id bigint, -- nullable: a "guest" order amount numeric NOT NULL );Running task: find users who have placed no orders at all.
LEFT JOIN ... WHERE b.id IS NULL
The most visual approach. Do a
LEFT JOINthat keeps everyusersrow, then throw away the ones that found a match.SELECT u.id, u.email FROM users u LEFT JOIN orders o ON o.user_id = u.id WHERE o.id IS NULL; -- no match => the whole right side is NULLHow to read it: the
LEFT JOINfills everyorderscolumn withNULLwhen there's no match. TheWHERE o.id IS NULLfilter keeps exactly those non-matching rows. The key detail: testIS NULLon a column that is neverNULLin a real row — the primary keyo.idis perfect. If you testo.user_id(which is nullable here), you might accidentally catch guest orders and get garbage.orderscolumns in for debugging.IS NULLcondition verbose and fragile.NOT EXISTS — the recommended way
NOT EXISTSwith a correlated subquery states the intent literally: "there is no order for this user."SELECT u.id, u.email FROM users u WHERE NOT EXISTS ( SELECT 1 FROM orders o WHERE o.user_id = u.id );Why this is usually the best choice:
EXISTSonly checks whether rows exist; it doesn't compare values back to the outer query. ANULLino.user_idsimply failso.user_id = u.idand counts as no match — no surprises.NOT EXISTSandLEFT JOIN / IS NULLas the same physical anti join (you'll seeHash Anti JoininEXPLAIN). In practice they're equivalent in speed;NOT EXISTSpulls ahead on composite keys.A composite key stays clean — no scattered
IS NULLchecks:WHERE NOT EXISTS ( SELECT 1 FROM orders o WHERE o.user_id = u.id AND o.region = u.region )NOT IN and the NULL trap
NOT INlooks compact and tempting — and it's the most treacherous of the three.-- DANGEROUS if orders.user_id contains NULL SELECT u.id, u.email FROM users u WHERE u.id NOT IN (SELECT user_id FROM orders);If the subquery yields even a single
NULL(anduser_idis nullable here — guest orders!), the query returns zero rows. That's not a database bug; it's SQL's three-valued logic.x NOT IN (1, 2, NULL)expands tox <> 1 AND x <> 2 AND x <> NULL. That last comparison evaluates toUNKNOWN, so the whole expression can never becomeTRUE— the row is silently dropped.NULLs, add an explicit filter — but it's easy to forget when the data shifts:WHERE u.id NOT IN ( SELECT user_id FROM orders WHERE user_id IS NOT NULL );Cross-engine notes:
NULLblow-up.JOINhas its own quirks, soNOT INover a guaranteed non-NULLcolumn (orLEFT ANTI JOIN) is often cleaner. Notably, ClickHouse is one of the few engines with an explicitLEFT ANTI JOINsyntax.Which one to pick
The short rule:
NOT EXISTS. Expressive, NULL-safe, and the optimizer turns it into an anti join. Make it your go-to.LEFT JOIN ... IS NULL— when you also want columns from the right table, or it simply reads clearer to you. Always testIS NULLon a non-nullable key.NOT IN— only when the subquery column is guaranteedNOT NULL(e.g. a primary key). Otherwise filter outNULLexplicitly, or just don't.Final checklist: can the subquery return
NULL? If yes, noNOT IN. Need fields from the right side? UseLEFT JOIN. In every other case, writeNOT EXISTSand forget three-valued logic exists.