sqlpostgresqljoinsnull

SQL Anti-Joins: Finding Rows With No Match

Three ways to find rows with no match — LEFT JOIN / IS NULL, NOT EXISTS and NOT IN — and why NOT IN breaks on NULL.

3 min readReferencesql · postgresql · joins · null · anti-join

An anti-join answers one question: "which rows in table A have no match in table B?" Users with no orders, orders with no payment, employees with no manager. SQL has no ANTI JOIN keyword, so you express it in one of three classic ways. All three return the "right" answer on clean data — but they diverge sharply once NULL enters the picture. Let's work them through on a single schema and figure out which one to reach for.

We'll use these tables (PostgreSQL):

CREATE TABLE users (
  id    bigint PRIMARY KEY,
  email text NOT NULL
);

CREATE TABLE orders (
  id       bigint PRIMARY KEY,
  user_id  bigint,          -- nullable: a "guest" order
  amount   numeric NOT NULL
);

Running task: find users who have placed no orders at all.

LEFT JOIN ... WHERE b.id IS NULL

The most visual approach. Do a LEFT JOIN that keeps every users row, then throw away the ones that found a match.

SELECT u.id, u.email
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.id IS NULL;   -- no match => the whole right side is NULL

How to read it: the LEFT JOIN fills every orders column with NULL when there's no match. The WHERE o.id IS NULL filter keeps exactly those non-matching rows. The key detail: test IS NULL on a column that is never NULL in a real row — the primary key o.id is perfect. If you test o.user_id (which is nullable here), you might accidentally catch guest orders and get garbage.

  • Pro: intuitive, and it's easy to pull extra orders columns in for debugging.
  • Con: the "delete what matched" semantics aren't obvious to whoever reads the query later.
  • Gotcha: multi-column matching turns the IS NULL condition verbose and fragile.

NOT EXISTS with a correlated subquery states the intent literally: "there is no order for this user."

SELECT u.id, u.email
FROM users u
WHERE NOT EXISTS (
  SELECT 1
  FROM orders o
  WHERE o.user_id = u.id
);

Why this is usually the best choice:

  • Reads like the requirement. "No such order exists." Done.
  • NULL-safe. EXISTS only checks whether rows exist; it doesn't compare values back to the outer query. A NULL in o.user_id simply fails o.user_id = u.id and counts as no match — no surprises.
  • Performance. The PostgreSQL planner executes both NOT EXISTS and LEFT JOIN / IS NULL as the same physical anti join (you'll see Hash Anti Join in EXPLAIN). In practice they're equivalent in speed; NOT EXISTS pulls ahead on composite keys.

A composite key stays clean — no scattered IS NULL checks:

WHERE NOT EXISTS (
  SELECT 1 FROM orders o
  WHERE o.user_id = u.id AND o.region = u.region
)

NOT IN and the NULL trap

NOT IN looks compact and tempting — and it's the most treacherous of the three.

-- DANGEROUS if orders.user_id contains NULL
SELECT u.id, u.email
FROM users u
WHERE u.id NOT IN (SELECT user_id FROM orders);

If the subquery yields even a single NULL (and user_id is nullable here — guest orders!), the query returns zero rows. That's not a database bug; it's SQL's three-valued logic. x NOT IN (1, 2, NULL) expands to x <> 1 AND x <> 2 AND x <> NULL. That last comparison evaluates to UNKNOWN, so the whole expression can never become TRUE — the row is silently dropped.

  • Gotcha: the failure is silent. The query doesn't error; it just hands back an empty (or truncated) result.
  • If you're sure there are no NULLs, add an explicit filter — but it's easy to forget when the data shifts:
WHERE u.id NOT IN (
  SELECT user_id FROM orders WHERE user_id IS NOT NULL
);

Cross-engine notes:

  • MySQL behaves identically — same three-valued logic, same NULL blow-up.
  • ClickHouse is also affected; on top of that its JOIN has its own quirks, so NOT IN over a guaranteed non-NULL column (or LEFT ANTI JOIN) is often cleaner. Notably, ClickHouse is one of the few engines with an explicit LEFT ANTI JOIN syntax.

Which one to pick

The short rule:

  • Default to NOT EXISTS. Expressive, NULL-safe, and the optimizer turns it into an anti join. Make it your go-to.
  • LEFT JOIN ... IS NULL — when you also want columns from the right table, or it simply reads clearer to you. Always test IS NULL on a non-nullable key.
  • NOT IN — only when the subquery column is guaranteed NOT NULL (e.g. a primary key). Otherwise filter out NULL explicitly, or just don't.

Final checklist: can the subquery return NULL? If yes, no NOT IN. Need fields from the right side? Use LEFT JOIN. In every other case, write NOT EXISTS and forget three-valued logic exists.

Practice on real tasks

Solve tasks in the SQL trainer with instant grading and hints.

Open trainer