Reading Query Plans: EXPLAIN and EXPLAIN (ANALYZE, BUFFERS)

Q: EXPLAIN vs EXPLAIN ANALYZE

EXPLAIN only estimates the plan without running the query. EXPLAIN ANALYZE actually executes it (careful with INSERT/UPDATE/DELETE!) and attaches real numbers. A typical top line of a plan: What matters here: - cost=0.00..1834.00 — estimated cost in arbitrary units: startup cost and total cost. These are not milliseconds. - rows=120 — how many rows the planner expects . - actual time=... and actual rows=118 — what really happened. - loops=1 — how many times the node ran (crucial inside nested loops). To run EXPLAIN ANALYZE on a mutating query without side effects, wrap it in a transaction:

Q: Seq Scan vs Index Scan

A Seq Scan (sequential scan) reads the whole table row by row. That's not always bad : if you need most of the table, reading it straight through is cheaper than hopping around an index. An Index Scan uses an index to pinpoint matching rows. It wins when the predicate selects a small fraction of the table (a selective filter). There's also Bitmap Heap Scan — a middle ground: PostgreSQL collects index matches into a bitmap, then reads the table in physical-order batches. The planner picks it for "medium" selectivity — hundreds to thousands of rows. Resist the urge to "fix" the planner with hints — feed it fresh statistics instead: Gotcha: MySQL also has EXPLAIN ANALYZE (since 8.0), but the ou

Q: Estimated vs actual rows

The single most valuable skill is comparing rows (estimate) with actual rows (reality). A large gap means the planner had bad statistics and may have chosen a poor plan. If you see something like: — that's a red flag. The planner expected 5 rows and picked a Nested Loop, but got 48000. At that volume a Hash Join would have been far cheaper. Common causes of the gap: - stale statistics — fixed by ANALYZE; - correlated columns (e.g. city and country) — helped by CREATE STATISTICS; - complex expressions in the filter that have no statistics. Read estimates bottom-up : an error in a leaf node balloons through the whole join tree.

When a query suddenly crawls, guessing is a waste of time — ask the database what it actually plans to do. In PostgreSQL that's what EXPLAIN is for: it prints the execution plan the planner chose. And EXPLAIN (ANALYZE, BUFFERS) goes further, actually running the query and reporting real timings and disk access. This article walks through how to read a plan, how Seq Scan differs from Index Scan, why estimates diverge from reality, and how a plan reveals a missing index.

We'll use a small e-commerce schema: users, orders, and employees.

EXPLAIN vs EXPLAIN ANALYZE

EXPLAIN only estimates the plan without running the query. EXPLAIN ANALYZE actually executes it (careful with INSERT/UPDATE/DELETE!) and attaches real numbers.

-- plan and estimates only; the query does NOT run
EXPLAIN
SELECT * FROM orders WHERE user_id = 42;

-- real execution + timings + cache/disk reads
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE user_id = 42;

A typical top line of a plan:

Seq Scan on orders  (cost=0.00..1834.00 rows=120 width=64)
                    (actual time=0.015..12.4 rows=118 loops=1)

What matters here:

cost=0.00..1834.00 — estimated cost in arbitrary units: startup cost and total cost. These are not milliseconds.
rows=120 — how many rows the planner expects.
actual time=... and actual rows=118 — what really happened.
loops=1 — how many times the node ran (crucial inside nested loops).

To run EXPLAIN ANALYZE on a mutating query without side effects, wrap it in a transaction:

BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
DELETE FROM orders WHERE status = 'cancelled';
ROLLBACK;  -- nothing is actually deleted

Seq Scan vs Index Scan

A Seq Scan (sequential scan) reads the whole table row by row. That's not always bad: if you need most of the table, reading it straight through is cheaper than hopping around an index.

An Index Scan uses an index to pinpoint matching rows. It wins when the predicate selects a small fraction of the table (a selective filter).

-- selective: one user out of a million -> Index Scan
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 42;

-- non-selective: half the table -> a Seq Scan is expected and fine
EXPLAIN ANALYZE
SELECT * FROM orders WHERE status = 'paid';

There's also Bitmap Heap Scan — a middle ground: PostgreSQL collects index matches into a bitmap, then reads the table in physical-order batches. The planner picks it for "medium" selectivity — hundreds to thousands of rows.

Resist the urge to "fix" the planner with hints — feed it fresh statistics instead:

ANALYZE orders;  -- rebuild the data distribution statistics

Gotcha: MySQL also has EXPLAIN ANALYZE (since 8.0), but the output format differs and timings are reported per node in real time. ClickHouse uses EXPLAIN PLAN / EXPLAIN PIPELINE, and has no row-level index concept — instead it has a sparse primary key and granule skipping.

Estimated vs actual rows

The single most valuable skill is comparing rows (estimate) with actual rows (reality). A large gap means the planner had bad statistics and may have chosen a poor plan.

EXPLAIN ANALYZE
SELECT *
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE u.country = 'DE' AND o.created_at > '2026-01-01';

If you see something like:

Nested Loop  (rows=5 ... actual rows=48000 loops=1)

— that's a red flag. The planner expected 5 rows and picked a Nested Loop, but got 48000. At that volume a Hash Join would have been far cheaper. Common causes of the gap:

stale statistics — fixed by ANALYZE;
correlated columns (e.g. city and country) — helped by CREATE STATISTICS;
complex expressions in the filter that have no statistics.

Read estimates bottom-up: an error in a leaf node balloons through the whole join tree.

Spotting a missing index

A missing index has a very recognizable signature: a Seq Scan over a big table with a selective predicate in Filter, where Rows Removed by Filter is huge.

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM employees WHERE email = 'anna@example.com';

Seq Scan on employees  (cost=0.00..21000 rows=1 ...)
  Filter: (email = 'anna@example.com')
  Rows Removed by Filter: 499999
  Buffers: shared read=8200

The warning signs:

a Seq Scan where you expected a single row (rows=1);
Rows Removed by Filter: 499999 — the engine scanned half the table for nothing;
Buffers: shared read=8200 — lots of blocks pulled from disk.

The fix is an index on the filtered column:

CREATE INDEX idx_employees_email ON employees (email);

After that the same plan turns into an Index Scan with Buffers: shared read=4 and microsecond timings. For queries needing only a few columns, consider a covering index:

CREATE INDEX idx_orders_user_created
  ON orders (user_id, created_at) INCLUDE (status);

Gotcha: an index won't be used if the column is wrapped in a function (WHERE lower(email) = ...) — the planner can't match it. Either build a functional index (CREATE INDEX ... ON employees (lower(email))) or take the function out of the predicate.

Make it a habit to run EXPLAIN (ANALYZE, BUFFERS) on any slow query before reaching for an index or rewriting SQL. The plan tells you exactly where the time goes — and from there it's engineering, not guesswork.

Reading Query Plans: EXPLAIN and EXPLAIN (ANALYZE, BUFFERS)

EXPLAIN vs EXPLAIN ANALYZE

Seq Scan vs Index Scan

Estimated vs actual rows

Spotting a missing index

Related articles

Practice on real tasks