Indexing JSONB with GIN: jsonb_ops vs jsonb_path_ops

Q: jsonb_ops vs jsonb_path_ops

GIN over JSONB has two operator classes: - jsonb ops (the default) indexes every key and every value as a separate entry. It supports more operators but the index is larger. - jsonb path ops indexes hashes of whole "key - value" paths. It is smaller, usually faster on containment checks, but supports only a narrow set of operators. Which operators each class supports: - @ (contains) is supported by both classes. - @?, @@ (JSONPath) are also supported by both classes. - ?, ? , ?& (key existence) are only in jsonb ops. So the real dividing line is key-existence: only the default jsonb ops can answer ?, ? and ?& from the index. If every query you run is built on @ or JSONPath, choose jsonb path

Q: How other engines differ

- MySQL has no GIN. For JSON you use GENERATED columns with an ordinary B-tree on top, or multi-valued indexes for arrays via MEMBER OF and JSON CONTAINS. - ClickHouse leans on its JSON/Map types and sparse, granule-level indexes rather than inverted ones; you usually speed up filters with materialized columns or skip indexes. For PostgreSQL the baseline recipe is: @ filters - jsonb path ops, point equality - expression index, and always confirm with EXPLAIN.

A jsonb column is convenient right up until you have to filter on it at scale: without an index every query becomes a sequential scan that parses JSON on the fly. A GIN index fixes that, but it ships with two operator classes that behave differently, and the choice affects both speed and size.

Do not reach for the "most powerful" index just in case. GIN speeds up reads, but you pay for it in size, slower inserts, and periodic maintenance. So pin down the real predicates first: are you searching by the shape of the document, by the existence of a key, or by a single scalar field?

Why GIN at all

Say we keep events with a JSONB payload:

CREATE TABLE events (
    id         bigint PRIMARY KEY,
    user_id    bigint NOT NULL,
    payload    jsonb  NOT NULL,
    created_at timestamptz DEFAULT now()
);

A B-tree on that column is useless: it compares whole values, but we need to look inside the document. GIN (Generalized Inverted Index) stores an inverted index of the keys and values inside the JSON, so "find documents that contain this" runs against the index.

CREATE INDEX idx_events_payload ON events USING GIN (payload);
SELECT id FROM events WHERE payload @> '{"status": "paid"}';

jsonb_ops vs jsonb_path_ops

GIN over JSONB has two operator classes:

jsonb_ops (the default) indexes every key and every value as a separate entry. It supports more operators but the index is larger.
jsonb_path_ops indexes hashes of whole "key -> value" paths. It is smaller, usually faster on containment checks, but supports only a narrow set of operators.

-- default class: jsonb_ops
CREATE INDEX idx_payload_ops ON events USING GIN (payload);

-- compact class optimized for containment
CREATE INDEX idx_payload_path ON events USING GIN (payload jsonb_path_ops);

Which operators each class supports:

@> (contains) is supported by both classes.
@?, @@ (JSONPath) are also supported by both classes.
?, ?|, ?& (key existence) are only in jsonb_ops.

So the real dividing line is key-existence: only the default jsonb_ops can answer ?, ?| and ?& from the index. If every query you run is built on @> or JSONPath, choose jsonb_path_ops: it covers all of them and is more compact, since it skips indexing bare keys you never search by.

Expression index vs the whole document

You do not always need to index the whole document. If you constantly filter on one scalar field, a B-tree on that expression is cheaper:

CREATE INDEX idx_events_status
    ON events ((payload ->> 'status'));

SELECT id FROM events WHERE payload ->> 'status' = 'refunded';

This index is tiny and supports ranges and ordering. Joining it to your other relational tables is just as easy:

SELECT u.email, e.created_at
FROM events e
JOIN users u ON u.id = e.user_id
WHERE e.payload ->> 'status' = 'paid'
  AND u.country = 'DE';

The rule of thumb: single-field equality or range filters -> expression index; arbitrary filters across many keys, especially @> -> GIN on the whole document.

Gotchas and reading the plan

->> skips GIN. The predicate payload ->> 'status' = 'paid' does NOT use a GIN index; GIN needs payload @> '{"status":"paid"}'. This is the single most common mistake.
jsonb_path_ops does not know ?. Key existence without a value (payload ? 'coupon') is only indexed by the default class.
GIN is expensive to write. Every INSERT updates many index entries. Under heavy insert load, fastupdate plus regular VACUUM helps.
Selectivity. GIN pays off when the predicate discards most rows. If @> matches almost everything, the planner will sensibly pick a seq scan.

Always check the plan:

EXPLAIN ANALYZE
SELECT id FROM events
WHERE payload @> '{"status": "paid", "channel": "web"}';

Look for a Bitmap Index Scan on the index you expect; that confirms GIN is actually used.

How other engines differ

MySQL has no GIN. For JSON you use GENERATED columns with an ordinary B-tree on top, or multi-valued indexes for arrays via MEMBER OF and JSON_CONTAINS.
ClickHouse leans on its JSON/Map types and sparse, granule-level indexes rather than inverted ones; you usually speed up filters with materialized columns or skip indexes.

For PostgreSQL the baseline recipe is: @> filters -> jsonb_path_ops, point equality -> expression index, and always confirm with EXPLAIN.