Skip to content
Jarviix

Tech · 5 min read

Database Indexing Explained: B-trees, Hash Indexes, and When to Add One

How database indexes actually work — B-trees vs hash, covering and partial indexes, the cost they impose on writes, and a practical rulebook for when to add one.

By Jarviix Engineering · Apr 19, 2026

Glowing green network LEDs on a darkened server rack
Photo via Unsplash

Indexing is one of the highest-leverage skills in backend engineering. The difference between a query that runs in 2ms and one that runs in 2 seconds is almost always an index — added, missing, or used wrong.

This post covers what indexes actually are, the trade-offs they impose, and a practical rulebook for when to add one.

What an index is, mechanically

An index is a separate data structure that maps values in a column to the row(s) they point to. The query planner uses it to skip scanning the entire table.

The two most common physical structures:

B-tree (the default everywhere)

A balanced tree of sorted keys. Lookups, range scans, and ordered traversal are all O(log N). Almost every database — Postgres, MySQL/InnoDB, Oracle, SQL Server — defaults to B-trees (or B+ trees, the variant where leaf nodes are linked).

What B-trees are good at:

  • Equality: WHERE id = 42
  • Ranges: WHERE created_at > '2024-01-01'
  • Sorted reads: ORDER BY created_at LIMIT 50
  • Prefix matches: WHERE name LIKE 'Sam%'

What they're not great at:

  • Suffix or substring search: WHERE name LIKE '%son' — can't use the index.
  • Very low cardinality columns (a gender column with 2 values rarely benefits).

Hash index

A hash table mapping value → row pointer. O(1) for equality lookups, but useless for ranges or ordering.

Hash indexes are an option in Postgres and a few others; in practice, B-trees are usually fine even for equality, so hash indexes are rarely worth the special-case.

Other structures (good to know they exist)

  • GIN (Generalized Inverted Index, Postgres): for arrays, JSONB, full-text search. The right tool for "does this JSON document contain this key?".
  • GiST: geometric, geospatial, and weird custom ordering. Backbone of PostGIS.
  • BRIN (Block Range Index): tiny, low-precision indexes useful for very large append-only time-series tables.
  • LSM trees (Cassandra, RocksDB): not exactly an index, but the underlying storage structure that powers many NoSQL stores.

For 95% of OLTP workloads on Postgres or MySQL, the right answer is B-tree. Don't overthink it.

Composite indexes: the rule that matters

A composite index covers multiple columns. The order matters — a lot.

CREATE INDEX idx_orders ON orders (customer_id, created_at, status);

This index is useful for:

  • WHERE customer_id = 42
  • WHERE customer_id = 42 AND created_at > '2024-01-01'
  • WHERE customer_id = 42 AND created_at > '2024-01-01' AND status = 'paid'

But not for:

  • WHERE created_at > '2024-01-01' (skips the leading column)
  • WHERE status = 'paid' (skips the leading two columns)

The rule: the index is useful for any query that uses a left-anchored prefix of its columns.

Pick the column order based on:

  1. Equality columns first.
  2. Range columns next.
  3. Sort columns aligned with the trailing columns where possible.

Covering indexes

If the index includes all the columns a query needs (both for filtering and for the SELECT list), the database can return results from the index alone — no table lookup ("index-only scan").

CREATE INDEX idx_users_email_name ON users (email) INCLUDE (name);
SELECT name FROM users WHERE email = 'a@b.com';  -- no table read

Covering indexes are an enormous win for hot read paths. Cost: more disk + slower writes (every write updates the index).

Partial indexes

An index with a WHERE clause — only certain rows are indexed.

CREATE INDEX idx_orders_pending ON orders (customer_id) WHERE status = 'pending';

If only 1% of orders are pending, this index is 100× smaller than a full index — and queries filtering on pending orders can use it directly.

This is one of the most underused features in Postgres. If your hot query has a discriminating predicate (a status, a flag, a tenant), a partial index is often a game-changer.

The cost of indexes

Every index has three costs:

  1. Disk. A B-tree on a few columns can be 10–30% the size of the table itself. Three indexes on a 100GB table → 30GB of extra storage.
  2. Write amplification. Every INSERT/UPDATE/DELETE has to update every index that covers the affected columns. Tables with 10 indexes write 10× the data.
  3. Optimizer overhead. More indexes = more options for the planner to consider on every query. Usually negligible, occasionally not.

The art of indexing is buying just enough indexes to make reads fast, without making writes painful.

When to add an index

A practical decision flow:

  1. Run EXPLAIN ANALYZE on the slow query. Do you see a Seq Scan on a large table where you're filtering down to a small percentage?
  2. What's the selectivity of the predicate? If the WHERE clause matches 50% of rows, an index probably won't help — sequential scan is cheaper.
  3. Is the column high-cardinality? Indexes on columns with 2-3 distinct values rarely earn their keep.
  4. Could a partial or covering index do better? Often yes.
  5. Are there existing indexes the query could use with a different column order? Check before adding new ones.

The right indexes are usually obvious in hindsight. The wrong ones accumulate over years.

When to remove an index

Less talked about, equally important.

  • Duplicate indexes. (a, b) and (a) — the first is a superset; the second can probably go.
  • Unused indexes. Postgres's pg_stat_user_indexes shows which indexes have never been used. Audit quarterly; drop ones with zero hits.
  • Indexes that the planner ignores. Sometimes you create one and the optimizer never picks it. If EXPLAIN ANALYZE shows the planner choosing a different plan, the index is just write overhead.

Three real-world traps

A short list of indexing mistakes I've made or seen:

  • Indexing every column "just in case". Becomes a write-amplification nightmare. Indexes are not free.
  • Forgetting to update statistics. After a bulk load, run ANALYZE. Otherwise the planner has stale row counts and picks bad plans.
  • Indexing on functions without expression indexes. WHERE LOWER(email) = ? won't use an index on email. Either index LOWER(email) directly, or store the lowercased copy.

Indexes are half of database performance; the other half is understanding isolation levels — what concurrent transactions actually see. The Instagram HLD writeup is one of the clearest worked examples of where indexing strategy (composite, partial, covering) makes the difference between feed reads in milliseconds and full-table scans.

Frequently asked questions

Why is my query still slow even after I added an index?

Either the planner isn't using it (run EXPLAIN ANALYZE), the index doesn't cover the WHERE clause, the column isn't selective enough, or the index is fragmented. Indexes don't help if the optimizer thinks a sequential scan is cheaper.

How many indexes are too many?

When the write-amplification cost exceeds the read benefit. For OLTP tables with heavy writes, more than 4-6 indexes per table is usually a sign of accumulated tech debt; for OLAP/read-heavy tables, you can have many more.

Should I index foreign keys?

Almost always yes. Postgres doesn't auto-index foreign keys, and joins on unindexed FK columns will sequential-scan. MySQL/InnoDB does auto-index them. Either way, verify.

Related Jarviix tools

Read paired with the calculator that does the math.

Read next