Tech · 6 min read
Sharding and Partitioning: Strategies, Trade-offs, and the Pain Nobody Warns You About
When to shard, how to shard, and the operational realities — hot keys, resharding, cross-shard joins — that decide whether your database scales gracefully or painfully.
By Jarviix Engineering · Apr 19, 2026
Sharding is one of the highest-stakes architectural decisions in a system's life. Done early, it costs you years of unnecessary complexity. Done late, it's a multi-quarter migration during peak hours.
This post covers when to shard, how to shard, and the operational realities most articles skip.
What sharding actually is
Partitioning is splitting a single logical dataset into multiple physical pieces. Sharding is partitioning across multiple machines (or database instances). The terms get used interchangeably, but the distinction matters: partitioning a table within one database is much easier than splitting across servers.
The goal is the same: each subset of the data is small enough to fit, fast enough to serve, and independent enough to operate on without coordinating with the others.
When to shard
In practice, almost no one shards too late. Most shard too early, then spend years living with the complexity for performance they never needed.
Triggers that justify sharding:
- Working set exceeds RAM. Page faults dominate. Adding RAM is no longer cost-effective or possible.
- Write throughput saturated on the primary. Read replicas help reads, not writes. If your insert rate is bottlenecked by single-primary IOPS or replication lag, you need to spread writes.
- Operational windows too long. Backup, restore, schema change all scale with table size. Once a single backup takes 12 hours, you're going to want smaller pieces.
Things that are not good triggers:
- "We might have lots of users one day."
- "Our current database is 200 GB" (Postgres handles tens of TB on modest hardware).
- "Cassandra is web-scale."
Buy as much vertical scale as you can first. It's almost always cheaper than sharding.
Sharding strategies
The choice of shard key (and how you map keys to shards) is the most consequential decision in any sharded system.
Hash sharding
Hash the shard key, mod by the number of shards.
shard = hash(user_id) % N
Pros. Even distribution, no hot spots from sequential keys.
Cons. Range scans (WHERE user_id BETWEEN ...) are scatter-gather across all shards. Resharding (changing N) means re-hashing nearly everything.
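A minimal sketch of hash-mod routing, using a stable hash (MD5) rather than Python's built-in `hash()`, which is randomized per process for strings. The second half measures the resharding cost: growing from 16 to 32 shards moves roughly half of all keys.

```python
import hashlib

def shard_for(key: str, n_shards: int) -> int:
    """Route a key to a shard by hashing it and taking the modulus."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_shards

# Changing N moves most keys: count how many of 10,000 keys land on a
# different shard when the cluster grows from 16 to 32 shards.
keys = [f"user_{i}" for i in range(10_000)]
moved = sum(shard_for(k, 16) != shard_for(k, 32) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")  # roughly 50%
```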
Range sharding
Split by ranges of the shard key.
shard 1: user_id 1 - 1,000,000
shard 2: user_id 1,000,001 - 2,000,000
...
Pros. Range queries are fast — they hit a single shard. Adding capacity = adding a new range.
Cons. Hot spots when load is concentrated in a particular range (e.g., recently registered users get all the writes if user_id is monotonic).
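Range routing is a sorted search over range boundaries. A sketch with `bisect`, assuming the three illustrative ranges above (the boundary values are just examples):

```python
import bisect

# Upper bound (inclusive) of each shard's user_id range.
RANGE_BOUNDS = [1_000_000, 2_000_000, 3_000_000]  # shards 1..3

def shard_for(user_id: int) -> int:
    """Return the 1-based shard whose range contains user_id."""
    idx = bisect.bisect_left(RANGE_BOUNDS, user_id)
    if idx == len(RANGE_BOUNDS):
        raise ValueError(f"user_id {user_id} is beyond the last range")
    return idx + 1

print(shard_for(42))         # 1
print(shard_for(1_000_001))  # 2
```

Adding capacity is just appending a new upper bound, which is why range-sharded systems grow so easily at the tail (and why a monotonic key makes that tail shard hot).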
Directory / lookup sharding
A small lookup service maps each key to its shard.
shard_for(user_id) → query lookup table → "shard 7"
Pros. Maximum flexibility. Move individual keys (or tenants) between shards as needed.
Cons. The lookup service is a new dependency; you must keep it consistent and fast.
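A toy stand-in for the lookup service. In production the mapping would live in a replicated store behind a cache; a dict is enough to show the routing and the per-key move flexibility (names here are illustrative):

```python
class ShardDirectory:
    """In-memory stand-in for a lookup service mapping keys to shards."""

    def __init__(self, default_shard: int = 0):
        self._map: dict[str, int] = {}
        self._default = default_shard

    def shard_for(self, key: str) -> int:
        return self._map.get(key, self._default)

    def move(self, key: str, new_shard: int) -> None:
        # In a real system: copy the data, verify, then flip the pointer.
        self._map[key] = new_shard

d = ShardDirectory()
d.move("tenant_acme", 7)            # pull a noisy tenant onto its own shard
print(d.shard_for("tenant_acme"))   # 7
print(d.shard_for("tenant_smol"))   # 0 (default)
```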
Geographic / tenant sharding
Shard by user region or by tenant ID. Common in B2B SaaS — "each customer has their own shard" gives strong isolation, simple data residency stories, and cleaner blast radius.
Consistent hashing
Hash sharding's smarter cousin. Keys and shards are placed on a ring; each key goes to the next shard clockwise. Adding a shard moves only ~1/N of keys instead of nearly all of them.
This is what Cassandra, DynamoDB, and most modern distributed key-value stores use.
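A minimal ring sketch with virtual nodes (vnodes smooth out the distribution; production implementations differ in hash choice and replication). The experiment at the bottom shows the payoff: adding a 17th shard moves only about 1/17 of keys.

```python
import bisect
import hashlib

def _h(s: str) -> int:
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

class ConsistentHashRing:
    """Toy consistent-hash ring with virtual nodes."""

    def __init__(self, shards, vnodes: int = 100):
        self._ring = sorted((_h(f"{s}#{v}"), s) for s in shards for v in range(vnodes))
        self._points = [p for p, _ in self._ring]

    def shard_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash (wrapping to 0).
        idx = bisect.bisect(self._points, _h(key)) % len(self._ring)
        return self._ring[idx][1]

ring16 = ConsistentHashRing([f"shard-{i}" for i in range(16)])
ring17 = ConsistentHashRing([f"shard-{i}" for i in range(17)])
keys = [f"user_{i}" for i in range(10_000)]
moved = sum(ring16.shard_for(k) != ring17.shard_for(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")  # roughly 1/17 ≈ 6%
```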
Picking a shard key
The shard key is the column you partition on. Pick it badly and you're stuck.
Good shard keys are:
- High cardinality. Lots of distinct values, so distribution is even.
- Reasonably uniform in access. No single value gets 50% of the traffic ("hot key" — see below).
- Aligned with your queries. Queries with the shard key in the WHERE clause hit one shard; queries without it hit them all.
- Stable. A user's shard shouldn't change over time — that means migrating data.
For most consumer apps, user_id is a defensible default. For B2B, tenant_id (or account_id) is often better — every query naturally carries a tenant scope, and tenant-level isolation is operationally useful.
The hot key problem
The single biggest operational pain in sharding. One key (one user, one tenant, one trending topic) gets 1000× the average traffic. Its shard saturates. The rest of the cluster is idle.
Mitigations:
- Sub-shard the hot key. Append a small random suffix (user_42:0, user_42:1, ...) and fan out reads. Works when writes to the key are the bottleneck.
- Separate the hot key onto dedicated infrastructure. Pull the noisy tenant out onto their own shard.
- Cache aggressively for the hot key. Often the hot read traffic can be absorbed in Redis before it reaches the database.
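The sub-sharding mitigation can be sketched with a hot counter: writes scatter across a few physical keys, reads fan out and merge (the sub-key count and key names are illustrative):

```python
import random

N_SUBKEYS = 8  # how many physical keys the hot key is split across

def write_key(base_key: str) -> str:
    """Spread writes for a hot key across N_SUBKEYS physical keys."""
    return f"{base_key}:{random.randrange(N_SUBKEYS)}"

def read_keys(base_key: str) -> list[str]:
    """Reads must fan out across every sub-key and merge the results."""
    return [f"{base_key}:{i}" for i in range(N_SUBKEYS)]

# A hot counter: increments scatter, reads sum the pieces back together.
store: dict[str, int] = {}
for _ in range(1000):
    k = write_key("user_42")
    store[k] = store.get(k, 0) + 1
total = sum(store.get(k, 0) for k in read_keys("user_42"))
print(total)  # 1000
```

The trade is deliberate: write pressure per physical key drops by up to N_SUBKEYS, while every read now costs N_SUBKEYS lookups.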
There's no clean fix; hot keys are something you spend operational attention on forever.
Resharding: the part nobody warns you about
Eventually you outgrow your shard count. Now you have to move data.
In a hash-sharded system without consistent hashing, this is brutal: change N from 16 to 32 and ~50% of all keys move. You need:
- A way to do dual writes during the migration (write to both old and new shards).
- A way to backfill historical data.
- A way to switch reads atomically (or with controlled inconsistency).
- A way to clean up old shards once you're confident.
Plan to spend weeks on this. Plan to do it during off-peak. Plan to rehearse on a copy.
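The migration stages and their routing rules can be made explicit in a small table. This is a sketch of one common sequencing, with illustrative state names, not a prescribed protocol:

```python
# Stages of a resharding migration, in order.
DUAL_WRITE = "dual_write"  # write old + new, read old
BACKFILL   = "backfill"    # copy history into new, still read old
VERIFIED   = "verified"    # checksums match; read new, keep dual-writing
DONE       = "done"        # write new only, old shards reclaimable

def route(state: str, op: str) -> list[str]:
    """Which cluster(s) an operation touches at each migration stage."""
    writes = {DUAL_WRITE: ["old", "new"], BACKFILL: ["old", "new"],
              VERIFIED: ["old", "new"], DONE: ["new"]}
    reads = {DUAL_WRITE: ["old"], BACKFILL: ["old"],
             VERIFIED: ["new"], DONE: ["new"]}
    return writes[state] if op == "write" else reads[state]

print(route(DUAL_WRITE, "write"))  # ['old', 'new']
print(route(VERIFIED, "read"))     # ['new']
```

Note that dual writes continue even after reads switch: that is your rollback path if the new shards misbehave.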
The escape: design for resharding from day one. Use consistent hashing. Use directory sharding. Or use a managed sharded database (Vitess for MySQL, Citus for Postgres, Spanner, CockroachDB, Yugabyte) that handles resharding under the hood.
Cross-shard operations
Three painful categories:
- Cross-shard joins. Either co-locate joined tables on the same shard (shard users and orders both by user_id), denormalize, or do app-side joins (slow, complex).
- Distributed transactions. Two-phase commit across shards is slow and brittle. Most sharded systems avoid them — use sagas, idempotency, and eventual consistency instead.
- Aggregations. COUNT(*) across all shards becomes scatter-gather. For analytics, replicate to a separate analytics store designed for it.
Designing your data model so that 95% of queries hit a single shard is the difference between sharding being a performance win and being a constant tax.
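The scatter-gather shape of a cross-shard aggregation, sketched with in-memory dicts standing in for per-shard tables (a real version would issue one `SELECT COUNT(*)` per shard connection):

```python
from concurrent.futures import ThreadPoolExecutor

# Each dict stands in for one shard's table.
SHARDS = [
    {"user_1": 1, "user_2": 1},
    {"user_3": 1},
    {"user_4": 1, "user_5": 1, "user_6": 1},
]

def count_on_shard(shard: dict) -> int:
    return len(shard)  # stands in for SELECT COUNT(*) on that shard

def global_count() -> int:
    """Scatter the query to every shard in parallel, gather and sum."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        return sum(pool.map(count_on_shard, SHARDS))

print(global_count())  # 6
```

Latency is set by the slowest shard, and every shard pays the query cost — which is why a single-shard query plan is worth designing for.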
Should you build it yourself?
Almost always: no.
The list of things you'll need to build if you roll your own:
- Shard routing layer (which shard does this query go to?).
- Connection pooling per shard.
- Schema change orchestration across shards.
- Backup/restore coordination.
- Monitoring per shard.
- Resharding tooling.
- Dual-write infrastructure for migrations.
Vitess, Citus, CockroachDB, Yugabyte, Spanner, DynamoDB — every one of them has solved (most of) this. Pay them in money or open-source contributions instead of engineer-years.
What to read next
Sharding is one part of a larger scaling story. Eventual consistency explains the consistency trade-offs you accept once data spans nodes; CAP theorem in practice covers the theory; system design basics ties them together. The Twitter HLD writeup is a worked example of choosing a shard key (user_id) and living with its consequences (hot accounts, cross-shard timelines, fanout).
Frequently asked questions
When should I shard?
When vertical scaling is no longer cost-effective, when working set exceeds RAM, or when a single primary's write throughput is saturated. Almost every team shards too early.
Can I just use a managed sharded database (Vitess, Citus, Spanner)?
Often yes — and you should. The hardest part of sharding isn't the algorithm; it's the operational tooling. Managed solutions let you skip building most of it.
How do I do joins across shards?
Either co-locate joined data on the same shard (by sharding both tables on the same key), denormalize, or do application-side fan-out and join. Cross-shard joins at the database layer are a known pain point.