Tech · 7 min read

Database Sharding Explained: When, Why, and How to Do It Right

Sharding is the most common — and most misunderstood — way to scale databases. The strategies, the trade-offs, and the cases where you absolutely should not shard.

By Jarviix Engineering · Apr 19, 2026


Database sharding is one of those topics where the conventional wisdom — "shard your database to scale!" — leads to most teams making expensive mistakes. Sharding is a sledgehammer that solves real scaling problems but introduces operational complexity that compounds over years.

This post covers when sharding is actually justified, the major sharding strategies and their trade-offs, and the engineering decisions that determine whether your sharded system stays maintainable or becomes a quagmire.

What sharding actually is

Sharding splits a logical database across multiple physical machines (shards). Each shard contains a subset of the total data, distinguished by a sharding key. Queries route to the appropriate shard based on the key.

For example, sharding by user_id with 4 shards:

  • Shard 0: user_ids 0-249,999
  • Shard 1: user_ids 250,000-499,999
  • Shard 2: user_ids 500,000-749,999
  • Shard 3: user_ids 750,000+

A query for user 327,489 routes to Shard 1.
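In code, that routing is just a binary search over the range boundaries. A minimal sketch matching the 4-shard example above (a real router would load boundaries from config rather than hard-code them):

```python
import bisect

# Upper bounds of shards 0-2, matching the example above;
# shard 3 is open-ended (user_ids 750,000+).
SHARD_UPPER_BOUNDS = [250_000, 500_000, 750_000]

def route_by_range(user_id: int) -> int:
    """Return the shard index that owns this user_id."""
    return bisect.bisect_right(SHARD_UPPER_BOUNDS, user_id)

print(route_by_range(327_489))  # → 1, i.e. Shard 1
```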

This achieves three things:

  1. Horizontal scaling: write load splits across machines
  2. Data locality: data for related queries lives on the same machine
  3. Failure isolation: an outage on one shard doesn't take down the entire system

It also introduces:

  1. Distributed query complexity: cross-shard queries become hard
  2. Operational overhead: 4 databases to manage instead of 1
  3. Resharding pain: changing shard topology requires significant data movement

The pre-sharding scaling ladder

Before sharding, exhaust simpler options:

1. Vertical scaling

Buy a bigger machine. Modern cloud instances offer 96 vCPUs, 1.5TB RAM, NVMe SSDs. PostgreSQL on a beefy instance can handle 50,000+ TPS for many workloads. The simplicity advantage is enormous.

2. Read replicas

Add 2-5 read replicas. Route reads to replicas, writes to primary. This handles 80% of scaling needs because most workloads are read-heavy. Eventual consistency on replicas is usually acceptable.

3. Query optimization

Profile slow queries with EXPLAIN ANALYZE. Add missing indexes. Rewrite N+1 query patterns. A 10x query speed improvement is often equivalent to 10x more hardware.

4. Caching

Add Redis/Memcached for frequently-accessed data. A cache hit is 100x faster than a database query. Caching often eliminates 70-90% of read load.

5. Schema denormalization

Pre-compute aggregates, join tables, materialized views. Trade some write complexity for dramatic read efficiency.

6. Connection pooling and database-level optimizations

PgBouncer, statement-level config tuning, autovacuum tuning. These can yield 2-5x throughput on existing hardware.

Only after these are exhausted should you consider sharding.

Sharding strategies

Range-based sharding

Data is partitioned by ranges of the shard key. Example: user_id 0-1M on shard 1, 1M-2M on shard 2, etc.

Pros:

  • Simple to understand
  • Range queries within a shard are efficient
  • Easy to add new shards by appending ranges at the end

Cons:

  • Hot shards (recent users get all the action)
  • Manual rebalancing if data is uneven
  • New writes concentrate on the latest shard

When to use: Time-series data, append-only workloads, or when range queries are common and even distribution is achievable.

Hash-based sharding

Apply a hash function to the shard key, modulo the number of shards. Example: shard = hash(user_id) % 4.

Pros:

  • Even data distribution
  • No hotspots from sequential keys
  • Predictable shard count

Cons:

  • Range queries require fan-out to all shards
  • Adding/removing shards requires resharding (every key may change shard)
  • Cross-shard queries are common

When to use: User-centric workloads where evenness matters more than range queries.
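A sketch of the routing formula, plus the resharding con in numbers. One practical detail: Python's built-in `hash()` is salted per process, so a stable digest is needed for routing that every machine agrees on (md5 and the key count here are illustrative choices):

```python
import hashlib

def route_by_hash(user_id: int, num_shards: int) -> int:
    """shard = hash(key) % N, using a stable digest so all routers agree."""
    digest = hashlib.md5(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# The resharding con, demonstrated: growing from 4 to 5 shards remaps
# roughly (N-1)/N = 80% of keys under naive modulo hashing.
keys = range(100_000)
moved = sum(1 for k in keys if route_by_hash(k, 4) != route_by_hash(k, 5))
print(f"{moved / len(keys):.0%} of keys change shard")  # ≈ 80%
```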

Consistent hashing

A hash-based variant that minimizes data movement when adding/removing shards. Each shard owns a slice of the hash ring; only data near the new shard's slice moves.

Pros:

  • Adding shards is much cheaper than naive hash sharding
  • Failure of one shard affects only its slice
  • Used by Cassandra, DynamoDB, Redis Cluster

Cons:

  • Slightly more complex to implement and debug
  • Hot spots possible if hash distribution is uneven
  • Requires virtual nodes for balanced load

When to use: When dynamic resharding is a likely operational need.
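To make the minimal-movement property concrete, here is a toy ring with virtual nodes. Shard names, the vnode count, and the md5-based hash are all illustrative, not any particular system's implementation:

```python
import bisect
import hashlib

def _h(key: str) -> int:
    # Stable 64-bit position on the ring.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class ConsistentHashRing:
    """Each shard owns many points (vnodes) on the ring; a key routes to
    the first vnode clockwise from its own hash position."""

    def __init__(self, shards: list[str], vnodes: int = 100):
        self._ring: list[tuple[int, str]] = sorted(
            (_h(f"{shard}#{i}"), shard)
            for shard in shards for i in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]

    def route(self, key: str) -> str:
        idx = bisect.bisect(self._points, _h(key)) % len(self._ring)
        return self._ring[idx][1]

ring4 = ConsistentHashRing(["shard-a", "shard-b", "shard-c", "shard-d"])
ring5 = ConsistentHashRing(["shard-a", "shard-b", "shard-c", "shard-d", "shard-e"])
moved = sum(1 for k in range(10_000)
            if ring4.route(str(k)) != ring5.route(str(k)))
print(f"{moved / 10_000:.0%} of keys moved")  # roughly 1/5, not the ~80% of naive mod-N
```

Only the keys landing on arcs claimed by the new shard's vnodes move, which is why adding a fifth shard touches about a fifth of the data.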

Directory-based sharding

A lookup service maps shard keys to physical shards. Total flexibility — each key can be on any shard.

Pros:

  • Maximum flexibility
  • Easy to move individual users/tenants between shards
  • Custom routing logic possible (e.g., big customers on dedicated shards)

Cons:

  • Lookup service is a critical dependency
  • Lookup latency on every query
  • Lookup service must be highly available

When to use: Multi-tenant applications where tenants vary dramatically in size, or where you need tenant-level shard isolation.
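A sketch of the directory approach — here the "service" is just a dict, and the tenant and shard names are made up. In production the table lives in a highly available store and is cached aggressively in the routing layer:

```python
# Directory mapping tenants to physical shards (illustrative names).
directory = {
    "acme-corp": "shard-dedicated-1",   # big customer on its own shard
    "tiny-startup": "shard-shared-2",
}
DEFAULT_SHARD = "shard-shared-1"

def route_tenant(tenant_id: str) -> str:
    """Look up the tenant's shard; unknown tenants go to a default."""
    return directory.get(tenant_id, DEFAULT_SHARD)

def move_tenant(tenant_id: str, new_shard: str) -> None:
    # Moving a tenant is just a directory update — after its data
    # has been copied to the new shard.
    directory[tenant_id] = new_shard
```

The flexibility is visible in `move_tenant`: rebalancing one tenant never requires touching any other tenant's data.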

Geographic / locality-based sharding

Data is sharded by user location to minimize cross-region latency. Example: EU users on EU shards, US users on US shards.

Pros:

  • Compliance with data residency laws (GDPR)
  • Lower latency for users
  • Failure isolation by geography

Cons:

  • Cross-geography queries expensive
  • Complex if users move regions
  • Duplicated infrastructure costs in every region

When to use: Global applications with regulatory requirements or strong locality patterns.

Choosing a shard key

The shard key determines almost everything about your sharding implementation. Bad shard keys cause most sharding failures.

A good shard key:

  • High cardinality: many distinct values to distribute across shards
  • Even distribution: no hot keys (avoid sharding by country if 80% are in one country)
  • Used in most queries: minimizes cross-shard fan-out
  • Stable: doesn't change frequently (re-sharding moves data)

Bad shard keys:

  • Boolean fields: only 2 values, only 2 shards possible
  • Timestamps for active workloads: latest shard becomes hot
  • Sequential IDs without hashing: writes concentrate on latest shard
  • Mutable fields: changes require data migration

Common good choices:

  • user_id (for user-centric apps)
  • tenant_id (for B2B SaaS)
  • hash(user_id) (for even distribution)

Cross-shard queries: the hidden tax

Once sharded, queries that span shards become expensive:

  • Aggregations across all users (must query every shard)
  • Joins between users on different shards (rarely possible without expensive scatter-gather)
  • Global counts/sums (require fan-out + aggregation)

Strategies to mitigate:

  • Denormalize across shards: duplicate data so queries can be answered from one shard
  • Pre-compute aggregates: maintain a separate, non-sharded analytics database
  • Avoid cross-shard joins: redesign data model so related data is co-located
  • Accept fan-out for rare queries: letting an infrequent query run slower across all shards is a reasonable trade

The discipline: design your sharding so 95%+ of queries hit a single shard.
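For the queries that do have to fan out, the scatter-gather pattern looks roughly like this. `count_on_shard` is a stand-in for a real per-shard query, and the shard queries run concurrently so total latency is bounded by the slowest shard, not the sum:

```python
from concurrent.futures import ThreadPoolExecutor

SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]

def count_on_shard(shard: str) -> int:
    # Placeholder: would run e.g. `SELECT count(*) FROM users` on that shard.
    fake_counts = {"shard-0": 120, "shard-1": 95, "shard-2": 143, "shard-3": 88}
    return fake_counts[shard]

def global_user_count() -> int:
    """Scatter the query to every shard, then gather and aggregate."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        return sum(pool.map(count_on_shard, SHARDS))

print(global_user_count())  # → 446
```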

Resharding: the inevitable pain

Eventually you'll need to:

  • Add shards as data grows
  • Remove shards if traffic shrinks
  • Rebalance hot shards
  • Change shard key (worst case)

Resharding requires:

  1. Provisioning new shards
  2. Copying data from old shards to new
  3. Updating routing logic
  4. Cutover (often with downtime or careful coordination)
  5. Decommissioning old shards

This is one of the most operationally expensive activities in distributed systems. Many companies plan their initial shard count to be 4-10x current needs to delay resharding for years.

Common mistakes

  • Sharding too early: paying operational tax for years before scale justifies it
  • Bad shard key: data ends up unevenly distributed, hot shards form
  • Cross-shard joins everywhere: queries fan out, performance worsens after sharding
  • No resharding plan: when growth forces resharding, no playbook exists
  • Single point of failure in routing: directory service goes down, entire system unreachable
  • Inconsistent transactions across shards: distributed transactions are slow and complex; many teams underestimate this

Sharding is a powerful tool but a heavy one. The best sharding decisions come from teams that have first exhausted simpler scaling options, then sharded with a clear understanding of the trade-offs they're accepting. Premature sharding burdens engineering teams for years; well-executed sharding solves real scale problems durably.

Frequently asked questions

When do I actually need to shard?

Almost never as your first scaling step. Try in order: vertical scaling (bigger machine), read replicas, query optimization, caching, schema denormalization, then sharding. Sharding is appropriate when (a) a single primary cannot handle write load even after optimization, (b) data volume exceeds what one node can store, or (c) latency requirements demand co-location of user data. Most teams shard 2-3 years too early and pay a heavy operational tax for years.

What's the difference between sharding and partitioning?

Often used interchangeably but technically different. Partitioning is splitting a single table into smaller pieces — could be on the same machine (e.g., PostgreSQL table partitions). Sharding is partitioning data across multiple physical machines, each running its own database. Sharding always implies partitioning; partitioning doesn't always mean sharding.

Can I un-shard if it was a mistake?

Possible but painful. Consolidating shards involves data migration, schema reconciliation, and downtime risks. Plan for the possibility from day one — keep cross-shard queries minimal, don't rely on shard-specific behaviors, and document migration paths. Many companies have un-sharded successfully (e.g., when a previously growth-justified shard count became unnecessary after architectural improvements), but it requires the same engineering investment as the original sharding.
