
Tech · 6 min read

Caching Strategies for Backend Engineers: Cache-Aside, Write-Through, and the Rest

How to actually use a cache — when to use cache-aside, write-through, write-behind, refresh-ahead — and the failure modes (thundering herd, stampede, drift) that bite in production.

By Jarviix Engineering · Apr 19, 2026


Caches are everywhere — between your app and database, between your CDN and origin, between your CPU and RAM. The goal is always the same: trade a little staleness for large reductions in latency and load. The difference between a cache that helps and one that quietly serves wrong data is in the strategy and the failure handling.

This post walks through the patterns that actually work, the ones that fail in interesting ways, and the operational issues every cache eventually causes.

The five caching patterns

1. Cache-aside (lazy loading)

The most common pattern. The application explicitly checks the cache, falls back to the source on a miss, and populates the cache.

def get_user(user_id):
    cached = cache.get(f"user:{user_id}")
    if cached is not None:  # compare against None so falsy values (e.g. {}) still count as hits
        return cached
    user = db.get_user(user_id)  # miss: fall back to the source
    cache.set(f"user:{user_id}", user, ttl=300)  # populate for the next reader
    return user

Pros. Simple. Cache failures don't break reads (the app just falls back to the DB). Application is in full control.

Cons. First read after a cache miss pays the full cost. Cache and DB can drift if writes don't invalidate. Vulnerable to thundering herds (see below).

Use it for: read-heavy workloads where you control the read path.

2. Write-through

Writes go to both the cache and the source synchronously.

def save_user(user):
    db.save_user(user)
    cache.set(f"user:{user.id}", user, ttl=300)

Pros. Cache is always fresh. Reads after writes are immediately served from cache.

Cons. Every write pays the cost of two systems. Writes are slower. If the cache is down, writes block.

Use it for: write-light workloads where freshness matters more than write speed.

3. Write-behind (write-back)

Writes go to the cache immediately; the cache asynchronously persists to the source.

def save_user(user):
    cache.set(f"user:{user.id}", user)
    queue.publish(SaveUser(user))  # async to DB

Pros. Very fast writes. Buffer absorbs write spikes.

Cons. Risk of data loss if the cache fails before the write is persisted. Order of operations becomes complex.

Use it for: ingest-heavy workloads where some loss is acceptable (analytics events, telemetry, non-critical counters).

4. Refresh-ahead

Cache proactively refreshes entries before they expire.

def get_user(user_id):
    cached, ttl = cache.get_with_ttl(f"user:{user_id}")
    if cached and ttl < 30:
        background_refresh(user_id)
    return cached or load_and_cache(user_id)

Pros. Hot entries never expire under user traffic; the cache always serves fresh data.

Cons. More complex. Wastes refreshes for entries that won't be requested again.

Use it for: predictable hot keys where TTL expiry under load causes user-visible spikes.

5. Read-through

The cache itself loads from the source on miss; the application talks only to the cache.

user = cache.get(f"user:{user_id}")  # cache loads from DB if missing

Pros. Cleaner application code. The cache is the only datasource interface.

Cons. Requires a cache layer that knows how to load (some Redis modules, application-level wrappers, frameworks like Hibernate's L2 cache).

Use it for: workloads where you want the cache to be the abstraction, not just an optimization.
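An application-level read-through wrapper fits in a few lines. This is a minimal in-process sketch — the `ReadThroughCache` name and the loader interface are assumptions for illustration, not a real library; a production version would sit in front of Redis or Memcached and add TTL jitter and metrics.

```python
import time

class ReadThroughCache:
    """Read-through wrapper: callers never talk to the source directly."""

    def __init__(self, loader, ttl=300):
        self._loader = loader   # function key -> value; hits the source on a miss
        self._ttl = ttl
        self._store = {}        # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None:
            value, expires_at = entry
            if time.monotonic() < expires_at:
                return value    # fresh hit
        # Miss (or expired): the cache itself loads from the source.
        value = self._loader(key)
        self._store[key] = (value, time.monotonic() + self._ttl)
        return value

# Usage: the loader is the only place that touches the DB.
calls = []
def load_user(user_id):
    calls.append(user_id)                 # track how often the source is hit
    return {"id": user_id, "name": f"user-{user_id}"}

cache = ReadThroughCache(load_user, ttl=300)
cache.get("42")   # miss: loader runs once
cache.get("42")   # hit: loader skipped
```

The application only ever calls `cache.get`; whether the value came from memory or the database is the cache's business.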

Cache invalidation

The hard problem. The patterns that actually work:

TTL only

Every entry expires after some time. Trade staleness for simplicity. This handles 80% of cases acceptably.

Invalidate on write

When the source changes, delete (don't update) the cached entry. The next read populates it fresh.

def update_user(user):
    db.update_user(user)
    cache.delete(f"user:{user.id}")

Why delete instead of set? If two writes interleave, "set" can leave the cache holding the older value (depending on order); "delete" forces the next read to repopulate from the latest DB state.

Versioned keys

Include a version (or generation) in the cache key. Bumping the version atomically invalidates all related entries.

v = cache.get("user_schema_version")  # e.g., 42
user = cache.get(f"user:v{v}:{user_id}")

Schema change? Bump the version. All old entries become unreachable; cache repopulates as traffic comes in.

Dependency graphs (almost never worth it)

Tracking which cached values depend on which DB rows is theoretically clean and operationally a nightmare. Only worth it for very specific high-stakes systems.

The failure modes nobody warns you about

Thundering herd

A popular cached entry expires. A thousand concurrent requests all miss simultaneously. All thousand hit the database. The database falls over.

Mitigations:

  • Probabilistic early refresh. A small fraction of requests refresh the entry before it expires.
  • Single-flight. Only one request goes to the source per missed key; the rest wait for the result.
  • Stale-while-revalidate. Serve stale data while a single request refreshes in the background.
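The single-flight mitigation is small enough to sketch. This is a minimal in-process version built on `threading` — the `SingleFlight` name is an assumption (borrowed from Go's `singleflight` package idea), and error propagation to waiting callers is omitted for brevity.

```python
import threading
import time

class SingleFlight:
    """Collapse concurrent loads of the same key into one source call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> (Event, result holder)

    def do(self, key, load):
        with self._lock:
            entry = self._inflight.get(key)
            if entry is None:
                event, holder = threading.Event(), {}
                self._inflight[key] = (event, holder)
                leader = True
            else:
                event, holder = entry
                leader = False
        if leader:
            try:
                holder["value"] = load(key)   # only the leader hits the source
            finally:
                with self._lock:
                    del self._inflight[key]
                event.set()                   # wake everyone waiting on this key
            return holder["value"]
        event.wait()                          # followers block until the leader finishes
        return holder["value"]

# Usage: ten concurrent misses, one database call.
calls = []
def slow_load(key):
    calls.append(key)
    time.sleep(0.05)      # simulate a slow DB query
    return key.upper()

sf = SingleFlight()
results = []
threads = [threading.Thread(target=lambda: results.append(sf.do("user:1", slow_load)))
           for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All ten callers get the same value; the source is hit once instead of ten times.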

Cache stampede on cold start

Cache is empty (after restart, deploy, eviction). Traffic hits, every request misses, source overloads.

Mitigations:

  • Warm the cache as part of deploy.
  • Use stale-if-error semantics on the layer above the cache.
  • Rate-limit DB queries from the cache layer.
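Deploy-time warming can be as simple as replaying a hot-key list before the instance takes traffic. A sketch — `DictCache` and `StubDB` are stand-ins for the real cache and database, and the hot-key list would typically come from access logs or a known hot set.

```python
class DictCache:
    """Stand-in for the real cache client (assumption, for illustration)."""
    def __init__(self):
        self.store = {}
    def set(self, key, value, ttl=None):
        self.store[key] = value
    def get(self, key):
        return self.store.get(key)

class StubDB:
    """Stand-in for the real database (assumption, for illustration)."""
    def __init__(self, users):
        self.users = users
    def get_user(self, user_id):
        return self.users.get(user_id)

def warm_cache(cache, db, hot_keys, ttl=300):
    """Pre-populate the cache; run from a deploy hook, before serving traffic."""
    for user_id in hot_keys:
        user = db.get_user(user_id)
        if user is not None:                           # skip keys that no longer exist
            cache.set(f"user:{user_id}", user, ttl=ttl)

cache = DictCache()
db = StubDB({"1": {"id": "1"}, "2": {"id": "2"}})
warm_cache(cache, db, ["1", "2", "3"])  # "3" doesn't exist and is skipped
```

The first wave of real traffic then lands on a warm cache instead of an empty one.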

Cache penetration

Requests for keys that don't exist (a user ID that doesn't exist, a malicious enumeration). Cache always misses, source is always queried.

Mitigations:

  • Cache the negative. Store "not found" with a short TTL.
  • Bloom filters in front of the cache to short-circuit known-missing keys.
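Negative caching drops into the cache-aside read path with a sentinel value. A sketch — the fakes are for illustration only, and with a real Redis you would store a marker string (e.g. `"__missing__"`) rather than a Python object.

```python
_NOT_FOUND = object()  # sentinel: distinguishes "cached as missing" from "not cached"

class FakeCache:
    """Stand-in for the real cache (assumption, for illustration)."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def set(self, key, value, ttl=None):
        self.store[key] = value

class FakeDB:
    """Stand-in DB that counts how often it is queried."""
    def __init__(self, users):
        self.users = users
        self.calls = 0
    def get_user(self, user_id):
        self.calls += 1
        return self.users.get(user_id)

def get_user(user_id, cache, db, ttl=300, negative_ttl=30):
    """Cache-aside lookup that also caches the negative."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return None if cached is _NOT_FOUND else cached
    user = db.get_user(user_id)
    if user is None:
        cache.set(key, _NOT_FOUND, ttl=negative_ttl)  # cache "not found" with a short TTL
        return None
    cache.set(key, user, ttl=ttl)
    return user

cache = FakeCache()
db = FakeDB({"1": {"id": "1"}})
get_user("999", cache, db)   # miss: DB queried once, negative cached
get_user("999", cache, db)   # served from cache; DB not queried again
```

Repeated lookups of a missing ID (or an enumeration attack) now stop at the cache.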

Cache drift

Cache and source disagree because invalidation was missed. Subtle bugs ("why does this user keep seeing old data?").

Mitigations:

  • Short TTLs as a backstop.
  • Audit jobs that compare cache vs source for hot keys.
  • Invalidation on every write path — don't have writes that bypass the cache invalidation logic.
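An audit job for hot keys is a short loop. A sketch with stand-in cache/DB classes — `audit_hot_keys` is a hypothetical helper you would run on a schedule, alerting on the mismatch count and optionally repairing drifted entries.

```python
class DictCache:
    """Stand-in for the real cache client (assumption, for illustration)."""
    def __init__(self):
        self.store = {}
    def set(self, key, value, ttl=None):
        self.store[key] = value
    def get(self, key):
        return self.store.get(key)
    def delete(self, key):
        self.store.pop(key, None)

class StubDB:
    """Stand-in for the real database (assumption, for illustration)."""
    def __init__(self, users):
        self.users = users
    def get_user(self, user_id):
        return self.users.get(user_id)

def audit_hot_keys(cache, db, hot_keys, repair=True):
    """Compare cache vs source for a sample of hot keys; optionally repair."""
    drifted = []
    for user_id in hot_keys:
        key = f"user:{user_id}"
        cached = cache.get(key)
        if cached is None:
            continue                 # not cached: nothing to drift
        fresh = db.get_user(user_id)
        if cached != fresh:
            drifted.append(key)
            if repair:
                cache.delete(key)    # delete, don't set: next read repopulates
    return drifted

db = StubDB({"1": {"id": "1", "name": "new"}})
cache = DictCache()
cache.set("user:1", {"id": "1", "name": "old"})  # a drifted entry
found = audit_hot_keys(cache, db, ["1", "2"])
```

Note the repair deletes rather than sets, for the same interleaving reason as invalidate-on-write.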

The cache hierarchy

A typical production stack has layers:

  1. CPU caches — invisible to you, the L1/L2/L3 caches in the processor.
  2. In-process cache — per-instance memory cache (Caffeine, Guava, lru-cache). Sub-microsecond. Lost on restart.
  3. Distributed cache — Redis, Memcached. Sub-millisecond. Shared across instances.
  4. CDN cache — full HTTP responses cached at the edge. (See how CDNs work.)
  5. Browser cache — HTTP cache headers control client-side caching.

Each layer shields the one behind it, and hit rates compound: a 90% CDN hit rate, then a 90% Redis hit rate on the misses, means only 1% of requests reach your origin.
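That compounding arithmetic is worth making explicit — each layer only sees the misses of the layer in front of it, so miss rates multiply. A small sketch (`origin_fraction` is a hypothetical helper):

```python
def origin_fraction(hit_rates):
    """Fraction of requests reaching the origin, given per-layer hit rates.

    Miss rates multiply: with 90% CDN and 90% Redis hit rates,
    0.1 * 0.1 = 0.01 of requests make it all the way through.
    """
    frac = 1.0
    for rate in hit_rates:
        frac *= (1.0 - rate)
    return frac

origin_fraction([0.90, 0.90])        # ≈ 0.01: only 1% of requests reach the origin
origin_fraction([0.95, 0.90, 0.50])  # a third layer compounds it further
```

It also shows why a small hit-rate regression at the outermost layer multiplies the load on everything behind it.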

Three rules

  1. Cache the right thing. Computed expensive results, frequently-read data, anything where the source is slow or fragile. Don't cache things that are already fast or that change every request.
  2. Be deliberate about staleness. TTL is a product decision, not just a tuning knob. "Users will see updates within 5 minutes" is a real product behavior; pick a TTL aligned with what's acceptable.
  3. Test the cache failure path. What happens when Redis is down? Most cache code "falls back to the source" in theory; in practice, the source can't handle the unfiltered traffic and the system dies. Load-test with the cache disabled.

Caching is one of the load-shedding tools in a system designer's kit. How CDNs work covers the layer above; Kafka explained simply covers a different pattern for absorbing spikes. The distributed cache HLD writeup walks through designing a Redis/Memcached-class service end to end — replication, eviction, sharding, consistent hashing — and is the natural deep-dive companion to this post. System design basics ties it all together.

Frequently asked questions

What's the right TTL for a cache entry?

Long enough that hit rate is high; short enough that staleness is acceptable. For most read-mostly data, 5-60 minutes is reasonable. Tune based on your real read/write ratio and how badly stale data hurts.

Should I cache database query results or full HTTP responses?

Both have a place. Full HTTP responses (in a CDN or reverse proxy) are cheaper per hit; query results in Redis give you finer-grained invalidation. Most production systems use both layers.

Is cache invalidation really one of the two hard things in computer science?

Yes, mostly because correct invalidation requires knowing the dependency graph between cached values and underlying data. Pick a strategy where you don't need perfect invalidation — TTLs + acceptable staleness is the realistic answer.
