Jarviix
HLD · 8 min read

Design a Rate Limiter

Token bucket vs. sliding window log vs. sliding window counter — and where to deploy them.

hld · system-design

Intro

Rate limiters protect downstream services from bursty clients. The two design questions are which algorithm to use (the choice has direct user-experience consequences) and where to enforce it (edge, API gateway, service, or DB).

Functional

  • Limit per-user, per-IP, per-API-key.
  • Return 429 with Retry-After header.
  • Different limits per route.
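The 429 contract above can be sketched as a tiny helper. This is illustrative: the function name and response shape are assumptions, not a framework API; `retry_after_s` would come from the limiter (time until the client's window resets).

```python
def too_many_requests(retry_after_s: int) -> dict:
    """Build a minimal 429 payload carrying the Retry-After header."""
    return {
        "status": 429,
        "headers": {"Retry-After": str(retry_after_s)},
        "body": {"error": "rate limit exceeded"},
    }
```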

Non-functional

  • Add < 5 ms to request latency.
  • Survive Redis blip without dropping all traffic to 0.
  • Distributed: a user's quota must be shared across pods, not duplicated per pod.

Components

  • Counter store

    Redis with Lua scripts for atomic increments.

  • Filter / interceptor

    Sits in API gateway or app middleware.

  • Config service

    Hot-reloadable rules (per route, per tier).
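How the config service feeds the interceptor can be sketched as a per-route rule lookup with a default fallback. Names and rule shapes here are illustrative assumptions; in production `RULES` would be refreshed from the config service rather than hard-coded.

```python
DEFAULT_RULE = {"limit": 100, "window_s": 60}

RULES = {  # hot-reloaded from the config service in practice
    "/login": {"limit": 5, "window_s": 60},
    "/search": {"limit": 50, "window_s": 60},
}

def rule_for(route: str) -> dict:
    """Resolve the limit rule for a route, falling back to the default."""
    return RULES.get(route, DEFAULT_RULE)
```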

Trade-offs

Token bucket

Pros

  • Allows controlled bursts.
  • O(1) state per key.

Cons

  • Less precise than sliding window.
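A minimal in-memory sketch of the token-bucket idea (single process, no Redis): the O(1) state per key is just a token count and a last-refill timestamp, and bursts up to `capacity` are admitted.

```python
import time

class TokenBucket:
    """Token bucket: `capacity` tokens, refilled at `rate` tokens/second."""

    def __init__(self, capacity: float, rate: float, now=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.now = now          # injectable clock, eases testing
        self.tokens = capacity  # start full: an initial burst is allowed
        self.last = now()

    def allow(self) -> bool:
        t = self.now()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```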

Sliding-window log

Pros

  • Exact rate enforcement.

Cons

  • O(N) memory per key — N = limit. Painful at high QPS.
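The log variant can be sketched with a deque of timestamps; the O(N) cost is visible as one stored entry per admitted request in the window. (In Redis this is typically a sorted set per key; the deque is a single-process stand-in.)

```python
from collections import deque

class SlidingWindowLog:
    """Exact limiter: store every admitted timestamp in the window."""

    def __init__(self, limit: int, window_s: float):
        self.limit = limit
        self.window_s = window_s
        self.log = deque()  # O(limit) memory per key

    def allow(self, now: float) -> bool:
        # Evict timestamps that have aged out of the window.
        while self.log and self.log[0] <= now - self.window_s:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False
```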

Sliding-window counter (hybrid)

Pros

  • O(1) per key, ~5% accuracy error.
  • Production default.

Cons

  • Edge cases at window boundaries.

Code

Token bucket with atomic Redis Lua:

```python
# Lua runs atomically inside Redis: KEYS[1] holds the token count,
# KEYS[2] the last-refill timestamp; ARGV = [capacity, refill rate/s, now].
LUA = '''
local tokens = tonumber(redis.call('GET', KEYS[1]) or ARGV[1])
local last   = tonumber(redis.call('GET', KEYS[2]) or 0)
local now    = tonumber(ARGV[3])
local rate   = tonumber(ARGV[2])
-- Refill proportionally to elapsed time, capped at capacity.
tokens = math.min(tonumber(ARGV[1]), tokens + (now - last) * rate)
if tokens >= 1 then
  tokens = tokens - 1
  redis.call('SET', KEYS[1], tokens)
  redis.call('SET', KEYS[2], now)
  return 1
else
  return 0
end
'''
```
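The refill math in that script can be mirrored in pure Python, which is handy for unit-testing limit configurations without a Redis instance. The function below is a test double I'm introducing, not part of the script itself; it applies one request and returns the new state.

```python
def token_bucket_step(tokens: float, last: float, now: float,
                      capacity: float, rate: float):
    """Pure-Python mirror of the Lua script: apply one request at `now`.

    Returns (allowed, new_tokens) using the same refill rule and
    accept condition as the Redis script above.
    """
    tokens = min(capacity, tokens + (now - last) * rate)
    if tokens >= 1:
        return True, tokens - 1
    return False, tokens
```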

Scale concerns

  • Soft fail-open if Redis is down — a 100% block on outage is worse than the spike.
  • Per-tenant noisy neighbour: cap at the gateway, not just the app layer.
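The fail-open behaviour can be sketched as a thin wrapper around the counter-store check. Names here are illustrative; `check` stands in for whatever function hits Redis, and the caught exception type depends on your client library.

```python
def allow_with_failopen(check, key: str) -> bool:
    """Call `check(key)` against the counter store; on a store outage,
    admit the request rather than turning the outage into a 100% block."""
    try:
        return check(key)
    except ConnectionError:
        return True  # soft fail-open during the blip
```

The trade-off is that an attacker who can degrade Redis also disables limiting, so pair this with an alarm on the fail-open path.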
