Design a Rate Limiter
Token bucket vs. sliding window log vs. sliding window counter — and where to deploy them.
Intro
Rate limiters protect downstream services from bursty clients. The two design questions are: which algorithm to use (the choice has direct user-experience consequences) and where to enforce it (edge, API gateway, service, or database layer).
Functional
- Limit per-user, per-IP, per-API-key.
- Return 429 with Retry-After header.
- Different limits per route.
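A minimal sketch of how the functional requirements shape the middleware's response path. The `ROUTE_LIMITS` table and `handle` function are illustrative, not a real framework API; the limiter decision itself is assumed to come from elsewhere:

```python
# Hypothetical per-route limits; real values would come from the config service.
ROUTE_LIMITS = {"/search": 10, "/checkout": 2}  # requests per second

def handle(route: str, allowed: bool, retry_after: float) -> dict:
    """Translate a limiter decision into an HTTP response sketch."""
    if allowed:
        return {"status": 200}
    # Rejected: 429 plus a Retry-After hint, rounded up to at least 1 second.
    return {
        "status": 429,
        "headers": {"Retry-After": str(max(1, round(retry_after)))},
    }
```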
Non-functional
- Add < 5 ms to request latency.
- Survive a Redis blip without dropping all traffic to zero.
- Distributed: a user's quota must be shared across pods, not duplicated per pod.
Components
Counter store
Redis with Lua scripts for atomic increments.
Filter / interceptor
Sits in API gateway or app middleware.
Config service
Hot-reloadable rules (per route, per tier).
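One plausible shape for the hot-reloadable rules the config service pushes; the `LimitRule` fields and tier names are assumptions for illustration:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class LimitRule:
    route: str            # e.g. "/search"
    tier: str             # e.g. "free", "pro"
    capacity: int         # bucket size (burst allowance)
    refill_per_sec: float # steady-state rate

# Hypothetical rule table, replaced wholesale on config push.
RULES = {
    ("/search", "free"): LimitRule("/search", "free", 10, 1.0),
    ("/search", "pro"):  LimitRule("/search", "pro", 100, 10.0),
}

def rule_for(route: str, tier: str) -> Optional[LimitRule]:
    """Look up the active rule; None means no limit configured."""
    return RULES.get((route, tier))
```

Keeping rules immutable (`frozen=True`) and swapping the whole table on reload avoids readers seeing a half-updated rule.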
Trade-offs
Token bucket
Pros
- Allows controlled bursts.
- O(1) state per key.
Cons
- Less precise than sliding window.
Sliding-window log
Pros
- Exact rate enforcement.
Cons
- O(N) memory per key, where N is the request limit. Painful at high QPS.
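An in-memory sketch of the log approach, which makes the O(N) cost concrete: one stored timestamp per accepted request. Class and method names are illustrative:

```python
from collections import deque

class SlidingWindowLog:
    """Exact limiter: keeps one timestamp per accepted request,
    so memory per key grows with the limit itself."""
    def __init__(self, limit: int, window_sec: float):
        self.limit = limit
        self.window = window_sec
        self.log = deque()  # timestamps of accepted requests

    def allow(self, now: float) -> bool:
        # Evict timestamps that have fallen out of the window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False
```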
Sliding-window counter (hybrid)
Pros
- O(1) per key, ~5% accuracy error.
- Production default.
Cons
- Edge cases at window boundaries.
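A sketch of the hybrid: the previous fixed window's count is weighted by how much of it still overlaps the sliding window, giving O(1) state at the cost of boundary approximation. Names are illustrative:

```python
class SlidingWindowCounter:
    """Hybrid limiter: estimate = previous_count * overlap + current_count.
    O(1) state per key; slightly approximate at window boundaries."""
    def __init__(self, limit: int, window_sec: float):
        self.limit = limit
        self.window = window_sec
        self.current_start = 0.0
        self.current = 0
        self.previous = 0

    def allow(self, now: float) -> bool:
        elapsed = now - self.current_start
        if elapsed >= self.window:
            # Roll forward; if more than one full window passed,
            # the previous count is stale and contributes nothing.
            self.previous = self.current if elapsed < 2 * self.window else 0
            self.current = 0
            self.current_start = now - (elapsed % self.window)
            elapsed = now - self.current_start
        overlap = (self.window - elapsed) / self.window
        estimate = self.previous * overlap + self.current
        if estimate < self.limit:
            self.current += 1
            return True
        return False
```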
Code
Token bucket with an atomic Redis Lua script (Python)
LUA = '''
-- KEYS[1] = token count key, KEYS[2] = last-refill timestamp key
-- ARGV[1] = bucket capacity, ARGV[2] = refill rate (tokens/sec), ARGV[3] = now (sec)
local capacity = tonumber(ARGV[1])
local rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local tokens = tonumber(redis.call('GET', KEYS[1]) or capacity)
local last = tonumber(redis.call('GET', KEYS[2]) or now)
-- Refill proportionally to elapsed time, capped at capacity.
tokens = math.min(capacity, tokens + (now - last) * rate)
if tokens >= 1 then
  tokens = tokens - 1
  redis.call('SET', KEYS[1], tokens)
  redis.call('SET', KEYS[2], now)
  return 1
else
  return 0
end
'''
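The same refill-then-spend arithmetic in plain Python, useful for unit-testing limit configurations without a Redis instance. This is a sketch mirroring the script's logic; the Lua above remains the atomic production path:

```python
class TokenBucket:
    """In-process mirror of the Lua script's refill-then-spend logic."""
    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity
        self.rate = rate        # tokens per second
        self.tokens = capacity  # start full, like the script's GET-or-capacity
        self.last = None        # timestamp of the last refill

    def allow(self, now: float) -> bool:
        if self.last is None:
            self.last = now
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```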
Scale concerns
- Fail open (softly) if Redis is down: blocking 100% of traffic during an outage is worse than letting the spike through.
- Per-tenant noisy neighbour: cap at the gateway, not just the app layer.
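One way to implement the soft fail-open, sketched with a hypothetical `limiter.allow` that raises `ConnectionError` when the counter store is unreachable:

```python
def allow_with_fail_open(limiter, key: str, now: float) -> bool:
    """Degrade to 'allow' when the counter store is down.
    A real deployment would also add a circuit breaker and an alert here."""
    try:
        return limiter.allow(key, now)
    except ConnectionError:
        # Counter store unreachable: let traffic through rather than block 100%.
        return True
```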