Design a URL Shortener
Bit.ly-class system: Base62 IDs, key-value store, edge cache, ~10× read/write skew.
Intro
URL shorteners (Bit.ly, TinyURL, goo.gl) take a long URL and return a short alias. The system is read-heavy (≈10:1) and latency-sensitive — the redirect is on the user's critical path.
Functional
- POST /shorten {url} → short id (6–8 chars).
- GET /{shortId} → 301/302 redirect to original.
- Optional custom alias and TTL.
Non-functional
- Read latency p95 < 50 ms.
- Availability > 99.95%.
- Estimate: 100 M new URLs/month, 10× read skew → 1 B reads/month ≈ 400 read QPS.
- Storage: 100 M × 12 months × 5 yrs × ~500 B = ~3 TB.
Components
API gateway
TLS termination, auth, basic rate-limit.
Write service
Validates URL, generates id, persists.
Read service
Resolves id → URL; serves redirect.
ID generator
Base62 (~62⁷ ≈ 3.5 trillion ids); a counter per partition + Snowflake-style timestamp.
Primary store
Key-value: DynamoDB / Cassandra / Bigtable.
Cache
Redis cluster in front of reads — keeps p95 low.
CDN
302 cached at the edge for the hottest URLs.
Analytics pipeline
Click events → Kafka → warehouse.
Trade-offs
ID generation: counter vs. hash
Pros
- Counter is dense and predictable.
- Hash is stateless and globally unique.
Cons
- Counter needs a coordination service (ZK / Snowflake).
- Hashes can collide → must check + retry.
Storage: relational vs. wide-column
Pros
- Relational has familiar ops.
- Wide-column scales horizontally without resharding.
Cons
- Relational hits write ceiling at scale.
- Wide-column gives up secondary indexes / joins.
Code
ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'
def encode(n):
if n == 0: return ALPHABET[0]
out = []
while n:
n, r = divmod(n, 62)
out.append(ALPHABET[r])
return ''.join(reversed(out))
Scale concerns
- Hot keys (viral links) — solve with edge caching, request collapsing.
- Cache stampede on expiry — single-flight or jittered TTL.
- DDOS on /shorten — strict per-IP and per-token rate limits.