Design Top-K / Trending (real-time heavy hitters)
Approximate top-K at billions of events/day using Count-Min Sketch + heap, with multi-window leaderboards (1m, 1h, 1d).
Intro
Top-K trending (Twitter trends, YouTube trending, Spotify charts) is the canonical 'heavy-hitters at scale' problem. Naive `GROUP BY ... ORDER BY count DESC LIMIT k` doesn't scale past tens of thousands of QPS. The right answer is probabilistic data structures (Count-Min Sketch) + a small min-heap of top-K candidates per partition, merged at query time.
Functional
- Ingest events (term, ts) at billions/day.
- Compute top-K trending terms in last 1m / 1h / 1d windows.
- Allow per-region / per-language slicing.
- Query latency for current top-K under 200 ms.
Non-functional
- Approximation error ≤ 1% on counts (CMS bounds).
- Window staleness ≤ 1 minute.
- Memory: bounded sketch per partition (single-digit MB).
- Survive partition failure with bounded loss.
Components
Event ingester
Validates + writes to Kafka.
Stream processor
Per-partition CMS + min-heap of top-K.
Aggregator
Merges per-partition heaps into global top-K.
Window manager
Tumbling 1m / 1h / 1d windows.
Query API
Reads merged top-K from cache.
Trade-offs
Exact (hash map) vs CMS
Pros
- CMS: bounded memory.
- Exact: 0% error.
Cons
- Exact: blows up on cardinality.
- CMS: tunable error.
Per-partition heap merge vs central heap
Pros
- Per-partition: scales horizontally.
- Central: simpler.
Cons
- Per-partition: merge accuracy.
- Central: bottleneck.
Scale concerns
- High cardinality (billions of distinct terms).
- Hot term dominating sketch.
- Tail accuracy — top 100 vs top 10.
- Late events arriving after window close.
Related reads
HLD
Design a Distributed Counter
Sharded counters at 1 M+ writes/sec with bounded staleness reads, used for view counts, like counts, and rate aggregations.
HLD
Design Ad Click Aggregation
Sub-minute aggregation of billions of click events with idempotent dedup, fraud filtering, and real-time + batch reconciliation.