Skip to content
Jarviix
HLD9 min read

Design Top-K / Trending (real-time heavy hitters)

Approximate top-K at billions of events/day using Count-Min Sketch + heap, with multi-window leaderboards (1m, 1h, 1d).

hldsystem-designstreaming

Intro

Top-K trending (Twitter trends, YouTube trending, Spotify charts) is the canonical 'heavy-hitters at scale' problem. Naive `GROUP BY ... ORDER BY count DESC LIMIT k` doesn't scale past tens of thousands of QPS. The right answer is probabilistic data structures (Count-Min Sketch) + a small min-heap of top-K candidates per partition, merged at query time.

Functional

  • Ingest events (term, ts) at billions/day.
  • Compute top-K trending terms in last 1m / 1h / 1d windows.
  • Allow per-region / per-language slicing.
  • Query latency for current top-K under 200 ms.

Non-functional

  • Approximation error ≤ 1% on counts (CMS bounds).
  • Window staleness ≤ 1 minute.
  • Memory: bounded sketch per partition (single-digit MB).
  • Survive partition failure with bounded loss.

Components

  • Event ingester

    Validates + writes to Kafka.

  • Stream processor

    Per-partition CMS + min-heap of top-K.

  • Aggregator

    Merges per-partition heaps into global top-K.

  • Window manager

    Tumbling 1m / 1h / 1d windows.

  • Query API

    Reads merged top-K from cache.

Trade-offs

Exact (hash map) vs CMS

Pros

  • CMS: bounded memory.
  • Exact: 0% error.

Cons

  • Exact: blows up on cardinality.
  • CMS: tunable error.

Per-partition heap merge vs central heap

Pros

  • Per-partition: scales horizontally.
  • Central: simpler.

Cons

  • Per-partition: merge accuracy.
  • Central: bottleneck.

Scale concerns

  • High cardinality (billions of distinct terms).
  • Hot term dominating sketch.
  • Tail accuracy — top 100 vs top 10.
  • Late events arriving after window close.

Related reads