Skip to content
Jarviix
HLD10 min read

Design Search Autocomplete (Google / YouTube Suggest)

Trie-backed prefix matching with personalised reranking, served at p99 < 100 ms across billions of queries/day.

hldsystem-designsearch

Intro

Search autocomplete (the dropdown under the search box) is one of the most aggressive latency targets in any consumer product. Every keystroke triggers a request; sub-100 ms feels native. The architecture is dominated by precomputed indexes, edge caching, and a small reranker that personalises without blowing the latency budget.

Functional

  • Return top-K (10) suggestions for any query prefix.
  • Personalise based on user's recent + historical queries.
  • Reflect trending queries within minutes (e.g. breaking news).
  • Respect language + region; profanity / safe-search filters.

Non-functional

  • p99 < 100 ms end-to-end (typing latency).
  • 1 M QPS at peak globally.
  • Updated suggestion index within 10 minutes for trending.
  • Index ~1 B unique query strings.

Components

  • Suggestion service

    Stateless service that hits the trie shard for the prefix.

  • Trie store

    Per-shard prefix trie + top-K cache at each node.

  • Personaliser

    Lightweight reranker that mixes user history into base ranking.

  • Aggregation pipeline

    Streams query logs, builds the trie, deploys it.

Trade-offs

Trie at each prefix node vs. real-time ranking

Pros

  • Pre-cached top-K is O(1).
  • Real-time ranking captures new queries immediately.

Cons

  • Pre-cached lags trending.
  • Real-time blows the latency budget.

Scale concerns

  • Hot prefixes (single letters) get 100× the QPS of long prefixes.
  • Personalisation budget is < 5 ms — heavy models don't fit.
  • Trie rebuilds at 1 B strings need to be incremental.

Related reads