HLD10 min read

Design Search Autocomplete (Google / YouTube Suggest)

Trie-backed prefix matching with personalised reranking, served at p99 < 100 ms across billions of queries/day.

hldsystem-designsearch

Intro

Search autocomplete (the dropdown under the search box) is one of the most aggressive latency targets in any consumer product. Every keystroke triggers a request; sub-100 ms feels native. The architecture is dominated by precomputed indexes, edge caching, and a small reranker that personalises without blowing the latency budget.

Functional

Return top-K (10) suggestions for any query prefix.
Personalise based on user's recent + historical queries.
Reflect trending queries within minutes (e.g. breaking news).
Respect language + region; profanity / safe-search filters.

Non-functional

p99 < 100 ms end-to-end (typing latency).
1 M QPS at peak globally.
Updated suggestion index within 10 minutes for trending.
Index ~1 B unique query strings.

Components

Suggestion service
Stateless service that hits the trie shard for the prefix.
Trie store
Per-shard prefix trie + top-K cache at each node.
Personaliser
Lightweight reranker that mixes user history into base ranking.
Aggregation pipeline
Streams query logs, builds the trie, deploys it.

Trade-offs

Trie at each prefix node vs. real-time ranking

Pros

Pre-cached top-K is O(1).
Real-time ranking captures new queries immediately.

Cons

Pre-cached lags trending.
Real-time blows the latency budget.

Scale concerns

Hot prefixes (single letters) get 100× the QPS of long prefixes.
Personalisation budget is < 5 ms — heavy models don't fit.
Trie rebuilds at 1 B strings need to be incremental.

Intro

Components

Trade-offs

Scale concerns

Related reads