Design Search Autocomplete (Google / YouTube Suggest)
Trie-backed prefix matching with personalised reranking, served at p99 < 100 ms across billions of queries/day.
Intro
Search autocomplete (the dropdown under the search box) is one of the most aggressive latency targets in any consumer product. Every keystroke triggers a request; sub-100 ms feels native. The architecture is dominated by precomputed indexes, edge caching, and a small reranker that personalises without blowing the latency budget.
Functional
- Return top-K (10) suggestions for any query prefix.
- Personalise based on user's recent + historical queries.
- Reflect trending queries within minutes (e.g. breaking news).
- Respect language + region; profanity / safe-search filters.
Non-functional
- p99 < 100 ms end-to-end (typing latency).
- 1 M QPS at peak globally.
- Updated suggestion index within 10 minutes for trending.
- Index ~1 B unique query strings.
Components
Suggestion service
Stateless service that hits the trie shard for the prefix.
Trie store
Per-shard prefix trie + top-K cache at each node.
Personaliser
Lightweight reranker that mixes user history into base ranking.
Aggregation pipeline
Streams query logs, builds the trie, deploys it.
Trade-offs
Trie at each prefix node vs. real-time ranking
Pros
- Pre-cached top-K is O(1).
- Real-time ranking captures new queries immediately.
Cons
- Pre-cached lags trending.
- Real-time blows the latency budget.
Scale concerns
- Hot prefixes (single letters) get 100× the QPS of long prefixes.
- Personalisation budget is < 5 ms — heavy models don't fit.
- Trie rebuilds at 1 B strings need to be incremental.