Problem Understanding
Restate the problem in your own words.
Design a Web Crawler (Googlebot-class)
Design a polite, distributed web crawler: fetch a seed set of pages, follow links into a frontier of billions of URLs, parse content for indexing, and respect robots.txt and per-domain crawl-rate limits. The hard parts are URL deduplication at frontier scale (Bloom filter plus a datastore), parallel fetching without overloading any single domain, and detecting and skipping spider traps. The throughput target of 1B+ pages/day turns the design into a streaming pipeline with strict per-host politeness back-pressure.
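To make the "rough scale" concrete, the 1B pages/day target can be turned into per-second numbers. The sketch below is a back-of-envelope estimate only: the average page size (100 KiB) and the per-host delay (1 second) are assumptions chosen for illustration, not figures from the problem statement, and the function name `sizing` is hypothetical.

```python
import math

PAGES_PER_DAY = 1_000_000_000   # target from the problem statement
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Assumptions for illustration only (not given in the problem):
AVG_PAGE_BYTES = 100 * 1024     # assumed mean page size: 100 KiB
PER_HOST_DELAY_S = 1.0          # assumed politeness gap between hits to one host


def sizing(pages_per_day=PAGES_PER_DAY,
           avg_page_bytes=AVG_PAGE_BYTES,
           per_host_delay_s=PER_HOST_DELAY_S):
    """Return (pages/sec, ingress Gbit/s, minimum distinct hosts in rotation)."""
    pages_per_sec = pages_per_day / SECONDS_PER_DAY
    bandwidth_gbps = pages_per_sec * avg_page_bytes * 8 / 1e9
    # Each host yields at most one page per delay window, so sustaining the
    # target rate needs at least rate * delay distinct hosts being crawled.
    min_hosts = math.ceil(pages_per_sec * per_host_delay_s)
    return pages_per_sec, bandwidth_gbps, min_hosts


pages_per_sec, bandwidth_gbps, min_hosts = sizing()
print(f"{pages_per_sec:,.0f} pages/s, {bandwidth_gbps:.2f} Gbit/s, "
      f"{min_hosts:,} hosts in rotation")
```

Under these assumptions the target works out to roughly 11,574 pages per second, about 9.5 Gbit/s of sustained download, and at least 11,575 distinct hosts in rotation at any moment. Changing either assumed constant scales the result linearly.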
- Googlebot: The canonical web crawler. Trillions of URLs in the frontier; PageRank + freshness drive scheduling.
- Bingbot: Microsoft’s crawler; same problem, slightly different politeness defaults.
- Common Crawl: Open-source non-profit crawl, monthly ~3B-page snapshots used for ML training.
- Internet Archive ArchiveBot: Snapshot-oriented crawler that preserves entire sites for the Wayback Machine.
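The two core mechanisms named in the problem statement, Bloom-filter deduplication and per-host politeness, can be sketched in a few dozen lines. This is a single-process illustration under stated assumptions, not how any of the crawlers above is implemented: the class names `BloomFilter` and `PoliteFrontier`, the filter size, the hash count, and the 1-second delay are all hypothetical, and the backing datastore, robots.txt handling, and URL canonicalisation are left out.

```python
import hashlib
import heapq
from collections import defaultdict, deque
from urllib.parse import urlsplit


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(item))


class PoliteFrontier:
    """Per-host FIFO queues plus a min-heap of (earliest fetch time, host)."""

    def __init__(self, delay_s: float = 1.0,
                 num_bits: int = 1 << 20, num_hashes: int = 7) -> None:
        self.delay_s = delay_s                 # assumed fixed per-host delay
        self.seen = BloomFilter(num_bits, num_hashes)
        self.queues = defaultdict(deque)       # host -> pending URLs
        self.next_ok = {}                      # host -> earliest next fetch
        self.ready = []                        # heap of (time, host)
        self.scheduled = set()                 # hosts currently in the heap

    def add(self, url: str, now: float) -> bool:
        """Enqueue url unless it was (probably) seen before."""
        if url in self.seen:
            return False  # a false positive here silently drops a new URL
        self.seen.add(url)
        host = urlsplit(url).netloc.lower()
        self.queues[host].append(url)
        if host not in self.scheduled:
            heapq.heappush(self.ready, (self.next_ok.get(host, now), host))
            self.scheduled.add(host)
        return True

    def next_url(self, now: float):
        """Return a URL whose host may be fetched at `now`, else None."""
        if not self.ready or self.ready[0][0] > now:
            return None
        _, host = heapq.heappop(self.ready)
        url = self.queues[host].popleft()
        self.next_ok[host] = now + self.delay_s
        if self.queues[host]:
            heapq.heappush(self.ready, (self.next_ok[host], host))
        else:
            self.scheduled.discard(host)
        return url


frontier = PoliteFrontier(delay_s=1.0)
for u in ["https://a.example/1", "https://a.example/2", "https://b.example/1"]:
    frontier.add(u, now=0.0)
print(frontier.next_url(now=0.0))  # first URL from a.example
print(frontier.next_url(now=0.0))  # b.example, since a.example must wait
print(frontier.next_url(now=0.5))  # None: a.example is still cooling down
```

The heap keeps each host in the schedule at most once, so a host with thousands of queued URLs cannot crowd out others. In a distributed design the same structure would likely be sharded by host, and the Bloom filter's false positives are one reason a datastore check is paired with it in the problem statement.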
Your task: read the problem above, then write what the system is, who uses it, the rough scale, and the headline UX expectation — in your own words. Submit for AI review when you're ready.