HLD · 12 min read
Design Google Drive / Dropbox
Block-level deduplication, delta sync, conflict resolution, and a metadata model that scales to billions of files.
hld · system-design · storage
Intro
A Google Drive / Dropbox-style service stores user files in the cloud and keeps them in sync across N devices. The hard parts: (1) bandwidth-efficient sync (push only the changed bytes), (2) atomic conflict resolution when two clients edit the same file offline, (3) a metadata model that scales to ~100 billion objects without resharding the world.
Functional
- Upload / download files via web + desktop client.
- Sync changes across all logged-in devices in seconds.
- Share files / folders with read or read-write access.
- Versioning + 30-day undelete.
Non-functional
- Upload p95 < 2 s for files ≤ 10 MB.
- Sync convergence p95 < 5 s across devices.
- Durability ≥ 11 nines (99.999999999%) on file bytes.
- Storage efficiency: dedupe identical 4 MB blocks across users.
Components
Block service
Stores 4 MB content-addressed chunks in object storage.
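A minimal sketch of content addressing, assuming SHA-256 as the hash and an in-memory dict standing in for object storage. `BlockStore`, `split_blocks`, and `upload` are hypothetical names for illustration, not part of any real API. It shows how identical blocks collapse to a single stored copy, which is what enables cross-user dedupe.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB, matching the block size named above


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class BlockStore:
    """Hypothetical in-memory stand-in for the object-storage backend."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def put(self, block: bytes) -> str:
        """Store a block under its SHA-256 digest; identical blocks are stored once."""
        key = hashlib.sha256(block).hexdigest()
        self._blocks.setdefault(key, block)
        return key

    def get(self, key: str) -> bytes:
        return self._blocks[key]

    def __len__(self) -> int:
        return len(self._blocks)


def upload(store: BlockStore, data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return the file's block list: the ordered digests of its blocks."""
    return [store.put(b) for b in split_blocks(data, block_size)]
```

The returned block list is what the metadata layer would record for a file version. Two files that share a block share its key, so the bytes are stored only once.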
Metadata service
Per-user file tree, version chain, share ACLs.
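One possible shape for a metadata record, sketched with dataclasses. The field names, the `"read"` / `"write"` permission strings, and the rule that the owner can do anything are assumptions made for illustration. The source only specifies that the service holds a file tree, a version chain, and share ACLs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileVersion:
    version: int
    block_keys: tuple[str, ...]  # ordered content hashes of the file's blocks


@dataclass
class FileNode:
    path: str
    owner_id: str
    versions: list[FileVersion] = field(default_factory=list)
    acl: dict[str, str] = field(default_factory=dict)  # user_id -> "read" | "write"

    def commit(self, block_keys: list[str]) -> FileVersion:
        """Append a new version to the chain; older versions stay for restore."""
        v = FileVersion(len(self.versions) + 1, tuple(block_keys))
        self.versions.append(v)
        return v

    def can(self, user_id: str, action: str) -> bool:
        """Owner can do anything; "write" access implies "read"."""
        if user_id == self.owner_id:
            return True
        granted = self.acl.get(user_id)
        return granted == "write" or (granted == "read" and action == "read")
```

Because a version only references block keys, keeping old versions for undelete costs metadata rows rather than duplicate file bytes.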
Sync notifier
WebSocket / long-poll push to connected clients.
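The long-poll variant can be sketched as a cursor over an append-only change log. This is an in-process illustration only: `ChangeFeed` is a hypothetical name, and a real notifier would sit behind HTTP or WebSocket connections and persist the log.

```python
import threading


class ChangeFeed:
    """Hypothetical per-user change log that clients long-poll with a cursor."""

    def __init__(self) -> None:
        self._events: list[str] = []
        self._cond = threading.Condition()

    def publish(self, event: str) -> int:
        """Append an event and wake any waiting pollers; returns the new cursor."""
        with self._cond:
            self._events.append(event)
            self._cond.notify_all()
            return len(self._events)

    def poll(self, cursor: int, timeout: float = 30.0) -> tuple[list[str], int]:
        """Block until there are events past `cursor` or the timeout expires."""
        with self._cond:
            self._cond.wait_for(lambda: len(self._events) > cursor, timeout=timeout)
            return self._events[cursor:], len(self._events)
```

A client stores the returned cursor and passes it to the next `poll` call, so a device that reconnects after a gap picks up exactly the events it missed.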
Client agent
Watches local FS, computes diffs, retries on failure.
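A rough sketch of the two client-side mechanics named above: diffing a file against its last synced block list, and retrying a failed upload with exponential backoff. The function names, the choice of `ConnectionError` as the retryable error, and the backoff parameters are all assumptions for illustration.

```python
import hashlib
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def block_hashes(data: bytes, block_size: int) -> list[str]:
    """SHA-256 digest of each fixed-size block, in file order."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old: list[str], new: list[str]) -> list[int]:
    """Indices in `new` whose hash differs from `old` at that position, or is new."""
    return [i for i, h in enumerate(new) if i >= len(old) or old[i] != h]


def with_retry(op: Callable[[], T], attempts: int = 5, base_delay: float = 0.5,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `op`, retrying on ConnectionError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return op()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be >= 1")
```

Only the indices returned by `changed_blocks` need to be uploaded. Note that with fixed-size blocks, inserting bytes near the start of a file shifts every later block and marks them all as changed; content-defined chunking is one way to soften that, at the cost of extra complexity.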
Trade-offs
Whole-file replace vs. block-level delta
Pros
- Whole-file is simpler.
- Block-level cuts bandwidth ~10× for small edits in big files.
Cons
- Whole-file blows up on multi-GB videos.
- Block-level needs a content-addressed store + Merkle tree.
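To make the Merkle tree requirement concrete, here is a minimal sketch that folds a file's block hashes into one root hash. It assumes SHA-256 and duplicates the last node on odd-sized levels, which is one common convention rather than the only one. `merkle_root` is a hypothetical helper name.

```python
import hashlib


def merkle_root(leaf_hashes: list[bytes]) -> bytes:
    """Fold a list of block hashes pairwise into a single root hash."""
    if not leaf_hashes:
        return hashlib.sha256(b"").digest()
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]
```

Comparing two roots answers "did anything change?" with a single hash comparison. When the roots differ, walking down the mismatching branches locates the changed blocks without comparing every leaf.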
Scale concerns
- Hot-share thundering herd — 10 k clients pulling the same shared file.
- Metadata sharding by user_id keeps trees co-located but skews on heavy users.
- Conflict storms when a poorly-synced client comes online with stale state.
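The stale-client case can be illustrated with an optimistic version check: each write names the version it was based on, and the server accepts it only if that version is still the head. Keeping the losing edit as a "conflicted copy" is an assumption here, chosen as one common way to avoid silent data loss. `FileState` and `commit` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class FileState:
    version: int = 0
    content: str = ""
    conflicted_copies: list[str] = field(default_factory=list)


def commit(state: FileState, base_version: int, content: str, device: str) -> str:
    """Accept the write only if the client edited the current head version."""
    if base_version == state.version:
        state.version += 1
        state.content = content
        return "committed"
    # Stale base: keep both edits rather than silently overwriting either one.
    state.conflicted_copies.append(f"conflicted copy from {device}: {content}")
    return "conflict"
```

In a real metadata store the compare-and-bump would have to run as a single atomic operation (for example a conditional write) so that two concurrent commits cannot both see the same head.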