Jarviix
HLD · 12 min read

Design Google Drive / Dropbox

Block-level deduplication, delta sync, conflict resolution, and a metadata model that scales to billions of files.

hld · system-design · storage

Intro

Google Drive and Dropbox store user files in the cloud and keep them in sync across all of a user's devices. The hard parts: (1) bandwidth-efficient sync (push only the changed bytes), (2) atomic conflict resolution when two clients edit the same file offline, (3) metadata that scales to ~100 billion objects without a global reshard.

Functional

  • Upload / download files via web + desktop client.
  • Sync changes across all logged-in devices in seconds.
  • Share files / folders with read or read-write access.
  • Versioning + 30-day undelete.

Non-functional

  • Upload p95 < 2 s for files ≤ 10 MB.
  • Sync convergence p95 < 5 s across devices.
  • Durability ≥ 11 nines (99.999999999%) on file bytes.
  • Storage efficiency: dedupe identical 4 MB blocks across users.

Components

  • Block service

    Stores 4 MB content-addressed chunks in object storage.

  • Metadata service

    Per-user file tree, version chain, share ACLs.

  • Sync notifier

    WebSocket / long-poll push to connected clients.

  • Client agent

    Watches local FS, computes diffs, retries on failure.
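
The block service and the dedupe requirement can be sketched together. This is a minimal illustration, not a production design: `BlockStore` is a hypothetical in-memory stand-in for object storage, SHA-256 is an assumed choice of content hash, and `upload_file` / `download_file` are illustrative names. The point is that a file becomes an ordered list of block digests (a manifest), and identical blocks are stored once no matter who uploads them.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB blocks, as in the component description


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class BlockStore:
    """Hypothetical in-memory stand-in for a content-addressed object store."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def put(self, block: bytes) -> str:
        # The key is the hash of the content, so identical blocks collapse
        # into a single stored copy regardless of which user uploads them.
        digest = hashlib.sha256(block).hexdigest()
        self._blocks.setdefault(digest, block)
        return digest

    def get(self, digest: str) -> bytes:
        return self._blocks[digest]

    def __len__(self) -> int:
        return len(self._blocks)


def upload_file(store: BlockStore, data: bytes,
                block_size: int = BLOCK_SIZE) -> list[str]:
    """Store every block and return the manifest (ordered list of digests)."""
    return [store.put(block) for block in split_blocks(data, block_size)]


def download_file(store: BlockStore, manifest: list[str]) -> bytes:
    """Rebuild the file by concatenating its blocks in manifest order."""
    return b"".join(store.get(digest) for digest in manifest)
```

In this sketch the metadata service would only need to persist the manifest per file version; the bytes themselves live in the block store.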

Trade-offs

Whole-file replace vs. block-level delta

Pros

  • Whole-file is simpler.
  • Block-level cuts bandwidth ~10× for small edits in big files.

Cons

  • Whole-file blows up on multi-GB videos.
  • Block-level needs a content-addressed store + Merkle tree.
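
To make the block-level option concrete, here is a rough sketch of how a client might decide which blocks to upload. It assumes fixed-size blocks and SHA-256 digests; `block_hashes`, `changed_blocks`, and `merkle_root` are hypothetical helper names. The Merkle root gives a single value to compare first, so an unchanged file costs one comparison before any per-block diffing.

```python
import hashlib


def block_hashes(data: bytes, block_size: int) -> list[str]:
    """Hash each fixed-size block of the file."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: list[str], new: list[str]) -> list[int]:
    """Indices in the new manifest whose block differs from the old one.

    Blocks past the end of the old manifest count as changed. A shorter
    new manifest (truncation) is visible from its length alone.
    """
    return [i for i, h in enumerate(new) if i >= len(old) or old[i] != h]


def merkle_root(hashes: list[str]) -> str:
    """Fold a list of block hashes into one root hash, pairing level by level."""
    if not hashes:
        return hashlib.sha256(b"").hexdigest()
    level = list(hashes)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            hashlib.sha256((level[i] + level[i + 1]).encode()).hexdigest()
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

One limitation of this sketch: with fixed-size blocks, inserting bytes near the start of a file shifts every later block boundary, so all of those blocks look changed. Content-defined chunking is the usual way around that, at the cost of variable block sizes.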

Scale concerns

  • Hot-share thundering herd: 10,000 clients pulling the same shared file at once.
  • Metadata sharding by user_id keeps trees co-located but skews on heavy users.
  • Conflict storms when a poorly-synced client comes online with stale state.
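
One way to keep a stale client from overwriting newer work is an optimistic version check on commit. The sketch below assumes a "keep both" policy in which a stale write is saved as a conflicted copy; the article does not prescribe a policy, so `MetadataStore`, `commit`, and the copy-naming scheme are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    version: int = 0
    manifest: list[str] = field(default_factory=list)


class MetadataStore:
    """Hypothetical in-memory metadata store with per-file version numbers."""

    def __init__(self) -> None:
        self._files: dict[str, FileEntry] = {}

    def current_version(self, path: str) -> int:
        entry = self._files.get(path)
        return entry.version if entry else 0

    def commit(self, path: str, base_version: int,
               manifest: list[str], device: str) -> tuple[str, int]:
        """Apply a write only if the client saw the latest version.

        Returns the path actually written and its new version. A write based
        on a stale version is diverted to a conflicted copy (assumed policy),
        so neither side's bytes are lost.
        """
        entry = self._files.setdefault(path, FileEntry())
        if base_version == entry.version:
            entry.version += 1
            entry.manifest = list(manifest)
            return path, entry.version
        conflict_path = f"{path} (conflicted copy from {device})"
        return self.commit(
            conflict_path, self.current_version(conflict_path), manifest, device
        )
```

For example, if a laptop commits `a.txt` at base version 0 and a phone that was offline later commits against the same base, the phone's write lands in `a.txt (conflicted copy from phone)` while `a.txt` stays at version 1. In a real system the check-and-increment would need to be a single atomic operation in the metadata database.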
