Jarviix

Design Google Docs (Real-time Collaboration)

OT or CRDT-based concurrent editing, presence, comments, version history, with sub-50 ms keystroke echo and offline support.

Tags: hld, system-design, collaboration

Intro

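Google Docs is the canonical real-time collaboration problem: many users edit the same document concurrently and expect sub-50 ms keystroke echo, presence indicators, comments, and an offline mode. The decisive choice is the concurrency-control algorithm: Operational Transformation (OT, the approach Google Docs uses) or Conflict-free Replicated Data Types (CRDTs, the model behind CRDT-inspired systems such as Figma's multiplayer). That choice drives the rest of the architecture: whether a central server must order operations, how much metadata each character carries, and how offline edits are merged.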

Functional

  • Concurrent edits by N users on a single document.
  • Presence — see who else is editing + their cursor.
  • Comments + suggestions threaded inline.
  • Version history — restore to any point.

Non-functional

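  • Local keystroke echo p99 < 50 ms (edits are applied locally first, without waiting for the server).
  • Operation propagation to other editors p95 < 200 ms.
  • Convergence: once all operations are delivered, every client reaches the same final state.
  • Offline edits are merged without loss when the client reconnects.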
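
The convergence requirement can be made concrete with a minimal OT sketch. It handles only insert operations; the `Insert` type, the `site` tie-breaker, and the function names are illustrative assumptions, not the actual Google Docs implementation. Each client applies its own edit immediately, then applies the other client's edit after transforming it against its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int   # index in the document where the text is inserted
    text: str
    site: str  # hypothetical client id, used only to break position ties


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `against` was already applied."""
    shifts = against.pos < op.pos or (
        against.pos == op.pos and against.site < op.site
    )
    if shifts:
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


base = "helo"
a = Insert(3, "l", "alice")  # alice fixes the typo
b = Insert(4, "!", "bob")    # bob appends punctuation concurrently

# Each side echoes its own op locally first, then applies the transformed remote op.
alice_view = apply(apply(base, a), transform(b, a))
bob_view = apply(apply(base, b), transform(a, b))
print(alice_view, bob_view)  # both are "hello!"
```

The site-id tie-break matters when two inserts target the same position: without a deterministic rule, the two clients could order the inserted text differently and diverge.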

Components

  • Document service

    Runs the OT/CRDT engine and acts as the broadcast hub that fans accepted operations out to connected editors.

  • Op store

    Append-only log of operations per doc.

  • Snapshot store

    Periodic full snapshots for fast loads.

  • Presence service

    Live cursor + selection per user.

  • Comment service

    Threaded inline annotations.

  • Sync gateway

    WebSocket per editor session.
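
As a rough illustration of how the op store and snapshot store could work together, the sketch below loads a document by taking the latest snapshot at or before the requested revision and replaying only the operations after it. The same path serves version history, since any past revision can be rebuilt. `DocStore`, `snapshot_every`, and the op shape are hypothetical names chosen for this example, and everything is kept in memory for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    rev: int          # revision this op produces
    pos: int
    insert: str = ""
    delete: int = 0   # number of characters removed at `pos`


def apply(text: str, op: Op) -> str:
    return text[:op.pos] + op.insert + text[op.pos + op.delete:]


@dataclass
class DocStore:
    snapshot_every: int = 3
    log: list = field(default_factory=list)                    # append-only op log
    snapshots: dict = field(default_factory=lambda: {0: ""})   # rev -> full text

    def append(self, pos: int, insert: str = "", delete: int = 0) -> int:
        rev = len(self.log) + 1
        self.log.append(Op(rev, pos, insert, delete))
        if rev % self.snapshot_every == 0:
            self.snapshots[rev] = self.load(rev)   # periodic full snapshot
        return rev

    def load(self, rev=None) -> str:
        rev = len(self.log) if rev is None else rev
        base = max(r for r in self.snapshots if r <= rev)
        text = self.snapshots[base]
        for op in self.log[base:rev]:   # replay only ops after the snapshot
            text = apply(text, op)
        return text


store = DocStore()
store.append(0, insert="Hello")
store.append(5, insert=" world")
store.append(0, insert="Howdy", delete=5)   # rev 3 triggers a snapshot
store.append(11, insert="!")
print(store.load())    # current text
print(store.load(2))   # version history: text as of revision 2
```

A smaller `snapshot_every` makes loads cheaper at the cost of more snapshot storage; the right interval depends on document size and edit rate.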

Trade-offs

OT vs CRDT

Pros

  • OT: smaller payloads, well-understood.
  • CRDT: offline-friendly, mathematically convergent.

Cons

  • OT: in practice relies on a central server to order and transform operations.
  • CRDT: more metadata per character (unique identifiers, plus tombstones for deletions).
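
To show where the CRDT metadata cost comes from, here is a deliberately simplified sequence CRDT. Every character carries a unique identifier `(position, site, counter)`, deletions leave tombstones, and merging is a plain set union, so replicas converge regardless of delivery order. The fractional-position scheme and the `SeqCRDT` class are assumptions made for illustration; production CRDTs use more careful identifier schemes.

```python
from fractions import Fraction


class SeqCRDT:
    def __init__(self, site: str):
        self.site = site
        self.counter = 0
        self.chars = {}          # id -> character, id = (position, site, counter)
        self.tombstones = set()  # ids of deleted characters

    def _visible(self):
        return sorted(i for i in self.chars if i not in self.tombstones)

    def insert(self, index: int, ch: str) -> None:
        ids = self._visible()
        left = ids[index - 1][0] if index > 0 else Fraction(0)
        right = ids[index][0] if index < len(ids) else Fraction(1)
        self.counter += 1   # keeps ids unique even if positions collide
        self.chars[((left + right) / 2, self.site, self.counter)] = ch

    def delete(self, index: int) -> None:
        self.tombstones.add(self._visible()[index])

    def merge(self, other: "SeqCRDT") -> None:
        # Union is commutative, associative, and idempotent.
        self.chars.update(other.chars)
        self.tombstones |= other.tombstones

    def text(self) -> str:
        return "".join(self.chars[i] for i in self._visible())


alice = SeqCRDT("alice")
for i, ch in enumerate("helo"):
    alice.insert(i, ch)
bob = SeqCRDT("bob")
bob.merge(alice)

alice.insert(3, "l")   # offline edit on alice's replica
bob.insert(4, "!")     # offline edit on bob's replica

alice.merge(bob)
bob.merge(alice)
print(alice.text(), bob.text())  # both are "hello!"
```

Note that six visible characters require six identifier tuples, and deleted characters would still occupy the tombstone set. This sketch also lets concurrent inserts into the same gap interleave and lets fractions grow without bound, which real designs work to avoid.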

Per-document central server vs leaderless

Pros

  • Central: simpler, OT works.
  • Leaderless: better availability.

Cons

  • Central: single point of failure (SPOF) per document.
  • Leaderless: needs a CRDT and carries larger state.
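
The "central" option can be sketched as a per-document sequencer: clients submit an operation together with the revision it was based on, and the server rebases it over anything committed since, assigns the next revision, and broadcasts the result. The `DocServer` class and its `submit` method are hypothetical, and only inserts are handled. Because a single process decides the order, ties are resolved simply by commit order.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str


@dataclass
class DocServer:
    text: str = ""
    log: list = field(default_factory=list)  # committed ops; index = revision - 1

    def submit(self, base_rev: int, op: Insert):
        """Rebase `op` (authored against `base_rev`) over later commits, then commit."""
        for committed in self.log[base_rev:]:
            if committed.pos <= op.pos:   # earlier commit wins position ties
                op = Insert(op.pos + len(committed.text), op.text)
        self.text = self.text[:op.pos] + op.text + self.text[op.pos:]
        self.log.append(op)
        return len(self.log), op          # new revision and the op to broadcast


server = DocServer()
server.submit(0, Insert(0, "helo"))            # revision 1
server.submit(1, Insert(3, "l"))               # alice, based on revision 1
rev, rebased = server.submit(1, Insert(4, "!"))  # bob, also based on revision 1
print(rev, rebased, server.text)
```

All ordering state lives in one `DocServer` instance, which is what keeps the transform logic simple and also what makes that instance a per-document single point of failure.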

Scale concerns

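  • Hot doc: 100+ concurrent editors on one document, all fanned out from the same document server.
  • Op log growth: bounded with periodic snapshots, so loads replay only recent operations.
  • Offline reconciliation: a client that was offline for a long time reconnects with a large queue of pending operations.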
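
One possible mitigation for large pending queues is to coalesce operations on the client before sending them. The sketch below assumes pending ops are `(pos, text)` inserts recorded in order, each relative to the document state left by the previous one; `coalesce` and `apply_all` are hypothetical helpers, not part of any particular library.

```python
def coalesce(pending):
    """Merge runs of inserts that continue exactly where the previous one ended."""
    merged = []
    for pos, text in pending:
        if merged:
            last_pos, last_text = merged[-1]
            if pos == last_pos + len(last_text):
                merged[-1] = (last_pos, last_text + text)
                continue
        merged.append((pos, text))
    return merged


def apply_all(doc, ops):
    for pos, text in ops:
        doc = doc[:pos] + text + doc[pos:]
    return doc


# Typing "hi" at the start, then moving the cursor and typing "ok".
pending = [(0, "h"), (1, "i"), (10, "o"), (11, "k")]
compact = coalesce(pending)
print(compact)  # [(0, 'hi'), (10, 'ok')]
```

Since ordinary typing produces long runs of adjacent single-character inserts, this kind of batching can shrink a queue substantially while leaving the resulting document unchanged.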
