Design Google Docs (Real-time Collaboration)
OT or CRDT-based concurrent editing, presence, comments, version history, with sub-50 ms keystroke echo and offline support.
Intro
Google Docs is the canonical real-time collaboration problem: many users edit the same document concurrently, with sub-50 ms keystroke echo, presence indicators, comments, and offline mode. The decisive choice is the concurrency control algorithm: Operational Transformation (OT, which Google Docs uses) or CRDTs (conflict-free replicated data types, used by libraries such as Yjs and Automerge; Figma describes its own approach as CRDT-inspired but server-authoritative). Each choice has architectural consequences.
Functional
- Concurrent edits by N users on a single document.
- Presence: see who else is editing and where their cursor is.
- Comments and suggestions, threaded inline.
- Version history: restore the document to any earlier point.
Non-functional
- Local keystroke echo p99 < 50 ms.
- Operation propagation p95 < 200 ms.
- Convergence: all clients reach the same final state once they have applied the same set of operations.
- Offline edits are merged without loss when the client reconnects.
Components
Document service
OT/CRDT engine; broadcast hub.
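The core of an OT engine is a transform function that rewrites one operation so it can be applied after a concurrent one. The following is a minimal sketch, assuming single-character insert and delete operations and a numeric site ID as tie-breaker; the names Insert, Delete, apply, and transform are hypothetical and not taken from any production system.

```python
from dataclasses import dataclass, replace
from typing import Optional, Union


@dataclass(frozen=True)
class Insert:
    pos: int
    char: str
    site: int  # assumed tie-breaker when two sites insert at the same position


@dataclass(frozen=True)
class Delete:
    pos: int


Op = Union[Insert, Delete]


def apply(doc: str, op: Optional[Op]) -> str:
    """Apply one operation to the document; None means no-op."""
    if op is None:
        return doc
    if isinstance(op, Insert):
        return doc[:op.pos] + op.char + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + 1:]


def transform(a: Op, b: Op) -> Optional[Op]:
    """Rewrite a so that it can be applied after the concurrent op b."""
    if isinstance(a, Insert) and isinstance(b, Insert):
        if a.pos < b.pos or (a.pos == b.pos and a.site < b.site):
            return a
        return replace(a, pos=a.pos + 1)
    if isinstance(a, Insert):  # b is a Delete
        return a if a.pos <= b.pos else replace(a, pos=a.pos - 1)
    if isinstance(b, Insert):  # a is a Delete
        return a if a.pos < b.pos else replace(a, pos=a.pos + 1)
    if a.pos == b.pos:         # both deleted the same character
        return None
    return a if a.pos < b.pos else replace(a, pos=a.pos - 1)


doc = "abc"
a = Insert(pos=1, char="X", site=1)  # user 1 types X before "b"
b = Delete(pos=1)                    # user 2 deletes "b" concurrently
left = apply(apply(doc, a), transform(b, a))
right = apply(apply(doc, b), transform(a, b))
print(left, right)  # aXc aXc
```

Both application orders end in the same text, which is the convergence property the engine has to preserve. A real engine would also handle multi-character ranges, attributes, and transforming against a whole sequence of concurrent ops.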
Op store
Append-only log of operations per doc.
Snapshot store
Periodic full snapshots for fast loads.
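How the op store and snapshot store cooperate can be sketched as follows. This is an illustrative in-memory model only, assuming insert-only ops of the form (pos, text) and a hypothetical DocStore class; a real deployment would persist both structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class DocStore:
    snapshot_every: int = 100
    ops: List[Tuple[int, str]] = field(default_factory=list)  # append-only log
    snapshots: Dict[int, str] = field(default_factory=dict)   # version -> full text

    def append(self, pos: int, text: str) -> int:
        """Append one insert op and return the new document version."""
        self.ops.append((pos, text))
        version = len(self.ops)
        if version % self.snapshot_every == 0:
            self.snapshots[version] = self.load(version)
        return version

    def load(self, version: Optional[int] = None) -> str:
        """Rebuild the text at a version: nearest snapshot plus replay of later ops."""
        version = len(self.ops) if version is None else version
        base = max((v for v in self.snapshots if v <= version), default=0)
        text = self.snapshots.get(base, "")
        for pos, inserted in self.ops[base:version]:
            text = text[:pos] + inserted + text[pos:]
        return text


store = DocStore(snapshot_every=2)
for pos, ch in enumerate("abc"):
    store.append(pos, ch)
print(store.load(), store.load(1))  # abc a
```

Loading replays only the ops recorded after the nearest snapshot, and loading an older version number gives version-history restore from the same two structures.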
Presence service
Live cursor + selection per user.
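Presence data is ephemeral, so one plausible design keeps it in memory and expires entries that stop sending heartbeats. The sketch below assumes a hypothetical PresenceRoom class, a 10-second default TTL, and an injectable clock for testing; none of these come from the original design.

```python
import time
from typing import Callable, Dict, Tuple


class PresenceRoom:
    """In-memory cursor state for one document, expired by TTL."""

    def __init__(self, ttl_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # user_id -> (selection anchor, selection head, last-seen timestamp)
        self._cursors: Dict[str, Tuple[int, int, float]] = {}

    def update(self, user_id: str, anchor: int, head: int) -> None:
        """Record a cursor move; also serves as a heartbeat."""
        self._cursors[user_id] = (anchor, head, self._clock())

    def leave(self, user_id: str) -> None:
        self._cursors.pop(user_id, None)

    def active(self) -> Dict[str, Tuple[int, int]]:
        """Drop stale users, then return the live selections."""
        now = self._clock()
        self._cursors = {u: c for u, c in self._cursors.items()
                         if now - c[2] <= self._ttl}
        return {u: (a, h) for u, (a, h, _) in self._cursors.items()}
```

Because nothing here is written to the op log, losing this state on a server restart costs only a brief flicker of cursors until clients send their next update.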
Comment service
Threaded inline annotations.
Sync gateway
WebSocket per editor session.
Trade-offs
OT vs CRDT
Pros
- OT: smaller payloads, well-understood.
- CRDT: offline-friendly, mathematically convergent.
Cons
- OT: in practice needs a central server to order and transform operations.
- CRDT: more metadata per character (unique IDs, tombstones for deletes).
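To make the CRDT metadata cost concrete, here is a small RGA-style sequence sketch. It assumes causal, exactly-once delivery of operations and uses (Lamport counter, site) pairs as element IDs; the RGA class and its method names are illustrative, not the API of any particular library.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Id = Tuple[int, str]  # (Lamport counter, site name)


@dataclass
class Elem:
    id: Id
    char: str
    deleted: bool = False  # tombstone: deleted elements stay in the list


class RGA:
    def __init__(self, site: str) -> None:
        self.site = site
        self.clock = 0
        self.elems: List[Elem] = []  # document order, tombstones included

    def _index(self, target: Id) -> int:
        return next(i for i, e in enumerate(self.elems) if e.id == target)

    def text(self) -> str:
        return "".join(e.char for e in self.elems if not e.deleted)

    def local_insert(self, after: Optional[Id], char: str) -> tuple:
        """Insert after the given element (None = start); returns the op to broadcast."""
        self.clock += 1
        op = ("ins", (self.clock, self.site), after, char)
        self.apply(op)
        return op

    def local_delete(self, target: Id) -> tuple:
        op = ("del", target)
        self.apply(op)
        return op

    def apply(self, op: tuple) -> None:
        if op[0] == "del":
            self.elems[self._index(op[1])].deleted = True
            return
        _, new_id, after, char = op
        self.clock = max(self.clock, new_id[0])
        i = 0 if after is None else self._index(after) + 1
        # Skip concurrent elements with a larger ID so every replica picks the same slot.
        while i < len(self.elems) and self.elems[i].id > new_id:
            i += 1
        self.elems.insert(i, Elem(new_id, char))


a, b = RGA("A"), RGA("B")
op1 = a.local_insert(None, "H")
b.apply(op1)
op2 = a.local_insert(op1[1], "i")  # concurrent with op3
op3 = b.local_insert(op1[1], "!")
a.apply(op3)
b.apply(op2)
print(a.text(), b.text())  # H!i H!i
```

Every character carries an ID and a tombstone flag, and deleted characters are never physically removed in this sketch, which is where the extra state comes from. In exchange, no server has to transform anything: replicas converge by applying the same ops in any causally valid order.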
Per-document central server vs leaderless
Pros
- Central: simpler, OT works.
- Leaderless: better availability.
Cons
- Central: single point of failure (SPOF) per document.
- Leaderless: needs CRDT, larger state.
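With a per-document central server, every session for a document has to be routed to the same node. One possible way to do that, shown here purely as an assumption, is rendezvous (highest-random-weight) hashing; the owner function and the server names are hypothetical.

```python
import hashlib
from typing import Sequence


def owner(doc_id: str, servers: Sequence[str]) -> str:
    """Pick the server with the highest hash score for this document."""
    def score(server: str) -> int:
        digest = hashlib.sha256(f"{server}:{doc_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(servers, key=score)


servers = ["doc-srv-1", "doc-srv-2", "doc-srv-3"]
primary = owner("doc-42", servers)
```

If the owning server fails, recomputing owner over the surviving servers moves only that server's documents, which limits the blast radius of the per-document single point of failure. The new owner would still need to rebuild state from the snapshot and op log.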
Scale concerns
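- Hot doc: 100+ concurrent editors on one document, all funneled through a single op stream.
- Op log growth: the log grows without bound; periodic snapshots let loads skip old ops and allow old log segments to be truncated.
- Offline reconciliation: a client that was offline for a long time reconnects with a large queue of pending ops to merge.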
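One way to shrink a large pending queue before it is sent is to coalesce runs of adjacent inserts, since offline typing mostly produces one-character ops at consecutive positions. This is a sketch under the assumption that pending ops are sequential (pos, text) inserts, each expressed against the state left by the previous one; coalesce and apply_all are hypothetical helpers.

```python
from typing import List, Tuple

InsertOp = Tuple[int, str]  # (position, inserted text)


def coalesce(pending: List[InsertOp]) -> List[InsertOp]:
    """Merge each insert that starts exactly where the previous one ended."""
    merged: List[InsertOp] = []
    for pos, text in pending:
        if merged and merged[-1][0] + len(merged[-1][1]) == pos:
            prev_pos, prev_text = merged[-1]
            merged[-1] = (prev_pos, prev_text + text)
        else:
            merged.append((pos, text))
    return merged


def apply_all(doc: str, ops: List[InsertOp]) -> str:
    """Apply sequential inserts in order."""
    for pos, text in ops:
        doc = doc[:pos] + text + doc[pos:]
    return doc


pending = [(0, "h"), (1, "i"), (5, "!")]
compact = coalesce(pending)
print(compact)  # [(0, 'hi'), (5, '!')]
```

The compacted queue yields the same document as the original one, so the server has fewer ops to transform or merge on reconnect. Deletes and formatting ops would need their own merge rules.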