Design Pastebin (Code-paste / Snippet sharing)
A URL-shortener-class read path with text bodies in object storage, plus expiry, syntax highlighting, and abuse controls.
Intro
Pastebin lets users paste a code snippet or text and get back a short URL to share it. Architecturally it is a URL shortener with the long URL replaced by an arbitrary text blob (capped at roughly 10 MB). The interesting parts are body storage (an object store keyed by paste id), syntax-highlighted rendering, expiry and privacy semantics, and abuse controls (malware, doxing, leaked secrets).
Functional
- POST /paste { text } → short id.
- GET /{paste_id} → render with syntax highlighting.
- Optional expiry (10 min, 1 day, 30 days, never).
- Privacy levels: public, unlisted, password-protected.
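The expiry options above can be turned into an absolute timestamp at write time. This is a minimal sketch; the choice keys ("10min", "1day", ...) and the function names are assumptions made for illustration, not part of the API above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# The four expiry choices from the list above; None means "never".
# The string keys are hypothetical request values.
EXPIRY_CHOICES = {
    "10min": timedelta(minutes=10),
    "1day": timedelta(days=1),
    "30days": timedelta(days=30),
    "never": None,
}


def expires_at(choice: str, now: datetime) -> Optional[datetime]:
    """Return the absolute expiry time, or None for a paste that never expires."""
    if choice not in EXPIRY_CHOICES:
        raise ValueError(f"unknown expiry choice: {choice!r}")
    ttl = EXPIRY_CHOICES[choice]
    return None if ttl is None else now + ttl


def is_expired(expiry: Optional[datetime], now: datetime) -> bool:
    """A paste with no expiry timestamp is never expired."""
    return expiry is not None and now >= expiry


# Usage: compute the timestamp once at write time and store it with the metadata.
now = datetime.now(timezone.utc)
print(expires_at("10min", now))
```

Storing an absolute timestamp rather than the chosen option keeps the read path to a single comparison.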
Non-functional
- Read p95 < 100 ms on hot pastes (CDN-cacheable).
- Write p95 < 500 ms.
- Storage: avg paste 50 KB, top 1% > 1 MB.
- Durability ≥ eleven nines (99.999999999%) on bodies.
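To see what the 50 KB average implies for capacity, a rough back-of-envelope helper follows. The write rate is a purely hypothetical input (the notes above do not give one), chosen only to make the arithmetic concrete.

```python
AVG_PASTE_BYTES = 50 * 1000  # 50 KB average paste, from the figure above


def raw_storage_bytes_per_year(pastes_per_day: int,
                               avg_bytes: int = AVG_PASTE_BYTES) -> int:
    """Raw body bytes written per year, ignoring replication, expiry and metadata."""
    return pastes_per_day * avg_bytes * 365


# ASSUMPTION: 1M pastes/day is an illustrative number, not a requirement.
ASSUMED_PASTES_PER_DAY = 1_000_000
per_year = raw_storage_bytes_per_year(ASSUMED_PASTES_PER_DAY)
print(f"{per_year / 1e12:.2f} TB/year before replication and expiry")
```

Because the top 1% of pastes exceed 1 MB, the real total is sensitive to the tail; the average is only a starting point.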
Components
Write API
Validates the request, then persists the body to object storage and the metadata to the metadata DB.
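A minimal sketch of the write path, assuming a random base62 id and a plain dict standing in for the object store. The id length, the exact cap value, and all names here are illustrative assumptions.

```python
import secrets
import string

MAX_BODY_BYTES = 10 * 1024 * 1024  # assumption: the ~10 MB cap from the intro
ID_ALPHABET = string.ascii_letters + string.digits  # base62
ID_LENGTH = 8  # assumption: 62**8 ids is plenty for a sketch


class PasteTooLarge(ValueError):
    """Raised when the encoded body exceeds the hard cap."""


def new_paste_id() -> str:
    """Random, unguessable base62 id (matters for unlisted pastes)."""
    return "".join(secrets.choice(ID_ALPHABET) for _ in range(ID_LENGTH))


def create_paste(text: str, object_store: dict) -> str:
    """Validate the body, store it under a fresh id, and return the id."""
    body = text.encode("utf-8")
    if not body:
        raise ValueError("empty paste")
    if len(body) > MAX_BODY_BYTES:
        raise PasteTooLarge(f"{len(body)} bytes exceeds cap of {MAX_BODY_BYTES}")
    # Retry on the (unlikely) collision so an existing body is never overwritten.
    while True:
        paste_id = new_paste_id()
        if paste_id not in object_store:
            object_store[paste_id] = body
            return paste_id
```

Random ids are used here instead of a counter so that unlisted pastes cannot be enumerated; a real object store would need a conditional put in place of the dict membership check.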
Read API
Resolves id → body URI; returns body or rendered HTML.
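The resolve step can be sketched as a metadata lookup, an expiry check, and then a body fetch. Dicts stand in for the metadata DB and object store, and the field names (`body_uri`, `expires_at`) and the choice of 410 for expired pastes are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def read_paste(paste_id: str, meta_db: dict, object_store: dict,
               now: datetime) -> Tuple[int, Optional[bytes]]:
    """Return (HTTP status, body). Expired pastes report 410 Gone."""
    meta = meta_db.get(paste_id)
    if meta is None:
        return 404, None
    expiry = meta.get("expires_at")
    if expiry is not None and now >= expiry:
        return 410, None
    body = object_store.get(meta["body_uri"])
    if body is None:
        # Metadata without a body: treated as missing in this sketch.
        return 404, None
    return 200, body


# Usage with in-memory stand-ins.
now = datetime.now(timezone.utc)
meta_db = {
    "abc": {"body_uri": "bodies/abc", "expires_at": None},
    "old": {"body_uri": "bodies/old", "expires_at": now - timedelta(minutes=1)},
}
object_store = {"bodies/abc": b"hello", "bodies/old": b"stale"}
print(read_paste("abc", meta_db, object_store, now))
```

Checking expiry before touching the object store means expired pastes cost only a metadata lookup.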
Object store
Bodies keyed by paste_id.
Metadata DB
Postgres for paste meta (lang, expiry, ACL).
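One possible shape for the metadata record and the access check for the three privacy levels is sketched below. The field names, the PBKDF2 parameters, and the rule that only public pastes appear in listings are assumptions rather than a stated schema.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PasteMeta:
    paste_id: str
    lang: str
    privacy: str  # "public" | "unlisted" | "password"
    expires_at: Optional[datetime] = None
    password_salt: Optional[bytes] = None
    password_hash: Optional[bytes] = None


def hash_password(password: str, salt: bytes) -> bytes:
    # Iteration count is an illustrative choice, not a recommendation.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 200_000)


def can_read(meta: PasteMeta, password: Optional[str] = None) -> bool:
    """Public and unlisted pastes are readable by anyone who has the URL."""
    if meta.privacy in ("public", "unlisted"):
        return True
    if meta.privacy == "password":
        if password is None or meta.password_salt is None or meta.password_hash is None:
            return False
        candidate = hash_password(password, meta.password_salt)
        return hmac.compare_digest(candidate, meta.password_hash)
    return False  # unknown privacy level: fail closed


def is_listed(meta: PasteMeta) -> bool:
    """Only public pastes show up in listings or search."""
    return meta.privacy == "public"


# Usage: create a password-protected record.
salt = os.urandom(16)
protected = PasteMeta("abc", "python", "password",
                      password_salt=salt,
                      password_hash=hash_password("s3cret", salt))
print(can_read(protected, "s3cret"))
```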
Renderer
Server-side syntax highlighting, plus a cache of rendered HTML for popular pastes (e.g. paste of the day).
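A rendered-HTML cache can be sketched as a map keyed by (paste id, theme), since server-side rendering produces one HTML version per pair. The highlighter below is a stand-in that only escapes the body; a real highlighting library would be plugged in through the same callable. All names are hypothetical, and pastes are assumed immutable once created.

```python
import html
from typing import Callable, Dict, Tuple


def plain_highlight(body: str, lang: str, theme: str) -> str:
    """Stand-in highlighter: escapes the body instead of emitting styled tokens."""
    css = f"theme-{html.escape(theme, quote=True)} lang-{html.escape(lang, quote=True)}"
    return f'<pre class="{css}">{html.escape(body)}</pre>'


class RenderCache:
    """Caches rendered HTML per (paste_id, theme); assumes bodies never change."""

    def __init__(self, highlight: Callable[[str, str, str], str] = plain_highlight):
        self._highlight = highlight
        self._cache: Dict[Tuple[str, str], str] = {}
        self.misses = 0

    def render(self, paste_id: str, body: str, lang: str, theme: str) -> str:
        key = (paste_id, theme)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._highlight(body, lang, theme)
        return self._cache[key]
```

This version is unbounded; a production cache would need an eviction policy and invalidation on takedown.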
Abuse pipeline
Async secret-scan + safe-browsing-style classification.
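The secret-scan stage might start as a small set of regular expressions run over each new body. The two patterns below (an AWS-style access key id and a PEM private-key header) are illustrative only; a real scanner would carry a much larger, maintained rule set.

```python
import re
from typing import List

# Illustrative patterns only; not an exhaustive rule set.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
}


def scan_for_secrets(text: str) -> List[str]:
    """Return the sorted names of all patterns that match somewhere in the text."""
    return sorted(name for name, pattern in SECRET_PATTERNS.items()
                  if pattern.search(text))
```

Because the pipeline is async, a match would typically flag or quarantine the paste after it is already live, rather than block the write.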
Trade-offs
Render server-side vs. client-side
Pros
- Server-side: cacheable HTML, accessible by curl/wget.
- Client-side: cheaper server compute, since highlighting runs in the browser.
Cons
- Server-side: must render and cache one version per (paste, theme) pair.
- Client-side: requires JavaScript, so it breaks scrapers and other non-JS clients.
Scale concerns
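- Hot pastes (e.g. linked from the Hacker News front page) must be absorbed by the CDN rather than the origin.
- Body size variance: enforce a hard cap at write time.
- Abuse: pastes containing phishing, leaked credentials, or malware can go viral, so takedown has to be fast.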
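CDN absorption, expiry, and takedown interact through the cache headers the origin sends. A minimal sketch of one possible policy: never let the edge cache password-protected pastes, and cap the edge TTL so an expired or taken-down paste stops being served within a bounded time. The one-hour cap and the function name are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_EDGE_TTL_SECONDS = 3600  # assumption: bounds how long a stale copy can live


def cache_control(privacy: str, expires_at: Optional[datetime],
                  now: datetime) -> str:
    """Build a Cache-Control header value for a paste response."""
    if privacy == "password":
        return "private, no-store"
    ttl = MAX_EDGE_TTL_SECONDS
    if expires_at is not None:
        remaining = int((expires_at - now).total_seconds())
        if remaining <= 0:
            return "no-store"
        # Never cache past the paste's own expiry.
        ttl = min(ttl, remaining)
    return f"public, max-age={ttl}"


# Usage: a public paste that expires in ten minutes.
now = datetime.now(timezone.utc)
print(cache_control("public", now + timedelta(minutes=10), now))
```

A short TTL cap trades some origin traffic for faster takedown; an explicit CDN purge on takedown would allow a longer cap.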