Design Instagram (Photo feed)
Object storage for media, CDN-fronted feed, hybrid fan-out timeline.
Intro
Instagram looks like Twitter with photos, but the bandwidth profile is different: a single image is ~200 KB, and video is much larger. Storage tiering and CDN strategy therefore dominate the architecture.
Functional
- Upload photo / video with caption.
- Feed: photos from people you follow, ranked.
- Profile: a user's own posts.
- Like / comment / DM (out of scope here).
Non-functional
- Hot photos served at p95 < 100 ms via CDN.
- 1 B users, 500 M DAU. ~100 M photos/day at 200 KB = ~20 TB/day raw.
- 5 years of storage with copies = ~150 PB (raw is ~36.5 PB, so this assumes roughly 4× overhead for replicas and derived copies).
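The storage figures above can be checked with a few lines of arithmetic. This is a back-of-envelope sketch using only the numbers stated in the list, with decimal units (1 TB = 10^12 bytes) and leap days ignored; the overhead factor is derived from the ~150 PB target, not given by the source.

```python
# Back-of-envelope capacity check using the figures from the list above.
KB, TB, PB = 10**3, 10**12, 10**15  # decimal units

photos_per_day = 100_000_000   # ~100 M uploads/day
avg_photo_bytes = 200 * KB     # ~200 KB per photo
retention_days = 5 * 365       # 5 years, ignoring leap days

raw_per_day = photos_per_day * avg_photo_bytes
raw_5y = raw_per_day * retention_days

# The ~150 PB target implies this overhead for replicas and derived copies.
target_total = 150 * PB
implied_overhead = target_total / raw_5y

print(f"raw/day: {raw_per_day / TB:.0f} TB")            # raw/day: 20 TB
print(f"raw over 5 years: {raw_5y / PB:.1f} PB")        # raw over 5 years: 36.5 PB
print(f"implied overhead: {implied_overhead:.1f}x")     # implied overhead: 4.1x
```

Video is excluded here, so the real footprint would be higher than this photo-only estimate.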
Components
API gateway
Auth, rate-limit, request routing.
Upload service
Multipart upload to the object store; emits a processing job once the upload completes.
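A minimal sketch of that flow, assuming an in-memory stand-in for the object store and a plain queue for the job channel. The class `InMemoryObjectStore`, the function `handle_upload`, the `raw/<user>/<media>` key layout, and the job fields are all hypothetical names for illustration; a real service would call the provider's multipart API instead.

```python
import hashlib
import queue

# Assumed part size. S3 requires at least 5 MiB for every part except the last.
PART_SIZE = 5 * 1024 * 1024

def split_parts(data: bytes, part_size: int = PART_SIZE):
    """Yield (part_number, chunk) pairs, numbering parts from 1."""
    for offset in range(0, len(data), part_size):
        yield offset // part_size + 1, data[offset:offset + part_size]

class InMemoryObjectStore:
    """Stand-in for S3/GCS; not a real client."""

    def __init__(self):
        self.objects = {}

    def put_parts(self, key, parts):
        # Assemble in part-number order, as a multipart "complete" step would.
        body = b"".join(chunk for _, chunk in sorted(parts))
        self.objects[key] = body
        return hashlib.sha256(body).hexdigest()

def handle_upload(store, jobs, user_id, media_id, data, part_size=PART_SIZE):
    """Store the media in parts, then enqueue a job for the media processor."""
    key = f"raw/{user_id}/{media_id}"  # hypothetical key layout
    parts = list(split_parts(data, part_size))
    checksum = store.put_parts(key, parts)
    job = {"key": key, "sha256": checksum, "parts": len(parts)}
    jobs.put(job)  # emitted only after the object is fully stored
    return job
```

For example, `handle_upload(InMemoryObjectStore(), queue.Queue(), "u1", "m1", b"x" * 25, part_size=10)` stores one object from three parts and enqueues one job.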
Media processor
Generates thumbnails + transcodes video.
Object store
S3, GCS, or an equivalent blob store.
CDN
Caches photos at the edge; target cache hit ratio above 95%.
Feed service
Hybrid fan-out: push on write for normal accounts, pull on read for celebrity accounts.
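The hybrid approach can be sketched in memory as follows. This is an illustrative model, not a production design: the class `HybridFeed`, the follower-count threshold, and the timeline cap are assumptions, and posts are assumed to be published in increasing timestamp order.

```python
from collections import defaultdict, deque

class HybridFeed:
    """Push posts to followers' timelines, except for celebrity authors."""

    def __init__(self, celebrity_threshold=10_000, timeline_cap=500):
        # Both limits are hypothetical and would be tuned per workload.
        self.threshold = celebrity_threshold
        self.followers = defaultdict(set)         # author -> follower ids
        self.following = defaultdict(set)         # user -> followed authors
        self.posts_by_author = defaultdict(list)  # author -> [(ts, post_id)]
        self.timelines = defaultdict(lambda: deque(maxlen=timeline_cap))

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= self.threshold

    def publish(self, author, post_id, ts):
        self.posts_by_author[author].append((ts, post_id))
        if not self.is_celebrity(author):
            # Push path: write cost grows with the follower count.
            for follower in self.followers[author]:
                self.timelines[follower].append((ts, post_id))

    def read_feed(self, user, limit=20):
        candidates = list(self.timelines[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                # Pull path: merge the celebrity's latest posts at read time.
                candidates.extend(self.posts_by_author[author][-limit:])
        # A set removes duplicates if an author crossed the threshold mid-life.
        newest_first = sorted(set(candidates), reverse=True)
        return [post_id for _, post_id in newest_first[:limit]]
```

The output of `read_feed` is only a recency-ordered candidate list; a ranking step would reorder it.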
Ranking model
Lightweight ranker applied to candidates from the feed store.
Trade-offs
Pre-generated thumbnails vs. on-the-fly
Pros
- Pre-gen → cheap reads.
- On-the-fly → no storage waste for unviewed images.
Cons
- Pre-gen multiplies storage by ~3×.
- On-the-fly needs an image proxy fleet.
Scale concerns
- Origin shielding to prevent CDN miss storms.
- Cold storage tier (S3 IA / Glacier) for old media.
- Feed staleness: rank locally to balance freshness against compute cost.
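One common ingredient of origin shielding is request coalescing: when many edge misses arrive for the same object at once, the shield fetches it from the origin once and hands the result to every waiter. The sketch below shows that idea with threads; the class `CoalescingShield` and the `fetch_origin` callback are hypothetical names, and the cache has no eviction or TTL.

```python
import threading

class CoalescingShield:
    """Collapse concurrent misses for one key into a single origin fetch."""

    def __init__(self, fetch_origin):
        self._fetch_origin = fetch_origin  # callable: key -> bytes
        self._lock = threading.Lock()
        self._cache = {}      # key -> cached bytes (no eviction in this sketch)
        self._inflight = {}   # key -> Event set when the fetch finishes

    def get(self, key):
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            event = self._inflight.get(key)
            leader = event is None
            if leader:
                # First miss for this key: this caller fetches for everyone.
                event = threading.Event()
                self._inflight[key] = event
        if leader:
            try:
                value = self._fetch_origin(key)
                with self._lock:
                    self._cache[key] = value
            finally:
                with self._lock:
                    del self._inflight[key]
                event.set()  # wake the waiters even if the fetch failed
            return value
        event.wait()
        with self._lock:
            if key in self._cache:
                return self._cache[key]
        # The leader's fetch failed; retry, possibly becoming the new leader.
        return self.get(key)
```

With eight threads requesting the same uncached key, the origin callback runs once and all eight callers receive the same bytes.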