Tech · 8 min read
System Design Basics: The Building Blocks Every Backend Engineer Should Know
A practical, no-fluff tour of the building blocks behind every large system — load balancers, caches, databases, queues, CDNs — and the trade-offs that decide how to wire them together.
By Jarviix Engineering · Apr 18, 2026
System design isn't a separate subject from "the rest of programming". It's what programming becomes when one box stops being enough. The vocabulary feels intimidating, but the building blocks are surprisingly small. Once you know what each one does well — and where it bites — you can wire any system on a whiteboard.
This post walks through those building blocks the way a senior engineer thinks about them: not as a feature list, but as a series of trade-offs. We'll keep it concrete. By the end you should be able to read a "design X" interview prompt and know which boxes go where, and why.
Why "design" looks scary at first
A normal app on one server is easy to reason about: requests come in, hit the database, go out. Latency is a function of one machine's CPU and one disk's seek time. It either works or you reboot it.
Once your app has to scale, you start adding boxes — and every box you add changes what can fail, what can be inconsistent, and how long any given user has to wait. System design is mostly the discipline of adding the smallest number of boxes that solves the problem, while staying honest about the new failure modes you just introduced.
That framing matters because it's how interviewers actually evaluate you. They don't want a list of cool components. They want to see you choose, justify, and back off.
The cast of characters
Here's the short version of the toolkit. We'll go deeper on each one in a moment.
- Load balancer. Spreads incoming traffic across multiple app servers. The simplest first box you add when one server isn't enough.
- App / web servers. Stateless if you can manage it. Statelessness is what lets you horizontally scale.
- Database. Almost always the hardest part. Pick relational unless you have a strong reason not to.
- Cache. A faster, smaller copy of the slow thing. Wonderful when used well; nightmare when invalidated wrong.
- Message queue / log. Decouples producers from consumers. Lets bursty work smooth out and lets retries happen without losing data.
- Object storage. S3-style storage for blobs — images, videos, exports. Cheap, durable, slow per-request.
- CDN. Pushes copies of static (or near-static) content close to users. Drops latency from "feel it" to "imperceptible".
- Search index. A specialized data store optimized for "find the right rows", not "give me row 7".
The interesting question is never "do I know what these are?" — it's "in what order do I add them, and what do I push back on?".
Load balancers and the stateless app tier
The first thing that tends to break is your single app server. The fix is usually:
- Put the app behind a load balancer.
- Run multiple identical copies of the app.
- Make the app stateless — no in-memory session, no on-disk uploads — so any request can hit any server.
Why stateless matters. As soon as a request has to come back to the same machine (because that's where the user's session lives), you've lost the ability to scale horizontally without sticky-routing tricks, and your blast radius for one machine dying is no longer "one request" but "every request from those users".
Sessions move to Redis or a JWT. Uploads stream straight to object storage. The app servers become cattle, not pets — boring on purpose.
Databases: pick boring, scale carefully
Databases are where most "designs" silently die.
The default should be a relational database (Postgres, MySQL, Aurora). Strong schemas, transactions, and a query language that survives every framework. You only leave for a NoSQL store when you have a clear pattern that the relational model handles badly — extremely high write throughput on simple keys, deeply nested document shapes, or graph-like relationships.
When the relational box itself becomes the bottleneck, you have a small staircase to climb:
- Vertical scale. A bigger machine. Cheap and stupid; usually buys a year.
- Read replicas. Send reads to followers, writes to a primary. Easy win for read-heavy systems. Watch out for replication lag — your "just-written" row may not be on the replica yet.
- Sharding. Partition rows across N machines, by user id, tenant id, or hash. Painful to introduce later. Painful to re-shard. You almost never want this until you've profiled hard.
- Switch engines. Move write-heavy time-series data to Cassandra/DynamoDB; move analytical queries to a column store like Snowflake/BigQuery; keep the relational store for transactional truth.
The interview move here is: don't shard on the first whiteboard. Earn the shard. Most "scale" prompts can be solved with read replicas + a cache.
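The replication-lag caveat is usually handled by pinning a user's reads to the primary for a short window after they write — the "read-your-writes" trick. A toy routing sketch, assuming a fixed worst-case lag (real drivers track actual replica positions instead of guessing a number):

```python
import time

class Router:
    """Route writes to the primary and reads to replicas, but pin a user's
    reads to the primary briefly after they write, so they never see
    replication lag on their own data. Illustrative sketch, not a driver."""

    def __init__(self, pin_seconds: float = 2.0):
        self.pin_seconds = pin_seconds          # assumed worst-case replication lag
        self.last_write: dict[str, float] = {}  # user id -> time of last write

    def for_write(self, user_id: str) -> str:
        self.last_write[user_id] = time.monotonic()
        return "primary"

    def for_read(self, user_id: str) -> str:
        wrote_at = self.last_write.get(user_id)
        if wrote_at is not None and time.monotonic() - wrote_at < self.pin_seconds:
            return "primary"   # their write may not have replicated yet
        return "replica"

router = Router(pin_seconds=2.0)
assert router.for_read("alice") == "replica"   # no recent write: replica is fine
assert router.for_write("alice") == "primary"  # writes always hit the primary
assert router.for_read("alice") == "primary"   # read-your-writes: still pinned
```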
Caches: the most powerful, most dangerous tool
Caches are the single biggest "leverage" item in your toolbox. A 1ms Redis lookup replacing a 50ms database query means you just got 50× headroom on that path. The catch is that the bug is never in the read; the bug is always in the invalidate.
Three patterns you'll keep meeting:
- Cache-aside (lazy). App reads from the cache; on miss, reads from the DB and writes back. Simple, robust. Stale data on writes unless you also invalidate.
- Write-through. Every DB write also writes the cache. Avoids stale reads. Costs you on writes.
- Write-back. App writes to the cache only; cache flushes to the DB asynchronously. Fastest, but you can lose data on a cache crash.
Choose by where the pain is. Read-heavy and stale-tolerant? Cache-aside. Read-heavy and cannot be stale? Write-through. Write-heavy and you can absorb risk? Write-back, with eyes wide open.
A useful mental rule: cache the hot, slow, and rarely-changing. Resist caching anything else until profiling proves it pays for the extra failure mode.
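Cache-aside is small enough to show in full. A minimal in-memory sketch — a real version would sit in front of Redis and handle TTLs and stampedes, which this deliberately ignores:

```python
class CacheAside:
    """Cache-aside over an arbitrary loader: read through the cache,
    fall back to the slow store on a miss, and invalidate on writes."""

    def __init__(self, load_from_db):
        self.load_from_db = load_from_db
        self.cache: dict = {}
        self.misses = 0

    def get(self, key):
        if key not in self.cache:      # miss: go to the slow store
            self.misses += 1
            self.cache[key] = self.load_from_db(key)
        return self.cache[key]

    def put(self, key, value, write_to_db):
        write_to_db(key, value)        # the DB stays the source of truth
        self.cache.pop(key, None)      # invalidate rather than update,
                                       # so a racing reader can't pin stale data

db = {"user:1": "Ada"}
store = CacheAside(load_from_db=db.__getitem__)
assert store.get("user:1") == "Ada" and store.misses == 1
assert store.get("user:1") == "Ada" and store.misses == 1   # served from cache
store.put("user:1", "Grace", write_to_db=db.__setitem__)
assert store.get("user:1") == "Grace" and store.misses == 2  # reloaded after invalidate
```

Note the choice in `put`: delete the key instead of writing the new value into the cache. Updating in place looks faster but opens a race where a concurrent reader repopulates the cache with the old row.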
Queues: smoothing bursts and decoupling failure
Queues (Kafka, SQS, RabbitMQ, Pub/Sub) are how you stop calling slow things synchronously. The pattern is everywhere:
- A user uploads a video → a queue entry → a worker transcodes it → a callback updates the row.
- An order is placed → a queue entry → fraud checks, inventory reservation, email confirmation all run as independent consumers.
- A batch import lands → a queue full of work items → a fleet of workers grinds through them.
The wins are obvious: bursty traffic gets buffered, slow downstreams stop blocking the request path, and retries happen without the user noticing. The cost is two new failure domains (the queue itself, and the consumer group) plus a new mental model — your data is now eventually consistent across hops.
A clean interview move: explicitly call out the at-least-once semantics. "I'll make the consumer idempotent so retries are safe" is something most candidates forget to say.
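The idempotent-consumer move is easy to sketch. A toy version with an in-memory seen-set — in production the set would live in Redis or the database, with expiry, but the shape is the same:

```python
def make_idempotent_consumer(handle):
    """Wrap a handler so redelivered messages (at-least-once delivery)
    take effect exactly once, keyed by a message id."""
    seen: set[str] = set()

    def consume(message: dict) -> str:
        if message["id"] in seen:
            return "skipped"          # duplicate delivery: safe to drop
        handle(message)
        seen.add(message["id"])       # mark done only after the handler succeeds,
                                      # so a crash mid-handle leads to a retry
        return "processed"

    return consume

charges = []
consume = make_idempotent_consumer(lambda m: charges.append(m["amount"]))
msg = {"id": "order-7", "amount": 99}
assert consume(msg) == "processed"
assert consume(msg) == "skipped"      # the queue redelivered it; no double charge
assert charges == [99]
```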
CDNs and the latency math
The speed of light is a real number. A round trip from Mumbai to Virginia can't beat roughly 130ms — that's light in fiber covering ~13,000 km each way — and real routes are slower. If your assets are served from one US region, every Indian user pays that toll on every load.
CDNs (CloudFront, Cloudflare, Fastly) cache copies of your static content at the edge — physically close to your users. Your images, JS bundles, fonts, even cache-friendly API responses can come from a city 5ms away instead of a region 200ms away.
Three-line rule of thumb:
- Static assets → always CDN.
- Authenticated dynamic responses → usually not.
- Heavily-read public API responses with short TTL → CDN, with care.
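That rule of thumb translates roughly into Cache-Control headers. A hedged sketch — the extensions, paths, and TTL values here are illustrative choices, not recommendations:

```python
def cache_control(path: str, authenticated: bool) -> str:
    """Map the three-line rule of thumb onto a Cache-Control header."""
    static = path.endswith((".js", ".css", ".png", ".woff2"))
    if static:
        # Fingerprinted assets never change under the same URL:
        # let the CDN cache them essentially forever.
        return "public, max-age=31536000, immutable"
    if authenticated:
        # Per-user responses must never sit in a shared cache.
        return "private, no-store"
    # Heavily-read public API responses: short TTL, slightly longer at
    # the edge (s-maxage), so the CDN absorbs bursts without going stale.
    return "public, max-age=30, s-maxage=60"

assert cache_control("/app.9f3c.js", authenticated=False).startswith("public")
assert cache_control("/api/me", authenticated=True) == "private, no-store"
assert cache_control("/api/trending", authenticated=False) == "public, max-age=30, s-maxage=60"
```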
Putting it together: a simple read-heavy product
To make this less abstract, here's how the boxes wire up for a generic content product (blog, news site, e-commerce listing) at modest scale:
[ Users ]
    |
    v
[ CDN ] --- static assets, images
    |
    v
[ Load Balancer ]
    |
    +-----> [ App Server 1 ]
    +-----> [ App Server 2 ] ----> [ Cache (Redis) ]
    +-----> [ App Server N ]              |
                |                         v
                +-------------------> [ Primary DB ]
                                          |
                                          v
                                  [ Read replicas ]
Read-heavy traffic hits the CDN for assets, the cache for repeat queries, the read replicas for everything else. Writes go to the primary; replication carries them out. When a write lands, the app explicitly invalidates the relevant cache key. That's it. One LB, one cache layer, one DB tier with a few followers, and you can serve a serious amount of traffic.
When this stops being enough, you climb the staircase: shard the DB, split workloads (search index, analytical store, queue + workers), and isolate critical paths from everything else.
What an interviewer is actually checking
If you take one thing away, take this: a good system design answer is mostly subtraction. Anyone can sprinkle Kafka and Redis on a whiteboard. The signal is in:
- "I'd start with the smallest thing that works, which is..."
- "Before I add X, the cheaper move is..."
- "X comes with these new failure modes; I'd manage them by..."
- "Here's what would force me to revisit this."
The components are the easy part. The judgment about when to use them is the job.
If you want to drill this further, the DSA hub and the HLD writeups walk through specific systems — URL shorteners, rate limiters, news feeds — using exactly this style of reasoning.
Frequently asked questions
Do I need to memorize numbers like 'a single Redis node does 100k ops/s'?
Memorize a tight set: disk seek vs RAM vs network round-trip, modern SSD throughput, and a few rough QPS budgets per common component. The rest you should be able to derive.
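For reference, here is that tight set as rough orders of magnitude, plus a quick derivation. These are the classic ballpark figures, not benchmarks of any particular hardware:

```python
# Rough orders of magnitude, in nanoseconds — good enough to reason with.
NS = {
    "ram_access": 100,                    # main-memory reference
    "ssd_random_read": 100_000,           # ~100 µs
    "dc_round_trip": 500_000,             # same-datacenter network hop, ~0.5 ms
    "cross_continent_rtt": 150_000_000,   # ~150 ms
}

# Derive, don't memorize: a handler that does one cache hop plus one
# SSD-backed database read spends roughly this long on I/O alone.
budget_ns = NS["dc_round_trip"] + (NS["dc_round_trip"] + NS["ssd_random_read"])
budget_ms = budget_ns / 1_000_000
print(f"~{budget_ms:.1f} ms per request before any app logic")

# So one worker thread serving these sequentially tops out around:
print(f"~{1000 / budget_ms:.0f} requests/s per thread")
```

Run the same arithmetic with a cross-continent hop substituted in and you'll see, in one line, why the CDN section above exists.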
When should I reach for a queue vs synchronous calls?
Reach for a queue when the work is asynchronous by nature, when downstream is slow or unreliable, or when traffic is bursty and you need a buffer. Don't add a queue just because it sounds scalable — every queue is also a place outages can hide.
Caches sound free. Why not cache everything?
Caching adds invalidation logic, an extra failure domain, and consistency questions. Cache the hot, slow, and rarely-changing things; resist caching the rest until profiling proves you need to.
Read next
Apr 19, 2026 · 6 min read
Designing Rate Limiters: Token Bucket, Leaky Bucket, and Sliding Windows
How rate limiters actually work — token bucket, leaky bucket, fixed and sliding windows — with the trade-offs that decide which one belongs in front of your API.
Apr 19, 2026 · 7 min read
Designing a URL Shortener: An Interview-Style Walkthrough
A complete walkthrough of designing a URL shortener at interview depth — requirements, ID generation, storage, caching, scaling, and the trade-offs at every step.
Apr 19, 2026 · 6 min read
Load Balancing Strategies: L4 vs L7, Round Robin, and What 'Sticky Sessions' Really Cost
How load balancers actually distribute traffic — L4 vs L7, the algorithms that matter (round robin, least connections, consistent hashing), and the hidden costs of sticky sessions.