
Design a Distributed Message Queue (Kafka-class)

Partitioned, replicated, append-only log with at-least-once delivery, ordered partitions, and consumer groups at 1M+ msgs/sec.

hld, system-design, messaging

Intro

A Kafka-class message queue is the backbone of many event-driven architectures. Three mechanisms dominate the design: partitioning (for scale), replication (for durability), and consumer groups (for ordered, sharded consumption). Candidates often over-think delivery semantics; the practical default is at-least-once delivery with idempotent consumers.

Functional

Producer publishes messages to a topic.
Consumer reads messages by offset, so it can rewind and replay.
Consumer groups for parallel processing with partition rebalancing.
Topic creation + partition / retention configuration.
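
To make the publish and replay requirements concrete, here is a minimal in-memory sketch of a topic as a set of append-only partition logs. The `Topic` class and its `append` / `read` methods are hypothetical names chosen for illustration; a real broker would persist segments to disk and enforce retention.

```python
class Topic:
    """Toy in-memory topic: a fixed number of append-only partition logs."""

    def __init__(self, name, num_partitions):
        self.name = name
        # One list per partition; the list index of a message is its offset.
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, message):
        """Append to one partition and return the offset assigned to the message."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1

    def read(self, partition, offset, max_messages=100):
        """Return up to max_messages (offset, message) pairs starting at offset."""
        log = self.partitions[partition]
        end = min(len(log), offset + max_messages)
        return [(i, log[i]) for i in range(offset, end)]


topic = Topic("orders", num_partitions=2)
for msg in ["a", "b", "c"]:
    topic.append(0, msg)

first_pass = topic.read(0, offset=0)
# Replay: reading never deletes anything, so a consumer can rewind to any offset.
replayed = topic.read(0, offset=1)
```

Because reads are non-destructive and addressed by offset, replay is just a read from an earlier offset, which is the main thing that separates a log from a traditional queue.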

Non-functional

Throughput ≥ 1 M msgs/sec per cluster.
Producer p99 publish < 10 ms.
Durability: replication factor 3; a write is acknowledged once a majority of replicas (2 of 3) have it.
Ordered within partition; no global order.
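
The ordering and durability requirements above can be sketched with two small helpers. This is an illustration under stated assumptions: `partition_for` and `majority` are hypothetical names, and CRC32 stands in for whatever stable hash a real client uses.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (CRC32 here, as an assumption).

    Every message with the same key lands in the same partition, which is what
    gives per-key ordering without any global order across partitions.
    """
    return zlib.crc32(key) % num_partitions


def majority(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


# All events for one user go to one partition, so they stay in order.
user_partitions = {partition_for(b"user-42", 6) for _ in range(3)}

# With replication factor 3, a write needs 2 replicas before it is acknowledged.
acks_needed = majority(3)
```

Note that changing the partition count changes the key-to-partition mapping, so per-key ordering only holds while the partition count is fixed.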

Components

Brokers: hold the partitioned logs and replicate them.
Topic metadata: partitions, leaders, and in-sync replica sets (ISR), coordinated via ZooKeeper / KRaft.
Producer client: batches messages and sends each batch to the leader of its partition.
Consumer client: reads each partition by offset.
Consumer group coordinator: manages partition assignment and rebalancing.
Schema registry: Avro / Protobuf schema versioning.
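
As an illustration of what the consumer group coordinator does, the sketch below assigns partitions to group members round-robin and recomputes the assignment when a member leaves. The `assign` function is a hypothetical helper, not the API of any real client; production systems offer several assignment strategies, some of which try to minimize partition movement.

```python
def assign(partitions, members):
    """Round-robin partitions over sorted member IDs.

    Each partition gets exactly one owner within the group, which preserves
    per-partition ordering while spreading load across consumers.
    """
    if not members:
        return {}
    ordered = sorted(members)
    assignment = {m: [] for m in ordered}
    for i, p in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(p)
    return assignment


before = assign(range(6), ["c1", "c2", "c3"])
# c2 leaves the group: the coordinator triggers a rebalance and reassigns.
after = assign(range(6), ["c1", "c3"])
```

With six partitions, adding a seventh consumer would leave it idle, since a partition is never split between members of the same group.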

Trade-offs

At-least-once vs exactly-once

Pros

At-least-once: simple, scales.
Exactly-once: no duplicate effects downstream; built on producer idempotency + transactions.

Cons

At-least-once: consumers must dedupe.
Exactly-once: throughput penalty from transactional coordination.
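
The dedupe burden that at-least-once places on consumers can be sketched as follows. This is a minimal example, assuming each message carries a unique `id` field (a hypothetical schema) and that the set of processed IDs is kept in memory; a real consumer would persist it, ideally in the same transaction as the side effect.

```python
class IdempotentConsumer:
    """Applies each message at most once, even if it is delivered several times."""

    def __init__(self):
        self.seen = set()   # IDs of messages already applied (in-memory for the sketch)
        self.balance = 0    # example side effect: a running total

    def handle(self, message):
        """Apply the message unless its ID was already processed.

        Returns True if the message was applied, False if it was a duplicate.
        """
        msg_id = message["id"]
        if msg_id in self.seen:
            return False
        self.balance += message["amount"]
        self.seen.add(msg_id)
        return True


# At-least-once delivery: m1 is redelivered after a retry.
deliveries = [
    {"id": "m1", "amount": 10},
    {"id": "m2", "amount": 5},
    {"id": "m1", "amount": 10},
]
consumer = IdempotentConsumer()
results = [consumer.handle(m) for m in deliveries]
```

The redelivered message is detected and skipped, so the final state matches what exactly-once delivery would have produced, without requiring transactions on the broker side.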

Push vs pull consumers