Jarviix
HLD · 11 min read

Design a Distributed Message Queue (Kafka-class)

Partitioned, replicated, append-only log with at-least-once delivery, ordered partitions, and consumer groups at 1M+ msgs/sec.

hld, system-design, messaging

Intro

A Kafka-class message queue is the backbone of many event-driven architectures. The design is dominated by three ideas: partitioning (for scale), replication (for durability), and consumer groups (for ordered, sharded consumption). Candidates often over-think delivery semantics; a sound default is at-least-once delivery with idempotent consumers.

Functional

  • Producer publishes messages to a topic.
  • Consumer reads messages with replay support.
  • Consumer groups for parallel processing with partition rebalancing.
  • Topic creation + partition / retention configuration.
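
To make the consumer-group requirement concrete, here is a minimal sketch of partition assignment and rebalancing. The round-robin strategy and the names `assign_partitions`, `c1`, `c2`, `c3` are assumptions for illustration; real coordinators support several strategies and handle membership changes incrementally.

```python
def assign_partitions(partitions: list[int], members: list[str]) -> dict[str, list[int]]:
    """Round-robin assignment: every partition is owned by exactly one member."""
    if not members:
        return {}
    ordered = sorted(members)  # deterministic order so every node computes the same plan
    assignment: dict[str, list[int]] = {m: [] for m in ordered}
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment


partitions = list(range(6))

# Three consumers share six partitions.
before = assign_partitions(partitions, ["c1", "c2", "c3"])

# c3 leaves the group: a rebalance redistributes its partitions.
after = assign_partitions(partitions, ["c1", "c2"])

print(before)
print(after)
```

Because a partition has a single owner within a group, per-partition ordering is preserved while the group as a whole processes partitions in parallel. Note that this naive scheme moves more partitions than necessary on a rebalance.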

Non-functional

  • Throughput ≥ 1 M msgs/sec per cluster.
  • Producer p99 publish < 10 ms.
  • Durability: replication factor 3; a write is acknowledged only after a majority of replicas (2 of 3) have it.
  • Ordered within partition; no global order.
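
A rough back-of-envelope calculation shows what these targets imply. The average message size and the sustained per-partition throughput below are assumptions chosen for round numbers, not figures from this design; substitute measured values.

```python
import math

TARGET_MSGS_PER_SEC = 1_000_000   # from the throughput requirement
REPLICATION_FACTOR = 3            # from the durability requirement
AVG_MSG_BYTES = 1_000             # ASSUMPTION: ~1 KB average message
PER_PARTITION_MB_PER_SEC = 10     # ASSUMPTION: sustained write rate of one partition

# Producer-facing ingress, in MB/s (1 MB = 1,000,000 bytes here).
ingress_mb = TARGET_MSGS_PER_SEC * AVG_MSG_BYTES / 1_000_000

# Every message is written once per replica.
disk_write_mb = ingress_mb * REPLICATION_FACTOR

# Lower bound on partition count needed to absorb the ingress.
min_partitions = math.ceil(ingress_mb / PER_PARTITION_MB_PER_SEC)

# Majority quorum and the number of replica failures it tolerates.
majority = REPLICATION_FACTOR // 2 + 1
tolerated_failures = REPLICATION_FACTOR - majority

print(f"ingress: {ingress_mb:.0f} MB/s, replicated writes: {disk_write_mb:.0f} MB/s")
print(f"min partitions: {min_partitions}, quorum: {majority}, tolerates: {tolerated_failures}")
```

Under these assumptions the cluster takes about 1,000 MB/s of ingress, writes about 3,000 MB/s across replicas, needs at least 100 partitions, and uses a write quorum of 2 that tolerates one replica failure.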

Components

  • Brokers

    Hold partitioned logs + replicate.

  • Topic metadata

    Partitions, leaders, and in-sync replica (ISR) sets, coordinated via ZooKeeper or KRaft.

  • Producer client

    Batches + sends to leader of each partition.

  • Consumer client

    Reads with offsets per partition.

  • Consumer group coordinator

    Manages partition assignment + rebalance.

  • Schema registry

    Avro / Protobuf schema versioning.
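
The broker, producer, and consumer roles above can be sketched as a tiny in-memory model. This is an illustration only: the `Topic` and `Partition` classes are hypothetical, there is no replication or persistence, and CRC32 stands in for whatever hash a real partitioner uses.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Partition:
    """An append-only log; a record's offset is its index in the list."""
    records: list[tuple[bytes, bytes]] = field(default_factory=list)

    def append(self, key: bytes, value: bytes) -> int:
        self.records.append((key, value))
        return len(self.records) - 1  # offset of the record just written

    def read(self, offset: int, max_records: int = 100) -> list[tuple[bytes, bytes]]:
        # Reads never remove data, so any consumer can replay from any offset.
        return self.records[offset:offset + max_records]


class Topic:
    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions = [Partition() for _ in range(num_partitions)]

    def partition_for(self, key: bytes) -> int:
        # Same key -> same partition, which is what gives per-key ordering.
        return zlib.crc32(key) % len(self.partitions)

    def publish(self, key: bytes, value: bytes) -> tuple[int, int]:
        p = self.partition_for(key)
        return p, self.partitions[p].append(key, value)


topic = Topic("orders", num_partitions=4)
results = [topic.publish(b"user-42", v) for v in (b"created", b"paid", b"shipped")]
partition_id = results[0][0]

# Replay: read the same partition again starting from offset 1.
replayed = topic.partitions[partition_id].read(offset=1)
print(results, replayed)
```

All three messages for `user-42` land in one partition with offsets 0, 1, 2, so a consumer of that partition sees them in publish order. Messages with different keys may land in different partitions and have no ordering relative to each other.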

Trade-offs

At-least-once vs exactly-once

Pros

  • At-least-once: simple, scales.
  • Exactly-once: no duplicate processing, but it requires an idempotent producer plus transactions.

Cons

  • At-least-once: consumers must dedupe.
  • Exactly-once: throughput penalty.
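
As an illustration of consumer-side deduplication under at-least-once delivery, here is a minimal sketch. It assumes each message carries a unique `id` field; the message shape and the `IdempotentConsumer` class are hypothetical. In a real system the processed-ID record and the side effect would have to be committed atomically in a durable store, which this in-memory version does not attempt.

```python
class IdempotentConsumer:
    def __init__(self) -> None:
        self.processed_ids: set[str] = set()  # in-memory only; assumed durable in practice
        self.balances: dict[str, int] = {}

    def handle(self, message: dict) -> bool:
        """Apply the message once; return False if it was a duplicate."""
        msg_id = message["id"]
        if msg_id in self.processed_ids:
            return False  # redelivery: skip the side effect
        account = message["account"]
        self.balances[account] = self.balances.get(account, 0) + message["amount"]
        self.processed_ids.add(msg_id)
        return True


consumer = IdempotentConsumer()
deposit = {"id": "msg-001", "account": "acct-1", "amount": 50}

first = consumer.handle(deposit)
second = consumer.handle(deposit)  # simulated redelivery after a lost acknowledgement
print(first, second, consumer.balances)
```

The redelivered message is ignored, so the balance reflects the deposit only once even though the broker delivered it twice.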

Push vs pull consumers

Pros

  • Pull: consumers fetch at their own pace, which gives natural backpressure and keeps the broker simpler.

Cons

  • Pull: polling overhead, including empty polls when the consumer is caught up (commonly reduced with long polling).
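
The pull model can be sketched as a loop in which the consumer decides how much to fetch and advances its offset only after processing. The `poll` and `drain` functions are hypothetical stand-ins for a broker fetch API and a client loop; a real client would wait or long-poll instead of stopping when a poll comes back empty.

```python
from typing import Callable


def poll(log: list[str], offset: int, max_records: int) -> list[str]:
    """Broker side: return up to max_records starting at offset (possibly none)."""
    return log[offset:offset + max_records]


def drain(log: list[str], start_offset: int, max_records: int,
          handle: Callable[[str], None]) -> tuple[int, int]:
    """Consumer side: pull batches until caught up.

    Returns (next offset to read, number of polls issued).
    """
    offset, polls = start_offset, 0
    while True:
        batch = poll(log, offset, max_records)
        polls += 1
        if not batch:
            break  # caught up; this empty poll is the overhead of pulling
        for record in batch:
            handle(record)
        # Advance only after processing: a crash before this line means the
        # batch is re-read, which is exactly at-least-once delivery.
        offset += len(batch)
    return offset, polls


log = [f"m{i}" for i in range(10)]
seen: list[str] = []
next_offset, poll_count = drain(log, start_offset=0, max_records=4, handle=seen.append)
print(next_offset, poll_count)
```

With ten records and a batch size of four, the consumer issues three useful polls and one empty poll. The batch size is the throttle: a slow consumer simply asks for less, and the broker needs no per-consumer flow control.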

Scale concerns

  • Hot partition (one key dominating).
  • Consumer rebalance storms.
  • Disk I/O bottleneck: keeping writes sequential (append-only) is critical.
  • Cross-region replication.
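
One common mitigation for the hot-partition concern is key salting, sketched below. The fan-out value, the `#` separator, and the function names are assumptions for illustration, and CRC32 again stands in for the real partitioner hash. The technique trades away strict ordering for the hot key, since its messages are now spread over several partitions.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    return zlib.crc32(key.encode()) % num_partitions


def salted_key(key: str, sequence: int, fanout: int) -> str:
    """Spread one hot key over `fanout` sub-keys, round-robin by sequence number."""
    return f"{key}#{sequence % fanout}"


NUM_PARTITIONS = 12
FANOUT = 4  # ASSUMPTION: chosen per hot key based on observed skew

# Without salting, every message for the hot key hits the same partition.
plain = {partition_for("celebrity-user", NUM_PARTITIONS) for _ in range(1000)}

# With salting, the load is split across at most FANOUT partitions.
salted_keys = {salted_key("celebrity-user", i, FANOUT) for i in range(1000)}
salted = {partition_for(k, NUM_PARTITIONS) for k in salted_keys}

print(len(plain), len(salted_keys), len(salted))
```

The unsalted key always maps to a single partition, while the salted variants map to up to `FANOUT` partitions (fewer if two sub-keys happen to hash to the same one). Consumers that need a per-key view must then merge results across the sub-keys.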
