Tech · 6 min read
Kafka Explained Simply: Topics, Partitions, Consumers, and the Mental Model That Makes It Click
Kafka isn't a queue — it's a distributed log. Once you internalize that one shift, the partitions, consumer groups, offsets, and replay semantics all start to make sense.
By Jarviix Engineering · Apr 19, 2026
Kafka is one of those technologies that confuses developers because it looks like a message queue but behaves like a database log. Once you internalize that one mental-model shift, every other piece — partitions, consumer groups, offsets, replay — falls into place.
This post is a calm walk through Kafka's core concepts and the patterns where it earns its place in production stacks.
The mental model: Kafka is a distributed log
A Kafka topic is, fundamentally, an append-only log file (or many of them — see partitions). Producers append messages to the end. Consumers read from positions in the log called offsets. The log is durable — messages stay around even after they've been read, until retention policy says otherwise.
This is a different shape from a traditional queue:
| | Traditional queue (SQS, RabbitMQ) | Kafka |
|---|---|---|
| Lifetime | Message gone after consumed | Message stays until retention expires |
| Replay | Hard or impossible | Trivial — rewind offset |
| Multiple consumers | Each gets a copy or splits the work | Each consumer group reads independently |
| Ordering | Per-queue, often weak | Strict per-partition |
The "log, not queue" perspective explains almost every Kafka design choice.
Topics and partitions
A topic is a logical stream — orders, user-events, clicks.
A topic is split into partitions for parallelism. Each partition is its own ordered log. Messages within a partition are strictly ordered; messages across partitions are not.
topic: orders
├─ partition 0: [msg, msg, msg, ...]
├─ partition 1: [msg, msg, msg, ...]
└─ partition 2: [msg, msg, msg, ...]
When you produce a message, you (or Kafka) pick which partition it goes to. The standard rule:
- No key: messages are spread across partitions (classically round-robin; newer clients use a "sticky" partitioner that fills one batch at a time).
- With key: hash(key) % partitions. Same key always goes to the same partition.
Consequence: if you key by user_id, all events for user 42 land in the same partition and are processed in order. This is the foundation of "ordering where it matters" in Kafka.
Consumer groups
A consumer group is one or more consumers cooperating to process a topic.
- Each partition is owned by exactly one consumer in a group.
- Each consumer can own multiple partitions.
- Consumers in different groups read independently — each group has its own offsets.
topic: orders (3 partitions)
group "checkout-processor" (2 consumers)
consumer A → owns partitions 0, 1
consumer B → owns partition 2
group "analytics" (1 consumer)
consumer C → owns partitions 0, 1, 2
This gives you two superpowers:
- Horizontal scaling within a group. Add a consumer; partitions get redistributed; throughput goes up.
- Multiple independent consumers of the same topic. Analytics, billing, audit logs all read the same events without affecting each other.
The cap on parallelism within a group is the partition count. 12 partitions = up to 12 consumers in a group; the 13th sits idle.
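The one-owner-per-partition invariant and the idle-13th-consumer effect both fall out of the assignment rule. Here is a sketch of range-style assignment (real Kafka assignors such as range, round-robin, and cooperative-sticky are more involved, but the invariant is the same; `range_assign` is an illustrative name):

```python
def range_assign(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Range-assignment sketch: partitions are split into contiguous
    chunks, with earlier consumers absorbing the remainder. Each
    partition is owned by exactly one consumer in the group."""
    consumers = sorted(consumers)
    per, extra = divmod(num_partitions, len(consumers))
    assignment, start = {}, 0
    for i, c in enumerate(consumers):
        n = per + (1 if i < extra else 0)
        assignment[c] = list(range(start, start + n))
        start += n
    return assignment

print(range_assign(3, ["A", "B"]))
# {'A': [0, 1], 'B': [2]}
```

With 12 partitions and 13 consumers, this scheme hands the 13th consumer an empty list, which is exactly the idle consumer described above.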
Offsets
Each consumer group tracks where it is in each partition with an offset — the index of the next message to read.
partition 0: [m0, m1, m2, m3, m4, m5, ...]
                              ^ group "checkout" offset = 4
                          ^ group "analytics" offset = 3
Offsets are stored in Kafka itself (in a special internal topic __consumer_offsets). When a consumer commits an offset, it's saying "I've durably processed up to here; if I crash, restart me from this point."
Two flavors:
- Auto-commit (default in many clients). Offsets are committed every few seconds. Easy, but you can replay or skip messages on crash.
- Manual commit. Explicitly commit after processing. More control, more code.
For most production work, manual commit after successful processing is the right pattern.
Replication and durability
Each partition has multiple replicas — one leader and several followers. Producers write to the leader; followers replicate.
The producer can ask for various ack semantics:
- acks=0: fire and forget. Fast, lossy.
- acks=1: wait for the leader only. OK, but data is lost if the leader fails before followers replicate.
- acks=all: wait for all in-sync replicas. Durable; pay latency.
For anything you actually care about, acks=all plus min.insync.replicas=2 is the safe default.
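In config terms, that safe default looks roughly like this (property names per Apache Kafka's producer and topic configs; `enable.idempotence` is an extra setting commonly paired with it, not something mandated above):

```properties
# producer
acks=all                  # wait for all in-sync replicas
enable.idempotence=true   # broker dedupes producer retries

# topic (created with replication factor 3)
min.insync.replicas=2     # acks=all writes fail fast if fewer than 2 replicas are in sync
```

With replication factor 3 and `min.insync.replicas=2`, you can lose one broker and keep accepting durable writes.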
Retention
Messages don't disappear after being read. They live until retention says they should go away. Two policies:
Time/size based
log.retention.hours=168 # 7 days
log.retention.bytes=1073741824 # 1 GiB per partition
After the limit, old segments are deleted. Pure event streams (clicks, telemetry) usually use this.
Compacted
For each key, keep only the latest value. The topic becomes a "current state" snapshot — perfect for caching last-known values, replicating reference data, building changelog-driven materialized views.
key=user-42, value=v1 ← will be removed
key=user-43, value=v1
key=user-42, value=v2 ← latest, kept
You can replay a compacted topic from the beginning and get the latest value of every key — the foundation of Kafka Streams, Debezium CDC, and many event-sourcing patterns.
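The end state of compaction is easy to model: keep only the last record per key. (The real log cleaner works segment by segment in the background, and tombstone records eventually delete keys entirely; this sketch only shows the final shape.)

```python
def compact(log: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Compaction sketch: keep only the last record for each key,
    preserving the order in which the survivors appear in the log."""
    latest = {key: i for i, (key, _) in enumerate(log)}  # last index per key
    return [rec for i, rec in enumerate(log) if latest[rec[0]] == i]

log = [("user-42", "v1"), ("user-43", "v1"), ("user-42", "v2")]
print(compact(log))
# [('user-43', 'v1'), ('user-42', 'v2')]
```

Replaying the compacted log from offset 0 hands you the latest value of every key, which is the "current state snapshot" property described above.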
When Kafka is the right tool
Kafka shines in:
- High-throughput event streams. Hundreds of thousands of messages per second sustainably, on commodity hardware.
- Decoupling producers and consumers. Many producers, many consumers, varied speeds, none coordinating.
- Replay. "Reprocess all of last week's data through the new model." Trivial in Kafka, hard in queues.
- Event sourcing / CDC. The log is the source of truth; everything else is derived state.
- Stream processing. Kafka Streams, Flink, Spark Streaming — all built on Kafka topics as the substrate.
When Kafka is overkill:
- Simple background work. Use Redis Streams, RabbitMQ, or SQS. Kafka has operational weight that small workloads don't need.
- Request/response. Kafka is one-way. Don't try to build RPC on top of it.
- Strict global ordering. Per-partition ordering is easy; total ordering across a topic costs you all parallelism.
The patterns that matter
A few recurring patterns:
Idempotent consumers
At-least-once delivery is the default. Same message can be delivered twice (after a consumer crash, after a rebalance). Your consumer must process duplicates safely. (See idempotency in APIs.)
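The simplest idempotency scheme is dedupe-by-event-id. A minimal sketch, assuming each event carries a unique `event_id` (in production the seen-ids set would live in the same datastore, and ideally the same transaction, as the state it protects):

```python
processed_ids: set[str] = set()
balance = {"user-42": 0}

def handle(event: dict) -> None:
    """Idempotent handler sketch: a redelivered event (same event_id)
    is acknowledged but never applied twice."""
    if event["event_id"] in processed_ids:
        return                                   # duplicate after a rebalance: skip
    balance[event["user"]] += event["amount"]    # apply the effect once
    processed_ids.add(event["event_id"])

evt = {"event_id": "evt-001", "user": "user-42", "amount": 10}
handle(evt)
handle(evt)  # redelivery after a crash: no double charge
print(balance)
# {'user-42': 10}
```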
Consumer lag as the SLO
The single most important Kafka metric: consumer lag — the offset gap between latest produced message and last consumed offset, per partition. Growing lag = your consumers can't keep up. Set alerts. Plan capacity around it.
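The arithmetic is just log-end offset minus committed offset, per partition. A sketch (clients expose these numbers through their own APIs; the dict shapes here are placeholders for whatever yours returns):

```python
def consumer_lag(end_offsets: dict[int, int],
                 committed: dict[int, int]) -> dict[int, int]:
    """Lag per partition: latest produced offset minus the group's
    committed offset. A partition with no commit yet counts from 0."""
    return {p: end_offsets[p] - committed.get(p, 0) for p in end_offsets}

lag = consumer_lag(end_offsets={0: 1500, 1: 1510, 2: 1490},
                   committed={0: 1500, 1: 1200, 2: 1489})
print(lag)
# {0: 0, 1: 310, 2: 1}
```

Alert on the max (or sum) across partitions growing over time, not on any single snapshot.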
The transactional outbox
Don't do db.write() then kafka.produce() separately — they're not atomic. Use the outbox pattern: write the event to a DB table in the same transaction as the business state, then publish from the outbox. (Covered in detail in event-driven architecture.)
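The core of the outbox pattern fits in a few lines. A sketch using SQLite as the stand-in database (table names and the `produce` callback are illustrative; a real relay would also handle publish retries, which is why the consumer side still needs to be idempotent):

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         topic TEXT, payload TEXT, published INTEGER DEFAULT 0);
""")

def place_order(order_id: str) -> None:
    """Business write + event write in ONE transaction, so they cannot diverge."""
    with db:  # commits both inserts atomically; rolls both back on error
        db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, "placed"))
        db.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                   ("orders", json.dumps({"order_id": order_id, "status": "placed"})))

def publish_outbox(produce) -> int:
    """Relay sketch: drain unpublished rows to Kafka, then mark them."""
    rows = db.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0").fetchall()
    for row_id, topic, payload in rows:
        produce(topic, payload)  # e.g. a real producer.produce(topic, payload)
        db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    db.commit()
    return len(rows)

sent = []
place_order("o-1")
publish_outbox(lambda topic, payload: sent.append((topic, payload)))
```

If the process dies between the transaction and the relay, the event is still in the outbox and gets published on the next poll: no lost events, at the cost of possible duplicates.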
Schema registry
Treat events as a public contract. Use Avro, Protobuf, or JSON Schema with a registry. Validate on produce; consumers reject invalid messages. Stops "we changed the event shape and broke five downstream services" incidents.
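The "validate on produce" step is just a contract check before the send. A toy stand-in for a real registry lookup (production setups use Avro/Protobuf/JSON Schema with something like Confluent Schema Registry; the schema dict and `validate` here are hand-rolled for illustration):

```python
# Toy "schema": required field name -> required Python type.
ORDER_PLACED_V1 = {"order_id": str, "user_id": str, "amount_cents": int}

def validate(event: dict, schema: dict) -> list[str]:
    """Contract check sketch: reject on missing fields or wrong types
    before producing, so bad events never reach downstream consumers."""
    errors = [f"missing field: {f}" for f in schema if f not in event]
    errors += [f"wrong type for {f}: expected {t.__name__}"
               for f, t in schema.items()
               if f in event and not isinstance(event[f], t)]
    return errors

good = {"order_id": "o-1", "user_id": "u-42", "amount_cents": 999}
bad = {"order_id": "o-1", "amount_cents": "999"}
assert validate(good, ORDER_PLACED_V1) == []
assert validate(bad, ORDER_PLACED_V1)  # missing user_id, amount is a string
```

The value of a real registry is that this check is enforced centrally and versioned, so producers can evolve schemas without silently breaking consumers.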
Three rules
- Pick partition count carefully. Repartitioning a live topic is operationally painful. Over-provision modestly (room to add consumers), don't go crazy.
- Manual commit, after successful processing. Auto-commit is "fire and pray". Manual gives you the right semantics with a few extra lines.
- Monitor consumer lag like uptime. It's the single most actionable Kafka health metric. Page on sustained lag growth.
What to read next
Kafka is the substrate for most modern event-driven systems. Event-driven architecture covers the patterns built on top; microservices observability covers how to debug them in production. The Twitter timeline HLD writeup is the canonical example of using Kafka-style partitioned logs for fanout at scale — a great applied read once the concepts here have clicked.
Frequently asked questions
Is Kafka the right choice for simple background jobs?
Usually no. Kafka shines for high-throughput streams, event sourcing, and replayable logs. For ordinary background work queues, RabbitMQ, SQS, or Redis Streams are simpler.
How many partitions do I need?
At least as many as your maximum desired consumer parallelism. More partitions = more parallelism but also more rebalance overhead and more files on disk. Start with 12-24 for most workloads and adjust based on observed lag.
Can I delete data from Kafka?
Yes — by retention policy (time or size based) or via compaction (keep only the latest value per key). You don't typically delete individual messages. Retention is the design choice you make per topic.