Interview & System Design

Apache Kafka
Beginner to Pro

A complete, visual guide — core concepts, internals, partitioning, ISR, system design patterns, and a Spring Boot E2E walkthrough.

What is Apache Kafka?

Start from zero. Understand the problem Kafka solves before touching any code.

🤔

The Problem Kafka Solves

Imagine you have 10 services. Each service needs to send data to 5 other services. That's 50 direct connections — a maintenance nightmare. Every new service multiplies the complexity.

Kafka introduces a central, durable event log that all services write to and read from. Instead of 50 connections, you have N + M connections (producers + consumers).

Real analogy: Think of Kafka like a newspaper printing press. Writers (producers) submit articles once. The press (Kafka) stores and prints them. Millions of readers (consumers) pick up their copy independently — at their own pace, from any point in time.
[Figure: Architecture before vs. after Kafka. Without Kafka: services A to E call each other directly (tightly coupled, fragile). With Kafka: producers A to C write to Apache Kafka and consumers A to C read from it (decoupled, scalable, durable).]

When Should You Use Kafka?

✅ Use Kafka when:
• High-throughput event streaming (millions/sec)
• Decoupling microservices
• Event sourcing / audit logs
• Real-time analytics pipelines
• Multi-consumer fanout
❌ Don't use Kafka when:
• Simple job queues (use RabbitMQ)
• Request/reply RPC patterns
• Very low message volume (<1k/min)
• You need complex routing logic
• Team can't manage infrastructure

Kafka at Scale — Real Numbers

• 1M+ msgs/sec per broker
• 7 days default retention
• Millisecond-range end-to-end latency
Who uses it? LinkedIn (where Kafka was born), Netflix, Uber, Airbnb, Twitter, and thousands more. LinkedIn processes over 7 trillion messages per day on Kafka.
Core Concepts

The building blocks you must understand cold before any interview.

[Figure: Kafka core concepts. A producer publishes events to a broker hosting the topic "orders" with three partitions (partition 0 holds offsets 0-3, partition 1 offsets 0-2, partition 2 offsets 0-1). A consumer group of three consumers reads one partition each, at offsets 3, 2, and 1.]
📋

Topic

A Topic is a named, immutable, append-only log of events, split into partitions and ordered within each partition. Think of it as a database table, but append-only and time-ordered.

Events in a topic are never deleted on consumption. Multiple consumers read the same events independently. Retention is time or size based.
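// Creating a topic programmatically in Java
try (AdminClient admin = AdminClient.create(props)) {
    NewTopic topic = new NewTopic(
        "orders",   // topic name
        3,          // partitions
        (short) 2   // replication factor
    );
    // createTopics() is asynchronous; all().get() blocks until the
    // brokers confirm and throws if creation failed
    admin.createTopics(List.of(topic)).all().get();
}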
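To make the "never deleted on consumption" point concrete, here is a toy in-memory model of a single-partition topic. It is only a sketch: the class name ToyTopicLog and its methods are invented for illustration and have nothing to do with Kafka's real storage or client API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a single-partition topic; hypothetical, not Kafka's implementation. */
public class ToyTopicLog {
    private final List<String> events = new ArrayList<>();          // append-only storage
    private final Map<String, Integer> positions = new HashMap<>(); // next offset per reader

    /** Appends an event and returns the offset it was stored at. */
    public int append(String event) {
        events.add(event);
        return events.size() - 1;
    }

    /** Returns the events this reader has not seen yet and advances only its own position. */
    public List<String> poll(String reader) {
        int from = positions.getOrDefault(reader, 0);
        List<String> batch = new ArrayList<>(events.subList(from, events.size()));
        positions.put(reader, events.size());
        return batch;
    }

    public static void main(String[] args) {
        ToyTopicLog log = new ToyTopicLog();
        log.append("order-created");
        log.append("order-paid");
        System.out.println(log.poll("billing"));   // both events
        System.out.println(log.poll("analytics")); // the same two events again
        System.out.println(log.poll("billing"));   // empty list: billing is caught up
    }
}
```

Reading never removes anything from the list, which is why a second reader still sees every event.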
🔢

Offset

An Offset is a unique, sequential ID for each message within a partition. Offsets are per-partition — not global across the topic.

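Consumers commit their offset to track progress. If a consumer crashes, it resumes from the last committed offset, so no records are skipped; records processed after the last commit may be delivered again.
// Manually committing the offset after processing
// (requires a manual ack mode on the listener container)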
@KafkaListener(topics = "orders")
public void consume(ConsumerRecord<String, String> record,
                    Acknowledgment ack) {
    processOrder(record.value());
    ack.acknowledge(); // commit offset
}
🖥️

Broker

A Broker is a single Kafka server. It stores partitions, handles reads/writes, and replicates data to other brokers. A Kafka cluster is a group of brokers.

Production setups use a minimum of 3 brokers for fault tolerance. KRaft mode (production-ready since Kafka 3.3) removes the ZooKeeper dependency entirely.
# server.properties key settings
broker.id=0
log.dirs=/var/kafka/logs
num.partitions=3
default.replication.factor=3
min.insync.replicas=2
👥

Consumer Group

A Consumer Group is a set of consumers sharing work. Each partition is assigned to exactly ONE consumer in the group at a time — ensuring ordered, parallel processing.

If you have 4 partitions and 2 consumers, each consumer handles 2 partitions. Adding a 5th consumer? It sits idle — you can't exceed partition count!
Properties props = new Properties();
props.put("group.id", "order-processing-group");
props.put("auto.offset.reset", "earliest");
// All consumers with same group.id share work
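The 4-partitions/2-consumers arithmetic above can be sketched as a simple deal of partitions to group members. This is an illustrative stand-in, not one of Kafka's real assignors (range, round-robin, sticky), and the class name ToyGroupAssignment is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ToyGroupAssignment {
    /** Deals partitions 0..numPartitions-1 to consumers in turn; each partition gets exactly one owner. */
    public static Map<String, List<Integer>> assign(int numPartitions, List<String> consumers) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            String owner = consumers.get(p % consumers.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 4 partitions, 2 consumers: two partitions each
        System.out.println(assign(4, List.of("c1", "c2")));
        // 4 partitions, 5 consumers: the fifth consumer gets nothing
        System.out.println(assign(4, List.of("c1", "c2", "c3", "c4", "c5")));
    }
}
```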
Partitions & Offsets — Deep Dive

This is the most important concept for scaling Kafka. Master it.

How Partitioning Works

When a producer sends a message, Kafka decides which partition it goes to using a partitioning strategy. This is crucial for both ordering guarantees and load distribution.

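Key-based partitioning: partition = hash(key) % numPartitions (the default partitioner uses a murmur2 hash of the serialized key). The same key always goes to the same partition as long as the partition count doesn't change, which guarantees ordering per key. Use for: user events (key=userId), order events (key=orderId).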
ProducerRecord<String, String> record =
    new ProducerRecord<>(
        "orders",         // topic
        "user-123",       // KEY → same partition always!
        orderJson         // value
    );
producer.send(record);
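The hash(key) % numPartitions rule can be tried out in isolation. The sketch below uses String.hashCode() as a stand-in hash so it runs with no dependencies; Kafka's default partitioner uses murmur2 over the serialized key bytes, so the actual partition numbers will differ. ToyKeyPartitioner is a hypothetical name.

```java
public class ToyKeyPartitioner {
    /**
     * Maps a key to a partition. String.hashCode() is an assumption made for
     * this sketch; it is NOT the hash Kafka uses.
     */
    public static int partitionFor(String key, int numPartitions) {
        // floorMod keeps the result in [0, numPartitions) even for negative hashes
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        // The same key maps to the same partition on every call
        System.out.println(partitionFor("user-123", 3));
        System.out.println(partitionFor("user-123", 3));
        // With a different partition count the mapping may change
        System.out.println(partitionFor("user-123", 4));
    }
}
```

The last call shows why adding partitions to a keyed topic needs care: the modulus changes, so a key can land somewhere new.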
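Round-robin (no key): Messages are spread across partitions with no ordering guarantees. Older clients rotate strictly record by record; since Kafka 2.4 the default "sticky" partitioner fills a batch for one partition before moving to the next, which still evens out over time. Use for: metrics, logs where global order doesn't matter.
// No key = the producer picks the partition (round-robin or sticky)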
ProducerRecord<String, String> record =
    new ProducerRecord<>(
        "metrics",        // topic
        null,             // no key!
        metricJson        // value
    );
producer.send(record);
Custom Partitioner: Implement logic yourself. Use for: geographic routing, hot-key avoidance, business-specific distribution rules.
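import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

public class RegionPartitioner implements Partitioner {
  @Override
  public int partition(String topic, Object key,
      byte[] keyBytes, Object value,
      byte[] valueBytes, Cluster cluster) {
    // Assumes the topic has at least 3 partitions;
    // records without a key fall through to partition 2
    String region = key == null ? "" : key.toString();
    return switch (region) {
      case "US" -> 0;
      case "EU" -> 1;
      default -> 2;
    };
  }

  @Override
  public void configure(Map<String, ?> configs) {} // no settings needed

  @Override
  public void close() {} // nothing to release
}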


How Many Partitions Should You Use?

This is a common interview question. There's no magic formula, but here's the mental model:

Rule of thumb: target throughput ÷ throughput per partition
partitions = max(throughput_needed / single_partition_throughput, consumers_needed)
1. Start with consumer count

You can't have more active consumers in a group than partitions. Plan for your max parallelism first.

2. Consider throughput

Each partition handles ~10-100 MB/s depending on message size and broker. If you need 1 GB/s, you need somewhere between 10 and 100 partitions.

3. Don't over-partition

Each partition costs memory, file handles, and replication overhead. More isn't always better. Over-partitioning increases leader election time.
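The rule of thumb above is easy to turn into a small calculator. This is a rough sizing sketch under the stated assumptions (throughput in MB/s, 1 GB/s taken as 1000 MB/s); the class and parameter names are hypothetical, and real sizing should come from load tests.

```java
public class PartitionEstimate {
    /**
     * partitions = max(ceil(target / perPartition), maxConsumers).
     * Throughput values are in MB/s.
     */
    public static int estimate(double targetMBps, double perPartitionMBps, int maxConsumers) {
        int forThroughput = (int) Math.ceil(targetMBps / perPartitionMBps);
        return Math.max(forThroughput, maxConsumers);
    }

    public static void main(String[] args) {
        // 1 GB/s at 100 MB/s per partition, 12 consumers planned: consumers dominate
        System.out.println(estimate(1000, 100, 12)); // 12
        // 1 GB/s at 10 MB/s per partition: throughput dominates
        System.out.println(estimate(1000, 10, 12));  // 100
    }
}
```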

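⚠️ Interview trap: "Can I reduce partitions?" NO! Kafka only lets you increase a topic's partition count, never decrease it, because removing a partition would mean deleting or merging its data. Increasing has a cost too: hash(key) % numPartitions changes, so existing keys can move to a different partition. Plan ahead.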
Kafka Internals — How It Really Works

Replication, ISR, leader election, log compaction — the internals that make Kafka reliable.

Replication — Never Lose Data

Every partition has one leader and zero or more followers. Producers always write to the leader, and consumers read from it by default (since Kafka 2.4 they can optionally fetch from a closer replica). Followers replicate data from the leader continuously.

🟣 Broker 0 — LEADER
offset 0 ✓
offset 1 ✓
offset 2 ✓
offset 3 ✓ (latest)
Broker 1 — Follower
offset 0 ✓
offset 1 ✓
offset 2 ✓
offset 3 ⏳
Broker 2 — Follower
offset 0 ✓
offset 1 ✓
offset 2 ✓
offset 3 ✓
Cluster healthy. Broker 0 is leader. RF=3, ISR=[0,1,2]

ISR — In-Sync Replicas (Critical Concept!)

The ISR (In-Sync Replica Set) is the list of replicas that are fully caught up with the leader. This is Kafka's durability guarantee mechanism.

acks=all (or acks=-1): Producer waits for ALL ISR members to acknowledge. This is the safest setting — no data loss even if the leader crashes immediately after.

min.insync.replicas=2: With acks=all, Kafka rejects writes when fewer than 2 replicas (leader included) are in sync. This prevents data loss when a lone leader acknowledges writes and then fails.
// Producer durability settings
props.put("acks", "all");              // wait for all ISR
props.put("retries", Integer.MAX_VALUE); // retry until delivery.timeout.ms expires
props.put("enable.idempotence", "true");  // no duplicates from producer retries
props.put("max.in.flight.requests.per.connection", "5"); // max allowed with idempotence
acks setting    Durability      Latency   Use case
acks=0          Fire & forget   Lowest    Metrics, logs (OK to lose)
acks=1          Leader only     Medium    Balanced; the default before Kafka 3.0
acks=all        All ISR         Higher    Financial, orders, critical; the default since Kafka 3.0
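How acks and min.insync.replicas interact can be summarized as a small decision function. This is a simplified model of the leader's check, written as an assumption-laden sketch: ToyAckCheck is a hypothetical name, and a real broker does much more (for example tracking the high watermark).

```java
public class ToyAckCheck {
    /**
     * Returns true if the leader would accept the write in this simplified model.
     * isrSize counts the leader itself.
     */
    public static boolean accepts(String acks, int isrSize, int minInsyncReplicas) {
        if (isrSize < 1) {
            return false; // no leader available at all
        }
        if (acks.equals("all") || acks.equals("-1")) {
            // min.insync.replicas is only enforced for acks=all
            return isrSize >= minInsyncReplicas;
        }
        return true; // acks=0 and acks=1 do not consult min.insync.replicas
    }

    public static void main(String[] args) {
        System.out.println(accepts("all", 3, 2)); // true: healthy cluster
        System.out.println(accepts("all", 1, 2)); // false: only the leader is in sync
        System.out.println(accepts("1", 1, 2));   // true: leader-only ack skips the check
    }
}
```

The last case is the one to remember: min.insync.replicas gives no protection to producers that use acks=0 or acks=1.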

Log Compaction — Keep Only Latest per Key

Kafka's default retention policy deletes old messages by time or size. But log compaction is different — it keeps the latest value per key, deleting old versions.

[Figure: Log compaction before and after. Before: user-1=alice, user-2=bob, user-1=alice2, user-3=carol, user-1=alice3 (three entries for user-1, only the last matters). After: user-2=bob, user-3=carol, user-1=alice3 (the latest value per key is preserved).]
When to use log compaction: User profile state, inventory levels, configuration data — any "current state" topic where you only care about the latest value. Enable with cleanup.policy=compact
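The effect of compaction on the five-record example can be reproduced with a short sketch. It models only the end result (latest record per key, original offset order kept) and ignores tombstones, segments, and the background cleaner; ToyCompaction and Entry are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ToyCompaction {
    public record Entry(long offset, String key, String value) {}

    /** Keeps only the highest-offset entry for each key, preserving offset order. */
    public static List<Entry> compact(List<Entry> log) {
        Map<String, Long> lastOffset = new HashMap<>();
        for (Entry e : log) {
            lastOffset.put(e.key(), e.offset()); // later entries overwrite earlier ones
        }
        List<Entry> kept = new ArrayList<>();
        for (Entry e : log) {
            if (lastOffset.get(e.key()) == e.offset()) {
                kept.add(e);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Entry> log = List.of(
            new Entry(0, "user-1", "alice"),
            new Entry(1, "user-2", "bob"),
            new Entry(2, "user-1", "alice2"),
            new Entry(3, "user-3", "carol"),
            new Entry(4, "user-1", "alice3"));
        // Survivors: offsets 1 (user-2), 3 (user-3), 4 (user-1)
        compact(log).forEach(System.out::println);
    }
}
```

Surviving records keep their original offsets, so the compacted log has gaps rather than being renumbered.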

KRaft Mode (Kafka 3.x+) — No More ZooKeeper

Before KRaft, ZooKeeper managed cluster metadata (controller election, broker registry). KRaft (Kafka Raft) replaces this with a built-in consensus mechanism: fewer moving parts, faster controller failover, simpler operations.

KRaft has been production-ready since Kafka 3.3, and Kafka 4.0 removes ZooKeeper mode entirely. New projects should use KRaft mode.
Old (ZooKeeper): Separate ZooKeeper cluster needed. Complex ops. Slow metadata propagation (~30s).
New (KRaft): Built-in Raft consensus. Single process. Sub-second metadata propagation. Simpler deployment.
Producers & Consumers — Java Code

Hands-on Java examples covering everything you need for interviews and production.

Producer — Complete Java Example

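import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OrderProducer {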
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        // Durability settings
        props.put("acks", "all");
        props.put("retries", 3);
        props.put("enable.idempotence", "true");

        try (KafkaProducer<String, String> producer =
                new KafkaProducer<>(props)) {

            for (int i = 0; i < 10; i++) {
                ProducerRecord<String, String> record =
                    new ProducerRecord<>(
                        "orders",            // topic
                        "order-" + i,       // key
                        "{\"id\":" + i + "}" // value
                    );
                producer.send(record);
            }
        }
    }
}
// Custom serializer for domain objects
public class JsonSerializer<T> implements Serializer<T> {
    private final ObjectMapper mapper = new ObjectMapper();

    @Override
    public byte[] serialize(String topic, T data) {
        try {
            return mapper.writeValueAsBytes(data);
        } catch (Exception e) {
            throw new SerializationException(e);
        }
    }
}

// Using it with an Order object
props.put("value.serializer", JsonSerializer.class);
ProducerRecord<String, Order> record =
    new ProducerRecord<>("orders", order.getId(), order);
// Async send with callback — know this for interviews!
producer.send(record, (metadata, exception) -> {
    if (exception != null) {
        log.error("Failed to send: {}", exception.getMessage());
        // handle retry or DLQ
    } else {
        log.info("Sent to partition={} offset={}",
            metadata.partition(),
            metadata.offset());
    }
});

// For sync (blocking) confirmation:
RecordMetadata meta = producer.send(record).get();
// .get() blocks until the broker acks; it throws ExecutionException
// if the send failed and InterruptedException if interrupted

Consumer — Complete Java Example

public class OrderConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "order-service-group");
        props.put("key.deserializer", "...StringDeserializer");
        props.put("value.deserializer", "...StringDeserializer");
        props.put("auto.offset.reset", "earliest"); // start from beginning
        props.put("enable.auto.commit", "false"); // manual commit!

        try (KafkaConsumer<String, String> consumer =
                new KafkaConsumer<>(props)) {

            consumer.subscribe(List.of("orders"));

            while (true) {
                ConsumerRecords<String, String> records =
                    consumer.poll(Duration.ofMillis(100));

                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf(
                        "P:%d Offset:%d Key:%s Value:%s%n",
                        record.partition(),
                        record.offset(),
                        record.key(),
                        record.value()
                    );
                    processOrder(record.value());
                }
                // Commit AFTER processing (at-least-once semantics)
                consumer.commitSync();
            }
        }
    }
}
Delivery semantics — interview must-know:
At-most-once: commit before processing (can lose messages)
At-least-once: commit after processing (can duplicate — most common)
Exactly-once: requires an idempotent, transactional producer plus consumers reading with isolation.level=read_committed (hardest)
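The difference between the first two semantics comes down to the order of "commit" and "process" when a crash lands between them. The simulation below is a self-contained sketch of that ordering only; it uses no Kafka classes, and ToyDeliverySemantics and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ToyDeliverySemantics {
    /**
     * Consumes `log` in order, crashing once at index `crashAt` between the two
     * steps (commit and process), then restarting from the committed offset.
     * Returns every record whose processing completed, duplicates included.
     */
    public static List<String> run(List<String> log, int crashAt, boolean commitFirst) {
        List<String> processed = new ArrayList<>();
        int committed = 0;       // offset to resume from after a restart
        int position = 0;        // offset being read right now
        boolean crashed = false;
        while (position < log.size()) {
            boolean crashNow = !crashed && position == crashAt;
            if (commitFirst) {
                committed = position + 1;                    // at-most-once: commit, then process
                if (!crashNow) processed.add(log.get(position));
            } else {
                processed.add(log.get(position));            // at-least-once: process, then commit
                if (!crashNow) committed = position + 1;
            }
            if (crashNow) {
                crashed = true;
                position = committed;                        // restart from last committed offset
            } else {
                position++;
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> log = List.of("a", "b", "c");
        System.out.println(run(log, 1, true));   // [a, c]: "b" was lost
        System.out.println(run(log, 1, false));  // [a, b, b, c]: "b" was processed twice
    }
}
```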

System Design Patterns with Kafka

E-commerce Order Processing & Payment/Banking — with sharding diagrams.

E-commerce Order Processing — Architecture

Orders flow through multiple services: inventory check, payment, fulfillment, notification. Kafka decouples all of them with event-driven choreography.

[Figure: E-commerce order processing with Kafka. The API Gateway publishes to the topic order-events (partitions P0 to P2, keyed by userId). Four consumer groups read it: Inventory Svc (inv-group), Payment Svc (pay-group), Fulfillment Svc (ful-group), and Notification (notif-group). Downstream topics payment-events and fulfillment-events feed Analytics / Data Warehouse.]
Key design decision: partition key = userId
All events for the same user go to the same partition. This guarantees per-user ordering within a topic: a consumer will never see "payment succeeded" before "order placed" for the same user, as long as both are published to the same topic with the same key.

Partitioning Strategy for E-commerce

[Figure: How 6 users shard across 3 partitions (key = userId) via hash(userId) % 3. Partition 0: user-1, user-4. Partition 1: user-2, user-5. Partition 2: user-3, user-6.]

Handling Hot Partitions (Skew)

If one user generates 80% of traffic (think a VIP or a bot), one partition becomes a hot partition — a bottleneck. Solutions:

Salting the key: Append a random suffix to the key. userId + "_" + random(0,4). Distributes load but breaks strict per-user ordering.
Dedicated partition: Use a custom partitioner that routes high-volume users to multiple dedicated partitions, others to a shared pool.
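Key salting can be sketched in a few lines. The example assumes that random(0,4) above means a suffix from 0 to 4 inclusive, so it uses 5 buckets; SaltedKeys and its parameter names are hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;

public class SaltedKeys {
    /**
     * Appends a random bucket number in [0, buckets) to the user ID, so one hot
     * user's records spread over up to `buckets` different keys (and partitions).
     */
    public static String salt(String userId, int buckets) {
        return userId + "_" + ThreadLocalRandom.current().nextInt(buckets);
    }

    public static void main(String[] args) {
        // Each call may produce a different suffix between 0 and 4
        for (int i = 0; i < 3; i++) {
            System.out.println(salt("user-42", 5));
        }
    }
}
```

Because records for one user now carry different keys, they can land on different partitions and be consumed out of order. That is the trade-off named above, so salting suits workloads where per-user ordering is not required.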

Payment & Banking — Critical Requirements

• No duplicate payments: exactly-once required
• Ordered per account: balance can't go negative
• Audit trail: every event forever
[Figure: Payment/banking architecture. Mobile App and Web App call the Payment API (idempotency key check), which publishes to the topic payment-initiated (key = accountId, acks=all). Consumers: Fraud Detect (real-time ML), Balance Check (ordered per account), Payment Proc. (exactly-once), Audit Log (compacted). Results go to the topics payment-completed / payment-failed.]

Partitioning Strategy for Banking

In banking, ordering per account is critical. A debit must be applied before the next balance check runs, or the balance could go negative.

Partition key = accountId
All transactions for account ACC-001 go to the same partition → processed in strict order → balance is always consistent.
[Figure: Accounts shard by accountId, which guarantees per-account ordering. Daily volume: ACC-001 $50k, ACC-002 $2k, ACC-003 $100k, ACC-004 $8k, ACC-005 $30k, ACC-006 $5k, ACC-007 $200k. P0: ACC-001, ACC-004, ACC-007 (ACC-007 is hot). P1: ACC-002, ACC-005. P2: ACC-003, ACC-006.]
Hot partition problem: ACC-007 does $200k/day, the highest volume shown (2x ACC-003 and 100x ACC-002), and it shares P0 with two other accounts. Solution: identify high-volume accounts at onboarding time and route them to dedicated partitions with a custom partitioner.

Exactly-Once for Payments

// Producer with exactly-once semantics (transactions)
props.put("transactional.id", "payment-service-1");
props.put("enable.idempotence", "true");
props.put("acks", "all");

producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(debitRecord);   // debit account A
    producer.send(creditRecord);  // credit account B
    producer.commitTransaction(); // atomic!
} catch (ProducerFencedException | OutOfOrderSequenceException
        | AuthorizationException e) {
    producer.close();            // fatal: this producer cannot continue
} catch (KafkaException e) {
    producer.abortTransaction(); // roll back both, then retry
}

🍃 Spring Boot Microservice — E2E Setup

Full Spring Boot integration. Covers config, producer, consumer, and Docker compose.

# application.yml — Kafka Spring Boot config
spring:
  kafka:
    bootstrap-servers: localhost:9092
    producer:
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.springframework.kafka.support.serializer.JsonSerializer
      acks: all
      retries: 3
      properties:
        enable.idempotence: true
        max.in.flight.requests.per.connection: 5
    consumer:
      group-id: order-service-group
      auto-offset-reset: earliest
      enable-auto-commit: false  # manual commits!
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: org.springframework.kafka.support.serializer.JsonDeserializer
      properties:
        spring.json.trusted.packages: com.example.model
    listener:
      ack-mode: MANUAL_IMMEDIATE  # commit after processing

# Topic config (auto-create in dev)
app:
  kafka:
    topic:
      orders: orders
      payments: payment-events
      partitions: 3
      replication-factor: 1  # 3 in prod
@Service
@Slf4j
@RequiredArgsConstructor
public class OrderProducerService {

    private final KafkaTemplate<String, OrderEvent> kafkaTemplate;

    @Value("${app.kafka.topic.orders}")
    private String ordersTopic;

    public CompletableFuture<SendResult<String, OrderEvent>>
    publishOrderCreated(Order order) {

        OrderEvent event = OrderEvent.builder()
            .orderId(order.getId())
            .userId(order.getUserId())
            .amount(order.getAmount())
            .status("CREATED")
            .timestamp(Instant.now())
            .build();

        // Key = userId ensures ordering per user
        return kafkaTemplate
            .send(ordersTopic, order.getUserId(), event)
            .whenComplete((result, ex) -> {
                if (ex != null) {
                    log.error("Failed to publish order {}: {}",
                        order.getId(), ex.getMessage());
                } else {
                    log.info("Order {} sent to P:{} O:{}",
                        order.getId(),
                        result.getRecordMetadata().partition(),
                        result.getRecordMetadata().offset());
                }
            });
    }
}
@Component
@Slf4j
@RequiredArgsConstructor
public class OrderEventConsumer {

    private final PaymentService paymentService;
    private final MeterRegistry meterRegistry;

    @KafkaListener(
        topics = "${app.kafka.topic.orders}",
        groupId = "${spring.kafka.consumer.group-id}",
        concurrency = "3"  // 3 threads = 3 partitions
    )
    public void handleOrderEvent(
        @Payload OrderEvent event,
        @Header(KafkaHeaders.RECEIVED_PARTITION) int partition,
        @Header(KafkaHeaders.OFFSET) long offset,
        Acknowledgment ack
    ) {
        log.info("Processing order {} from P:{} O:{}",
            event.getOrderId(), partition, offset);
        try {
            paymentService.initiatePayment(event);
            ack.acknowledge(); // commit only on success
            meterRegistry.counter("kafka.orders.processed").increment();
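        } catch (Exception e) {
            log.error("Failed processing order {}",
                event.getOrderId(), e);
            // Don't ack. Rethrow so the container's error handler
            // retries the record, or routes it to a DLQ after N retries
            throw new IllegalStateException("Order processing failed", e);
        }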
    }
}
# docker-compose.yml — Local Kafka dev setup
version: '3.8'
services:
  kafka:
    image: confluentinc/cp-kafka:7.5.0
    ports:
      - "9092:9092"
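    environment:
      # KRaft mode (no ZooKeeper!)
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: 'CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT'
      # PLAINTEXT serves other containers; PLAINTEXT_HOST serves the host machine
      KAFKA_LISTENERS: PLAINTEXT://kafka:29092,CONTROLLER://kafka:9093,PLAINTEXT_HOST://0.0.0.0:9092
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      # Single-broker settings so transactions work locally
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: "true"
      CLUSTER_ID: MkU3OEVBNTcwNTJENDM2Qk

  kafka-ui:
    image: provectuslabs/kafka-ui:latest
    ports:
      - "8080:8080"
    environment:
      KAFKA_CLUSTERS_0_NAME: local
      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: kafka:29092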
    depends_on:
      - kafka
Tip: Run with docker compose up -d then open http://localhost:8080 for Kafka UI. You can visually inspect topics, partitions, and messages.
Interview Q&A

The questions that actually get asked, each with a model answer.

25 questions covered
5 difficulty levels
Kafka is a distributed event streaming platform — a durable, ordered, append-only log of events. It's used for: (1) decoupling microservices via async event-driven communication, (2) high-throughput data pipelines processing millions of messages/sec, (3) event sourcing where every state change is an event, (4) real-time analytics. Unlike a traditional message queue (RabbitMQ), Kafka retains messages even after consumption — multiple consumers can read the same events independently.
A Topic is an ordered, immutable log of events. Key differences from a queue: (1) Retention — Topics retain messages by time/size, queues delete after consumption. (2) Multiple consumers — many consumer groups read the same topic independently; a queue delivers each message to only one consumer. (3) Ordering — Topics are ordered per-partition; queues generally don't guarantee order. (4) Replay — Topics allow consuming from any past offset; queues don't.
Partitions are the unit of parallelism in Kafka. A topic is split into N partitions, each being an ordered, immutable sequence of messages stored on one broker. They exist for: (1) Scalability — each partition can be on a different broker, distributing load horizontally. (2) Parallelism — each partition can be consumed by exactly one consumer per group simultaneously. (3) Throughput — single-partition topics are limited to one broker's I/O capacity. The partition count determines max consumer parallelism.
An Offset is a monotonically increasing integer that uniquely identifies a message within a partition (e.g. offset 0, 1, 2...). Key points: (1) Offsets are per-partition, not global: offset 5 in partition 0 is different from offset 5 in partition 1. (2) Consumers track their position using offsets. (3) Offsets are stored in the __consumer_offsets internal Kafka topic. (4) auto.offset.reset applies only when the group has no committed offset: earliest starts from the oldest retained message; latest starts from new messages only.
A Consumer Group is a logical grouping of consumers that collaboratively consume a topic. Rules: (1) Each partition is assigned to exactly one consumer per group. (2) Multiple groups can consume the same topic independently — each group maintains its own offset. (3) If consumers > partitions, extra consumers are idle. (4) If consumers < partitions, some consumers handle multiple partitions. This is the mechanism for both load balancing AND broadcast (multiple groups = each gets all messages).
A Broker is a single Kafka server instance. It stores partitions, handles read/write requests, replicates data to other brokers, and manages consumer group membership. A cluster is multiple brokers. In ZooKeeper mode, one broker acts as the Controller and handles partition leader elections; in KRaft mode, a quorum of controller nodes does this, with one active at a time. In production, use a minimum of 3 brokers for fault tolerance. Brokers are identified by a unique broker.id (node.id in KRaft mode).
Key differences: (1) Retention — Kafka retains messages persistently (days/forever); RabbitMQ deletes after consumption. (2) Throughput — Kafka handles millions/sec; RabbitMQ handles thousands/sec. (3) Consumers — Kafka: pull-based, many groups read same topic; RabbitMQ: push-based, competing consumers. (4) Ordering — Kafka: guaranteed per partition; RabbitMQ: best-effort. (5) Use case — Kafka for event streaming, pipelines, high throughput; RabbitMQ for task queues, routing, RPC-style patterns.
acks=0: Producer fires and forgets — no confirmation. Lowest latency, highest throughput, but messages can be lost. Use for: non-critical metrics, click tracking.
acks=1: Leader broker confirms receipt, but doesn't wait for followers. If the leader crashes before replication, the message is lost. This was the default before Kafka 3.0; a balanced choice.
acks=all (or -1): Leader waits for all In-Sync Replicas to acknowledge. Highest durability: no data loss as long as at least one in-sync replica survives. The default since Kafka 3.0. Use for: orders, payments, critical events. Combine with min.insync.replicas=2 to prevent accepting writes with only 1 replica.
ISR is the set of replicas that are fully caught up with the leader (within replica.lag.time.max.ms). It matters because: (1) acks=all only waits for ISR members — if a slow follower falls out of ISR, it doesn't block producers. (2) min.insync.replicas defines the minimum ISR size — if ISR falls below this, Kafka refuses writes (NOT_ENOUGH_REPLICAS error). (3) During leader failure, only ISR members can become the new leader (unless unclean.leader.election.enable=true, which risks data loss). A replica is removed from ISR if it hasn't fetched from the leader within the lag timeout.
At-most-once: Commit offset before processing. If processing fails, message is lost. Fastest, use for non-critical data.
At-least-once: Commit offset after successful processing. If processing fails, message is reprocessed → possible duplicates. Most common. Consumer must be idempotent.
Exactly-once (EOS): Uses producer transactions (enable.idempotence=true + transactional.id) + consumer with isolation.level=read_committed. Within Kafka (consume, process, produce), this guarantees no duplicates AND no data loss; side effects in external systems still need idempotent handling. Most complex, slightly lower throughput. Typically expected for financial transactions.
Rebalancing is the process of reassigning partitions to consumers in a group. Triggered by: (1) A consumer joins the group. (2) A consumer leaves or crashes (session timeout). (3) A consumer fails to call poll() within max.poll.interval.ms. (4) Partitions are added to the topic. During rebalancing, all consumption stops (stop-the-world). Modern Kafka uses Cooperative Incremental Rebalancing (Kafka 2.4+) to reduce this impact — only partitions that need to move are reassigned. Use session.timeout.ms and heartbeat.interval.ms to tune liveness detection.
Log compaction keeps the latest value per key, removing older records for the same key. It runs in the background as a log cleaner thread. A special tombstone record (null value) deletes a key entirely. Use cases: database change data capture (CDC), user profile state, configuration topics. Enable with cleanup.policy=compact. Note: compacted topics still preserve ordering within surviving offsets — they're not sorted by key. A message with offset 10 still comes after offset 5, even if they have the same key.
A DLQ (or Dead Letter Topic in Kafka) is a separate topic where failed messages are routed after N retry attempts. Pattern: (1) Consumer tries to process a message. (2) On failure, retry N times with backoff. (3) After N failures, send message to topic-name.DLT. (4) A separate consumer monitors the DLQ for alerting/manual intervention. Spring Kafka's @RetryableTopic and DeadLetterPublishingRecoverer implement this automatically. Always include original partition, offset, and error details as headers in the DLQ message for debugging.
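The retry-then-park pattern above can be sketched without any Kafka or Spring classes. In this illustration a plain list stands in for the topic-name.DLT topic and the backoff is omitted; ToyDeadLetterRouter is a hypothetical name, not a Spring Kafka type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ToyDeadLetterRouter {
    /** Stand-in for the dead-letter topic. */
    public final List<String> deadLetters = new ArrayList<>();
    private final int maxAttempts;

    public ToyDeadLetterRouter(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    /** Tries the handler up to maxAttempts times; returns true on success, else parks the message. */
    public boolean handle(String message, Consumer<String> handler) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(message);
                return true;
            } catch (RuntimeException e) {
                // a real consumer would back off here before the next attempt
            }
        }
        deadLetters.add(message); // retries exhausted: route to the dead-letter store
        return false;
    }

    public static void main(String[] args) {
        ToyDeadLetterRouter router = new ToyDeadLetterRouter(3);
        router.handle("good", m -> System.out.println("processed " + m));
        router.handle("bad", m -> { throw new IllegalStateException("cannot parse"); });
        System.out.println(router.deadLetters); // [bad]
    }
}
```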
A regular Kafka Consumer reads messages and processes them individually. Kafka Streams is a stateful stream processing library built on top of Kafka. Key differences: (1) State — Streams maintains local state stores (backed by Kafka topics) for aggregations and joins; regular consumers must manage state externally. (2) Operations — Streams provides filter, map, groupBy, aggregate, join, windowing DSL. (3) Topology — Streams defines a processing graph; consumers just loop. Use Streams for: real-time aggregations, session windows, table-stream joins. Use regular consumer for: simple per-message processing, when you don't need state.
Kafka guarantees ordering within a partition only — not across partitions. To ensure ordering: (1) Use a consistent partition key (all messages for the same entity go to the same partition). (2) Set max.in.flight.requests.per.connection=1 for strict ordering without idempotence, or use enable.idempotence=true with up to 5 in-flight requests. (3) The consumer should have only one active consumer per partition (enforced by consumer groups). Across partitions, there is no ordering guarantee. If you need total ordering, use a single-partition topic (but this kills scalability).
Leader failure sequence: (1) The broker hosting the partition leader stops responding. (2) The controller detects that the broker is down (via ZooKeeper session expiry, or missed broker heartbeats in KRaft mode). (3) The controller selects a new leader from the ISR; every ISR member already holds all committed messages. (4) The controller updates partition metadata. (5) Producers and consumers learn of the new leader via metadata refresh. The failover typically completes in seconds. If ISR is empty and unclean.leader.election.enable=false (default), the partition becomes unavailable until an ISR member returns. This is the safe choice for critical data.
ZooKeeper was used for cluster metadata management: broker registration, partition leadership, topic configuration. Problems: (1) Separate cluster to operate and monitor. (2) Slow metadata propagation, up to 30 seconds on large clusters. (3) ZooKeeper became a bottleneck for scaling. KRaft (Kafka Raft) replaces ZooKeeper with a built-in Raft consensus algorithm. Benefits: (1) No separate ZooKeeper cluster, so deployment is simpler. (2) Sub-second metadata propagation. (3) Supports millions of partitions (vs ~200k with ZooKeeper). (4) Faster controller failover. KRaft is GA since Kafka 3.3 and is the only mode in Kafka 4.0, which removes ZooKeeper support.
Consumer lag = log end offset - consumer committed offset, measured per partition. Causes: consumer too slow, partition count too low, message processing bottleneck. Solutions: (1) Increase parallelism: add more partitions + more consumer instances (up to partition count). (2) Optimize processing: async downstream calls, batching, caching. (3) Reduce message size: compression, smaller payloads. (4) Increase consumer throughput: tune max.poll.records, fetch.min.bytes. (5) Monitor with kafka-consumer-groups.sh --describe or Prometheus + Grafana. Alert when lag exceeds threshold (e.g., 1 million messages or 5 minutes behind).
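The lag formula is a per-partition subtraction summed over the topic. The sketch below works on plain maps of partition number to offset; in practice those numbers would come from tooling such as kafka-consumer-groups.sh. ToyConsumerLag is a hypothetical name.

```java
import java.util.Map;

public class ToyConsumerLag {
    /**
     * Sums (log end offset - committed offset) over all partitions.
     * A partition with no committed offset is counted from offset 0.
     */
    public static long totalLag(Map<Integer, Long> logEndOffsets,
                                Map<Integer, Long> committedOffsets) {
        long total = 0;
        for (Map.Entry<Integer, Long> e : logEndOffsets.entrySet()) {
            long committed = committedOffsets.getOrDefault(e.getKey(), 0L);
            total += Math.max(0L, e.getValue() - committed);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<Integer, Long> end = Map.of(0, 100L, 1, 50L, 2, 75L);
        Map<Integer, Long> committed = Map.of(0, 90L, 1, 50L);
        // partition 0 lags by 10, partition 1 by 0, partition 2 by 75
        System.out.println(totalLag(end, committed)); // 85
    }
}
```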
Kafka stores messages in log segments — binary files on disk per partition. Each partition directory contains: (1) .log files — actual message data, append-only. (2) .index files — sparse index mapping offset → file position for O(log n) seeks. (3) .timeindex files — timestamp → offset mapping for time-based seeks. Kafka uses sequential I/O (append-only writes, sequential reads) — this is why it's fast even on spinning disks. Messages are also held in the OS page cache, so reads of recent messages hit memory. Log segments are rolled when they reach log.segment.bytes (default 1GB) or log.roll.ms.
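The sparse index lookup can be illustrated with a sorted map: find the last indexed offset at or below the target, then scan forward from that file position. This is a sketch of the idea only; the TreeMap is an assumption for illustration, not the on-disk format of Kafka's .index files, and ToySparseIndex is a hypothetical name.

```java
import java.util.Map;
import java.util.TreeMap;

public class ToySparseIndex {
    // offset -> byte position in the .log file; only some offsets are indexed
    private final TreeMap<Long, Long> index = new TreeMap<>();

    public void addEntry(long offset, long filePosition) {
        index.put(offset, filePosition);
    }

    /**
     * Returns the file position to start scanning from: the entry with the
     * largest indexed offset that is <= targetOffset, or 0 if none exists.
     */
    public long scanStart(long targetOffset) {
        Map.Entry<Long, Long> floor = index.floorEntry(targetOffset);
        return floor == null ? 0L : floor.getValue();
    }

    public static void main(String[] args) {
        ToySparseIndex idx = new ToySparseIndex();
        idx.addEntry(0, 0);
        idx.addEntry(100, 4096);
        idx.addEntry(200, 8192);
        System.out.println(idx.scanStart(150)); // 4096: scan forward from offset 100
        System.out.println(idx.scanStart(200)); // 8192: exact hit
    }
}
```

Keeping the index sparse trades a short sequential scan for a much smaller index, which fits the sequential-I/O design described above.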
Kafka's throughput comes from several design decisions: (1) Sequential disk I/O: append-only writes are far faster than random writes, especially on spinning disks. (2) OS page cache: recent data is served from RAM, not disk. (3) Zero-copy: the sendfile() syscall transfers data from page cache to network socket without copying to user space. (4) Batching: producers accumulate messages into batches (linger.ms, batch.size) before sending, amortizing network overhead. (5) Compression: batch-level compression (snappy, lz4, zstd) reduces network and storage I/O. (6) Partition parallelism: multiple partitions across multiple disks and brokers.
Topics: order-created, inventory-reserved, payment-processed, order-fulfilled, notifications.
Partitioning: Key = userId for ordering guarantees. 12 partitions for ~12 consumer instances.
Services: Each microservice subscribes to relevant topics with its own consumer group — independent scaling. Inventory service subscribes to order-created; payment subscribes to inventory-reserved (after stock confirmed); fulfillment to payment-processed.
Fault tolerance: acks=all, idempotent consumers, DLQ for failures, saga pattern for distributed transactions.
Monitoring: Consumer lag dashboard, per-topic throughput, alert on lag > 10k msgs.
Horizontal scaling: Add brokers (start with 9-12 for this load) and spread partitions evenly. Treat 10+ partitions as a floor, and size the real count from measured per-partition throughput, since ~1M msg/s on a single partition is only realistic for very small messages.
Producer tuning: linger.ms=5-20ms, batch.size=256KB-1MB, compression.type=lz4 (best throughput/CPU ratio), acks=1 if you can tolerate minor data loss risk.
Consumer tuning: max.poll.records=500, async processing, consumer parallelism = partition count.
Hardware: SSDs for log storage, separate network interfaces for replication vs client traffic, 10Gb+ network, dedicated disks per partition directory.
Multi-datacenter: MirrorMaker 2 for cross-DC replication if geo-redundancy needed.
Duplicates arise in at-least-once delivery (consumer reprocesses on crash). Solutions: (1) Idempotent consumer: deduplicate based on message ID stored in Redis/DB. Check "have I processed event-id X?" before processing. (2) Idempotent operations: design your business logic to be naturally idempotent (UPDATE SET x=v WHERE x!=v is idempotent; INSERT is not). (3) Idempotent producer: enable.idempotence=true prevents producer-side duplicates caused by retries. (4) Transactional consumers: consume + produce + commit offset atomically. The best strategy depends on the cost of duplicates vs implementation complexity.
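The "have I processed event-id X?" check from option (1) can be sketched as follows. An in-memory set stands in for Redis or a database table, which is an assumption for illustration; ToyIdempotentConsumer and its methods are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

public class ToyIdempotentConsumer {
    private final Set<String> seenEventIds = new HashSet<>(); // stand-in for Redis/DB
    private long balance = 0;

    /** Applies the credit only the first time an event ID is seen; returns whether it was applied. */
    public boolean applyCredit(String eventId, long amount) {
        if (!seenEventIds.add(eventId)) {
            return false; // duplicate delivery: skip the side effect
        }
        balance += amount;
        return true;
    }

    public long balance() {
        return balance;
    }

    public static void main(String[] args) {
        ToyIdempotentConsumer consumer = new ToyIdempotentConsumer();
        consumer.applyCredit("evt-1", 100);
        consumer.applyCredit("evt-1", 100); // redelivered after a crash: ignored
        consumer.applyCredit("evt-2", 50);
        System.out.println(consumer.balance()); // 150
    }
}
```

With a real store, the dedup record and the business update should be written in the same transaction; otherwise a crash between the two brings the duplicate problem back.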
Saga manages distributed transactions without two-phase commit. With Kafka, use choreography-based saga: (1) Order service publishes order-created. (2) Inventory service consumes, reserves stock, publishes inventory-reserved or inventory-failed. (3) Payment service consumes inventory-reserved, processes payment, publishes payment-processed or payment-failed. (4) On any failure event, compensating transactions are published upstream. Challenges: (1) No global atomicity — compensating actions can also fail. (2) Hard to debug — need distributed tracing (correlation-id header). (3) Circular dependencies. For very complex flows, consider orchestration-based saga with Conductor or Temporal.
Broker metrics: BytesInPerSec/BytesOutPerSec (throughput), RequestHandlerAvgIdlePercent (>70% = healthy), UnderReplicatedPartitions (should be 0), ActiveControllerCount (should be 1), OfflinePartitionsCount (should be 0).
Consumer metrics: records-lag-max (consumer lag per partition), fetch-rate, commit-rate.
Producer metrics: record-send-rate, record-error-rate, request-latency-avg.
JVM: GC pause time, heap usage.
Disk: log directory usage, disk I/O utilization.
Tools: JMX + Prometheus + Grafana, Confluent Control Center, Kafka UI. Alert on UnderReplicatedPartitions > 0, consumer lag spike, broker down.