When your business depends on processing millions of events per second, building a reliable and performant Kafka consumer-producer pipeline becomes critical. In this article, I'll walk you through the architecture, implementation, and lessons we learned scaling Kafka with Go to consistently handle 1 million messages per second — and how we got there without burning out our team or machines.

This isn't a theoretical guide. This is what we actually built.

Problem Statement

We needed to ingest and process data at >1M messages/sec:

  • Events coming from IoT devices and user interactions
  • High fan-out to multiple downstream services
  • Durable and observable architecture
  • Real-time processing with minimal latency

High-Level Architecture

              +---------------+        +-----------------+
              | Kafka Topic A | ---->  | Go Consumer App |
              +---------------+        +-----------------+
                                                 |
                                                 v
                                        +------------------+
                                        | Worker Pool       |
                                        | (Goroutines)      |
                                        +------------------+
                                                 |
                                                 v
                                        +------------------+
                                        | Output Layer      |
                                        | (Kafka, DB, etc.) |
                                        +------------------+

Key Goals

  • Throughput: Sustain 1M msg/sec with headroom.
  • Backpressure handling: No message drops under load.
  • Crash resilience: Auto-recovery with offset commits.
  • Observability: Track lag, latency, and throughput.

Our Kafka Stack

We used:

  • Apache Kafka 3.x
  • Confluent Go client (github.com/confluentinc/confluent-kafka-go)
  • 6 partitions minimum per topic (more on partition counts below)
  • Kafka Connect for data export
  • Go 1.21+

Code Setup: High Throughput Consumer

import "github.com/confluentinc/confluent-kafka-go/kafka"

func createConsumer(brokers, groupID, topic string) *kafka.Consumer {
    c, err := kafka.NewConsumer(&kafka.ConfigMap{
        "bootstrap.servers":  brokers,
        "group.id":           groupID,
        "auto.offset.reset":  "earliest",
        "enable.auto.commit": false,
        "go.events.channel.enable": true,
        "queued.max.messages.kbytes": 20480, // 20 MB
    })
    if err != nil {
        panic(err)
    }
    err = c.SubscribeTopics([]string{topic}, nil)
    if err != nil {
        panic(err)
    }
    return c
}

Worker Pool for Message Handling

func startWorkerPool(workerCount int, messages <-chan *kafka.Message, wg *sync.WaitGroup) {
    for i := 0; i < workerCount; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            for msg := range messages {
                process(msg) // handle message logic
            }
        }(i)
    }
}
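
In the worker above, process(msg) is just a placeholder. A minimal sketch of what it might look like, assuming JSON payloads and the encoding/json and log packages; the Event type and its fields are illustrative, not our exact schema:

// Event is a hypothetical payload shape; adapt it to your own schema.
type Event struct {
    DeviceID string `json:"device_id"`
    Payload  string `json:"payload"`
}

func process(msg *kafka.Message) {
    var ev Event
    if err := json.Unmarshal(msg.Value, &ev); err != nil {
        // Bad messages are logged and skipped so one poison pill
        // doesn't stall the whole partition.
        log.Printf("decode error at offset %v: %v", msg.TopicPartition.Offset, err)
        return
    }
    // ... business logic, then hand the result to the output layer ...
}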

Dispatcher Pattern

We separate consumption from processing:

func dispatchMessages(c *kafka.Consumer, out chan<- *kafka.Message) {
    for ev := range c.Events() {
        switch e := ev.(type) {
        case *kafka.Message:
            out <- e
        case kafka.Error:
            log.Printf("Kafka error: %v\n", e)
        }
    }
}
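
Wired together, the pipeline is a short main function. The sketch below is illustrative: the broker address, group ID, topic, worker count, channel capacity, and the signal-based shutdown are placeholder choices, and it assumes the os/signal, sync, and syscall packages are imported:

func main() {
    c := createConsumer("localhost:9092", "ingest-group", "events")

    // Bounded channel between dispatcher and workers: when workers fall
    // behind, sends block here instead of buffering without limit.
    messages := make(chan *kafka.Message, 10000)

    var wg sync.WaitGroup
    startWorkerPool(64, messages, &wg)

    // Closing the consumer ends its Events() channel, which lets
    // dispatchMessages return so we can drain and exit cleanly.
    sig := make(chan os.Signal, 1)
    signal.Notify(sig, syscall.SIGINT, syscall.SIGTERM)
    go func() {
        <-sig
        c.Close()
    }()

    dispatchMessages(c, messages) // blocks until the consumer is closed
    close(messages)               // let workers finish what's buffered
    wg.Wait()
}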

Batching for Output Efficiency

When writing to downstream services (like another Kafka topic or DB), we batch messages:

func batchWriter(input <-chan *kafka.Message) {
    batch := make([]*kafka.Message, 0, 1000)
    ticker := time.NewTicker(10 * time.Millisecond)
    defer ticker.Stop()
    for {
        select {
        case msg, ok := <-input:
            if !ok { // input closed: flush what's left and exit
                if len(batch) > 0 {
                    flush(batch)
                }
                return
            }
            batch = append(batch, msg)
            if len(batch) >= 1000 {
                flush(batch) // flush must complete (or copy) before the slice is reused
                batch = batch[:0]
            }
        case <-ticker.C:
            if len(batch) > 0 {
                flush(batch)
                batch = batch[:0]
            }
        }
    }
}
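
flush is deliberately left abstract. If the downstream happens to be another Kafka topic, one possible implementation uses the confluent-kafka-go producer; the producer variable, topic name, and flush timeout below are assumptions for illustration:

var producer *kafka.Producer // created once at startup with kafka.NewProducer(...)

func flush(batch []*kafka.Message) {
    outTopic := "events-out" // hypothetical downstream topic
    for _, msg := range batch {
        // Produce is asynchronous; librdkafka batches these internally.
        err := producer.Produce(&kafka.Message{
            TopicPartition: kafka.TopicPartition{Topic: &outTopic, Partition: kafka.PartitionAny},
            Key:            msg.Key,
            Value:          msg.Value,
        }, nil)
        if err != nil {
            log.Printf("produce error: %v", err)
        }
    }
    // Wait (up to 1s) for in-flight deliveries before the caller reuses the batch slice.
    producer.Flush(1000)
}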

Performance Benchmarks

Hardware:

  • 8 vCPUs
  • 32 GB RAM
  • 1Gbps NIC
  • Kafka and Go app on separate VMs

Settings:

  • 12 partitions
  • 6 consumers
  • GOMAXPROCS = runtime.NumCPU() (the Go default since 1.5)

| Setup                  | Messages/sec | Avg Latency | CPU Usage |
| ---------------------- | ------------ | ----------- | --------- |
| Single Consumer        | ~150K        | ~12ms       | ~30%      |
| 6 Consumer Goroutines  | ~950K        | ~6ms        | ~85%      |
| Worker Pool + Batching | **1.2M**     | **<3ms**    | ~90%      |

Observability: Monitoring Lag and Throughput

We used:

  • Prometheus + Grafana
  • consumer_lag via Kafka exporter
  • runtime.NumGoroutine()
  • Custom histograms for per-stage latency

messagesProcessed := prometheus.NewCounterVec(...)
latencyHistogram := prometheus.NewHistogramVec(...)
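
A fuller version of those definitions might look like the sketch below; metric names, label sets, and buckets are illustrative, and it assumes github.com/prometheus/client_golang/prometheus:

var (
    messagesProcessed = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "pipeline_messages_processed_total",
            Help: "Messages processed, by worker and outcome.",
        },
        []string{"worker", "outcome"},
    )
    latencyHistogram = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "pipeline_stage_latency_seconds",
            Help:    "Per-stage processing latency in seconds.",
            Buckets: prometheus.ExponentialBuckets(0.0005, 2, 12), // 0.5ms up to ~1s
        },
        []string{"stage"},
    )
)

func init() {
    prometheus.MustRegister(messagesProcessed, latencyHistogram)
}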

Lessons Learned

1. Partitions Matter

Kafka partitions are the unit of parallelism. Start with 6–12, scale as needed. Each partition maps to one goroutine in our setup.

2. Don't Auto-Commit Blindly

Manual commit ensures we only ack messages we've fully processed.

consumer.CommitMessage(msg) // after success
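
One way to wire this into the worker path, with handle standing in for the business logic (a sketch, not our exact code; committing every message is the simplest correct option, and batched commits are a common follow-up optimization):

func processAndCommit(c *kafka.Consumer, msg *kafka.Message) {
    if err := handle(msg); err != nil {
        // Do not commit: the message will be redelivered after a restart or rebalance.
        log.Printf("processing failed at offset %v: %v", msg.TopicPartition.Offset, err)
        return
    }
    if _, err := c.CommitMessage(msg); err != nil {
        log.Printf("commit failed: %v", err)
    }
}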

3. Backpressure Control

Bounded (buffered) channels between the dispatcher and the workers, plus output batching, helped prevent goroutine explosion: when workers fall behind, sends on the channel block instead of piling up unbounded work in memory.

4. Memory Pressure

High-throughput means buffering; avoid large allocations per message. Reuse buffers if possible.
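
One common way to do that is a sync.Pool of scratch buffers used during decode or re-encode, instead of allocating a fresh slice per message. A sketch, with an arbitrary 4 KB starting capacity:

var scratchPool = sync.Pool{
    New: func() any {
        b := make([]byte, 0, 4096)
        return &b
    },
}

func withScratch(msg *kafka.Message) {
    bp := scratchPool.Get().(*[]byte)
    buf := (*bp)[:0]

    // Use buf as temporary working space for this message.
    buf = append(buf, msg.Value...)
    // ... transform buf ...

    *bp = buf // keep any growth for the next borrower
    scratchPool.Put(bp)
}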

5. Avoid Global Locks

Lock contention in shared state (e.g., metrics) can throttle throughput.
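
One way to avoid it: give each worker its own counters via sync/atomic and sum them periodically in a reporter goroutine. A sketch (the slice size would match the worker count; assumes sync/atomic):

type workerStats struct {
    processed atomic.Uint64
    failed    atomic.Uint64
}

// One entry per worker: the hot path only touches its own counters,
// so there is no shared lock to contend on.
var stats = make([]workerStats, 64)

func recordSuccess(workerID int) {
    stats[workerID].processed.Add(1)
}

func totalProcessed() uint64 {
    var sum uint64
    for i := range stats {
        sum += stats[i].processed.Load()
    }
    return sum
}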

Final Architecture Diagram

                    Kafka Topic (12 Partitions)
                                 |
     +---------------------------+---------------------------+
     |                           |                           |
+-----------+            +-----------+              +-----------+
| Consumer1 |            | Consumer2 |              | Consumer6 |
+-----------+            +-----------+              +-----------+
     |                         |                          |
     v                         v                          v
+------------+         +------------+            +-------------+
| WorkerPool |         | WorkerPool |            | WorkerPool  |
+------------+         +------------+            +-------------+
     |                         |                          |
     v                         v                          v
  +--------+              +---------+              +-------------+
  | Output |              | Metrics |              | DB / Kafka  |
  +--------+              +---------+              +-------------+

Summary

We didn't throw servers at the problem. We designed for scale with Go's strengths:

  • Goroutines made concurrency cheap.
  • Channel-based pipelines gave structure.
  • Backpressure and batching gave control.

If you're trying to hit 1M messages/sec, Go and Kafka are more than capable — if you design carefully.