Skip to main content

We're hiring!

Check out our open roles.

Danubio
Engineering

Do you actually need Kafka?

More than 80% of the Fortune 100 run Kafka, so teams reach for it by default. We run it at 20,000 events a second, and we also turn it down. How we decide.

Sava Markovic

Founder, Danubio

12 min readMay 22, 2026
The Vitals performance dashboard showing the Total Vital Score and KPIs.
Vitals, the real-time performance dashboard built on Kafka.

The default reach

Kafka is the first thing most teams reach for when they see events. More than 80% of the Fortune 100 run it, and plenty of teams add it before they have a reason to. We run it where it earns its cost, and we steer teams to something simpler when that is the better call.

Streaming infrastructure is worth its real operational cost when the stream is the system. For many teams reaching for it, the queue only supports the system, and a simpler tool fits better. The skill is telling the two apart.

We have built both kinds. This is where Kafka earns its keep, where it is overkill, and how we decide.

Key Takeaways

  • Kafka is the reflexive default: 80%+ of the Fortune 100 run it (Apache Kafka).
  • It earns its keep when the stream is the system. On Vitals, a change-data-capture pipeline into Kafka computes a live score at 20,000 events a second, under three seconds end to end, without touching the source CRM.
  • It is the wrong call when the queue only supports the system; a simpler queue like Postgres or SQS is less to run and easier to debug.
  • AI makes a Kafka setup trivial to generate, so the judgment about whether you need it matters more.

Why does everyone reach for Kafka?

Reputation. Kafka runs at more than 80% of the Fortune 100 (Apache Kafka), and that ubiquity makes it the safe-sounding default. Reach for Kafka and nobody questions the choice. Reach for something smaller and you have to defend it.

The other reasons are familiar: we might need scale someday, it is what the big companies use, and it looks good on the team. The reflex is understandable. The part that gets skipped is the operating cost: a cluster to run, brokers to keep healthy, consumer groups and offsets to reason about, and a new failure surface that has nothing to do with your product.

That operating cost is worth paying when Kafka is the right tool. The point is to be sure it is, before you take it on.

When does Kafka earn its keep?

When the stream is the system. The clearest example in our work is Vitals, a daily performance dashboard Inside Real Estate built for every brokerage on BoldTrail, around 400,000 users. The score had to be near-real-time and consistent across every level of the org, and we could not modify the CRM it sat on top of.

So we stood outside the CRM and listened to it. A Debezium change-data-capture layer reads BoldTrail's transaction log and streams every change into Kafka. Vitals consumes that stream and rebuilds everything it needs, without ever querying the CRM. Decoupling from a source you cannot touch is exactly what streaming is for.

Each of the eight KPIs runs as its own stream of derived events. A microservice joins the events that move a metric and emits a running value, so the number is ready before the dashboard opens. No nightly batch. If a connector fails, Vitals reconciles from BoldTrail and rebuilds the affected KPIs, so streaming carries the steady state and reconciliation covers the hiccups.

Change data capture into KafkaA Debezium change-data-capture connector reads the BoldTrail CRM transaction log and streams changes into Kafka. KPI-enrichment services consume the stream and emit derived events, and the Vitals score reads from there. There is no write path back to the CRM.BoldTrail CRMsystem of recordDebezium CDCread-onlyKafkaCDC event streamKPI enrichmentderived event streamsVitals scoreCompany to Agent
Change data capture streams every CRM change into Kafka. Vitals reads from there and never writes back to the source.

20K / sec

Events through the pipeline at peak

A CRM change reaches the score in under three seconds, consistent across Company, Office, Team, and Agent.

Every reason to pay for Kafka was present at once: a source we had to decouple from, high sustained throughput, many derived consumers, and replay when the streaming path hiccuped. The clearest signal it was the right call came afterward, when Inside Real Estate's own teams adopted the same pattern for later work.

This was a genuinely hard problem: real-time scoring on top of a CRM we couldn't modify. Danubio came up with an elegant solution, and it worked so well that our internal teams have used the same pattern on other initiatives since. Real credit to them.
Andrew Hartnett, EVP Product & Engineering, Inside Real Estate

The full Vitals build goes deeper on the pipeline.

When is Kafka the wrong call?

When the queue only supports the system. Most times a team reaches for Kafka, what they need is a reliable queue: a list of jobs to work through, a way to hand work between services, a buffer in front of a slow consumer. For that, Kafka is a lot of machinery to run.

A Postgres-backed queue, or a managed queue like SQS, does that job with far less to operate. There is no cluster to keep healthy and no new failure surface. A message is a row you can read, a failure is visible, and a stuck job can be fixed with plain SQL. For early-stage products, internal tools, and moderate workloads, that is usually the right answer, and most teams overestimate the scale they will actually reach.

We make that call regularly. When an engagement needs a reliable way to process background work at a moderate rate, we reach for a Postgres-backed queue. It handles the load, stays easy to inspect, and adds nothing extra to operate. The day a workload genuinely outgrows it is the day to revisit, and most workloads never do.

Kafka is excellent at what it is for. The expensive mistake is paying its operating cost for a problem a queue already solves.

What does the AI era change?

It lowers the cost of the wrong reflex. AI will generate a working Kafka setup in minutes, which makes reaching for it easier than it already was. What the generated config does not show is the operating cost you have just signed up for.

So the judgment matters more than ever. The question is the same as it always was, and now you have to ask it on purpose, because the tooling will happily skip it. That is the broader pattern in what AI changes about building software: the work gets faster, the judgment stays yours.

How we decide

We ask one question first: does the queue support the system, or is the stream the system? If the queue moves work between parts of a system that would be fine without it, we use the simplest reliable queue. If the stream is the product, the way it was for Vitals, we reach for Kafka and pay its cost on purpose.

Then we size that cost honestly, against the real, demonstrated need, rather than the scale we might reach someday. The simplest tool that meets the actual requirement wins, until the requirement genuinely changes.

Knowing when a heavy tool earns its place is most of what senior engineering is. It is the work we do as an ongoing engineering partner: the calls that are cheap to make right and expensive to undo.

Frequently asked questions

Do you actually need Kafka?

Only when the stream is the system. If you need high throughput, change data capture, many independent consumers, or replay at real scale, Kafka earns its cost. If you just need a reliable queue between parts of a system, a simpler tool usually does the job with far less to run.

When is Kafka overkill?

For early-stage products, internal tools, and moderate workloads. There a Postgres-backed queue or a managed queue like SQS is enough: no cluster to keep healthy, no new failure surface, and a message is just a row you can inspect. Most teams overestimate the scale they will hit.

When does Kafka actually earn its cost?

When the stream is the product. On Vitals, a change-data-capture pipeline into Kafka computed a live score at 20,000 events a second, under three seconds end to end, on top of a CRM we could not modify. Decoupling, high throughput, many derived consumers, and replay all made Kafka the right call.

Is Postgres a good message queue?

Yes, when the queue supports the system. A message is a row, failures are visible, and a stuck job can be fixed with plain SQL, with nothing extra to operate. It has limits at very high concurrency, which is the point where a dedicated streaming platform starts to pay off.

Sources

  • Apache Kafka (Fortune 100 adoption). Retrieved May 22, 2026. kafka.apache.org
  • Gunnar Morling, “You Don't Need Kafka, Just Use Postgres” Considered Harmful. Retrieved May 22, 2026. morling.dev
  • BoldTrail's Vitals Dashboard Digs Into Business Drivers, Inman, February 12, 2025. Retrieved May 22, 2026. inman.com
Sava Markovic avatar

Sava Markovic

Founder, Danubio

Sava founded Danubio in 2018 to be the kind of engineering partner he always wanted on the other end of a critical project: senior, direct, trusted with meaningful product work. He keeps the company focused on strong technical judgment, close client relationships, and software that needs to be done well.

More from this author
Like the way we work?

Tell us what you're building. We'll bring the dragons.