Sponsorship is available! If you'd like to sponsor CodeOpinion.com and have your product or service advertised exclusively (no AdSense) on every post, contact me.

Do you want to use Kafka? Or do you need a Queue?

Do you want to use Kafka? Or do you need a message broker and queues? While they can seem similar, they have different purposes. I’m going to explain the differences, so you don’t try to brute force patterns and concepts in Kafka that are better used with a message broker.

YouTube

Check out my YouTube channel, where I post all kinds of content accompanying my posts, including this video showing everything in this post.

Partitioned Log

Kafka is a log. Specifically a partitioned log. I’ll discuss the partition part later in this post and how that affects ordered processing and concurrency.

Kafka Log

When a producer publishes new messages, generally events, to a log (a topic with Kafka), it appends them.

Kafka Log

Events aren’t removed from a topic unless defined by the retention period. You could keep all events forever or purge them after a period of time. This is an important aspect to note in comparison to a queue.

With an event-driven architecture, you can have one service publish events and have many different services consuming those events. It’s about decoupling. The publishing service doesn’t know who is consuming, or if anyone is consuming, the events it’s publishing.

Consumers

In this example, we have a topic with three events. Each consumer works independently, processing messages from the topic.

Consumers

Because events are not removed from the topic, a new consumer could start consuming the first event on the topic. Kafka maintains an offset per topic, per consumer group, and partition. I’ll get to consumer groups and partitions shortly. This allows consumers to process new events that are appended to the topic. However, this also allows existing consumers to re-process existing messages by changing the offset.

Just because a consumer processes an event from a topic does not mean that they cannot process it again or that another consumer can’t consume it. The event is not removed from the topic when it’s consumed.

Commands & Events

A lot of the trouble I see with using Kafka revolves around applying various patterns or semantics typical with queues or a message broker and trying to force it with Kafka. An example of this is Commands.

There are two kinds of messages. Commands and Events. Some will say Queries are also messages, but I disagree in the context of asynchronous messaging.

Commands are about invoking behavior. There can be many producers of a command. There is a required single consumer of a command. The consumer will be within the logical boundary that owns the definition/schema of the command.

Sending Commands

Events are about notifying other parts of your system that something has occurred. There is only a single publisher of an event. The logical boundary that publishes an event owns the schema/definition. There may be many consumers of an event or none.

Publishing Events

Commands and events have different semantics. They have very different purposes, and how that also pertains to coupling.

Commands vs Events

By this definition, how can you publish a command to a Kafka topic and guarantee that only a single consumer will process it? You can’t.

Queue

This is where a queue and a message broker differ.

When you send a command to a queue, there’s going to be a single consumer that will process that message.

Queue Consumer

When the consumer finishes processing the message, it will acknowledge back to the broker.

Queue Consumer Ack

At this point, the broker will remove the message from the queue.

Queue Consumer Ack Remove Message

The message is gone. The consumer cannot consume it again, nor can any other consumer.

Consumer Groups & Partitions

Earlier I mentioned consumer groups and partitions. A consumer group is a way to have multiple consumers consume from the same topic. This is a way to concurrently scale and process more messages from a topic called the competing consumer pattern.

A topic is divided into partitions. Events are appended to a partition within a topic. There can only be one consumer within a consumer group that processes messages from a partition.

Meaning you will process messages from a partition sequentially. This allows for ordered processing.

As an example of the competing consumers pattern, let’s say we have two partitions in a topic. Each partition right now is a single event in each. We have two consumers in a single consumer group. We’ve defined that the top consumer will consume from the top partition, and the bottom consumer will consume from the bottom partition.

Kafka Partition

This means that each consumer within our consumer group can process each message concurrently.

Kafka Partition

If we publish another message to the top partition, this means the top consumer again is the one responsible for consuming it. If it was busy processing another message, the bottom consumer, even if it’s available, will not consume it. Only the top consumer is associated with the top partition.

This allows you to consume messages in order, so as long as you associate them to the same partition.

In contrast, the competing consumers’ pattern with a queue works slightly differently as we don’t have partitions.

If we have two messages in a topic, and we have two consumers within a single consumer group.

Competing Consumers

Messages are consumed by any free/available consumer. Because there are two free consumers, both messages will be consumed concurrently.

Competing Consumers

Even though messages are distributed FIFO (First-in-First-Out), that doesn’t mean we will process them in order.

Why does this matter? With Kafka partitions, you can process messages in order. Because there is only a single consumer within a consumer group associated with a partition, you’ll process them one by one. This isn’t possible with queues. The downside is that if you publish messages to a partition faster than you can consume them, you can end up in a backlog disaster.

Kafka or Message Broker Queues & Topics?

Hopefully, this post (and video) illustrated some of the differences. The primary issue I’ve come across is people using Kafka but trying to apply patterns and concepts (commands, competing consumers, dead letter queues) that are typical with a message broker using queues and topics, but it just doesn’t fit.

Typically when you’re creating an asyncronous workflow, you’re consuming events and sending commands. While you technically can create a topic for commands, you can’t guarantee there won’t be more than a single consumer. Is this a big deal? To me, semantics matter. If you’re already using Kafka and don’t want to introduce another piece of infrastructure like a queue/message broker, then I understand the reasoning for doing so.

Understanding the differences in how the competing consumers pattern works. If you’re not configured correctly and are publishing to a single partition, then you can’t increase throughput by adding another consumer to a consumer group.

Join!

Developer-level members of my YouTube channel or Patreon get access to a private Discord server to chat with other developers about Software Architecture and Design and access to source code for any working demo application I post on my blog or YouTube. Check out my Patreon or YouTube Membership for more info.

You also might like

Follow @CodeOpinion on Twitter

Software Architecture & Design

Get all my latest YouTube Vidoes and Blog Posts on Software Architecture & Design

Just store UTC? Not so fast! Handling Time zones is complicated.

Should you store dates & times in your database as UTC? It’s pretty standard advice if you’re working in a system that needs to record dates and times from many different time zones. But this advice doesn’t hold true when dealing with dates and times in the future; here are some things you need to consider.

YouTube

Check out my YouTube channel, where I post all kinds of content accompanying my posts, including this video showing everything in this post.

Just store UTC? Not really.

You’ll hear/read pretty standard advice to store all dates/times as UTC when storing dates/times in a database. As users enter data into your system, you might convert their specific local date time to UTC and then save that in your database.

For example, say you have a web application where the user must enter a date/time value. They would be specifying it in their local time. This date/time in ISO format would be sent to your Server/API as 2022-08-02T18:00:00-400. This is their local date time with their current time zone offset. Your app would then convert this to UTC, which would be 2022-08-02T22:00:00Z

Then when we need to return data to the client, we fetch it out of the database, which is in UTC, and send that UTC date/time to the client, where it can then convert it to its local date/time.

Sounds straightforward, right?

There are two reasons why I think this is the standard advice given on storing date/times. The first is because you’re standardizing your date/times in your database, which means you can query your data and compare it against the UTC. This allows you to sort, filter, etc., all on a standardized date/time. You won’t need any conversion at runtime to do any sorting and filtering.

The second reason I think this is standard advice is that people are only thinking about date/times that are in the past. And that’s where the most significant problem lies, date/times in the future.

Time zones & Daylight Saving Time

A date/time in the future, converted to UTC as of right now, might change. How? Because Time Zone rules can change. Daylight Saving Time rules can also change.

When you do a conversion from a local date/time as of right now, at this very moment, you’re applying the current timezone and daylight saving time rules. But that’s not to say these rules will always be like this. They change more than you think!

This means that when you convert a local date time that’s in the future to a UTC date/time and then save that to your database, you’re basing on the rules at the time of persistence, not what it would be in the future when that date/time is realized.

So what should you be saving in your database? Let’s walk through an example.

Here is an object with a future date/time with a time zone offset and a location in Toronto, Canada.

As mentioned, if the timezone rules or daylight saving time rules change, this could be incorrect as the time zone offset could be wrong. How would we correct or convert this date time to the correct value? How would we know what the correct value should be? You wouldn’t be able to, as you have no information about what time zone was used. All you have is the time zone offset.

If we were storing UTC instead, we still have the exact same problem. We cannot convert it to the correct time if there are any rule changes.

Time Zone Database

One solution is using the Time Zone Database. If you’re using a library for handling dates and times (eg, JodaTime, NodaTime), it is likely already using the Time Zone Database (tzdb) under the hood. The Time Zone Database contains all time zone boundaries, UTC offsets, and daylight-saving rules.

It also contains a time zone identifier (IANA) that you can use to reference any time you specify a local date. In your database, you can then persist this time zone identifier with your local date/time.

You’ll notice the dateTime property doesn’t have the time zone offset anymore. It’s just the literal date/time the user specified. This is because now we have the IANA time zone name (America/Toronto) that relates to that literal date/time.

As mentioned earlier, the reason to standardize to UTC is so you can query your database effectively for sorting and filtering. We can still do this at the time of persistence in addition to the literal local date/time and time zone name.

So as mentioned, time zone and daylight-saving time rules can change. So what happens if they do? Well, what we can also record is the version of the Time Zone Database. This way, we know what version and rules were used when we converted to UTC.

When rules change, the time zone database will get updated. We can query our database looking for future date/times where the tzdb is an older version and then redo the conversion of the dateTimeLocal using the IANA name to convert back to the correct UTC and update that value in our database.

Just store UTC?

While the standard advice of “just store UTC” can work, it will only work if you’re storing date/time that is in the past. If you need to store date/time in the future, then you want to record the time zone name and the version of the tzdb. And for querying purposes, you’ll want to convert to UTC. And if there ever is a change to any rules, you can update the UTC date/times.

Join!

Developer-level members of my YouTube channel or Patreon get access to a private Discord server to chat with other developers about Software Architecture and Design and access to source code for any working demo application I post on my blog or YouTube. Check out my Patreon or YouTube Membership for more info.

You also might like

Follow @CodeOpinion on Twitter

Software Architecture & Design

Get all my latest YouTube Vidoes and Blog Posts on Software Architecture & Design

STOP Over-Engineering Software!

Can we, as application developers, stop over-engineering software? I hate to use the term engineering even to describe it! I’m guilty of it too. I was writing “clever” code that was overly complex, hard to understand, and hard to change. Here are some of the pitfalls I see and what I think about to guide me down a more straightforward path.

YouTube

Check out my YouTube channel, where I post all kinds of content accompanying my posts, including this video showing everything in this post.

“What if” Game

One of the traps I think developers can get into is playing the “what if” game. When we, as developers, gain more insights into the domain we’re working in and start making assumptions. The issue is then we don’t validate those assumptions, which can be widely incorrect. Often this occurs when we discover an edge case we think we need to solve within code. In reality, it’s so infrequent that the business doesn’t need it to be automated or handled by our code. Instead, it could produce some notification or indication to someone in the business, and they can manually resolve it. Not everything needs to be handled in code, especially when the solution is complex and the case of it occurring is very low. This is just a low-value, high-cost edge case. Validate assumptions, and talk with the business! Don’t write complex code when you don’t need to. Stop over-engineering.

Caring too much about tech

Developers love everything code, especially new and shiny libraries, frameworks, and tools. We can lose sight of why we’re writing code in the first place. It’s to solve business problems and create value. Yes, we want to stay up-to-date with the latest technology so we can improve our systems. However, we have to have an equal amount of focus on the domain we’re working in.

Care about the domain as much as the tech.

We’re writing software to solve business problems, and you need to understand your domain. If all you care about is the latest tech, you’ll be adding libraries, frameworks, and tools that are not required, and only making the solution more complex is just over-engineering. Use the tech that best solves the problems that you have. Stop chasing the new and shiny if it doesn’t provide any value to your system.

Don’t roll your own!

It’s unlikely that you’ll genuinely need to write your own library or framework. But how often have you worked in a system with its own web framework, ORM, etc.? Likely you have. Was it documented? Absolutely not. It was some homegrown framework that someone who is no longer at the company wrote years ago, and now you’re stuck with it. I get it; we love tech but all your adding is complexity when you go down the road of writing some fundamental framework that your system depends on.

A good example is writing your messaging library on top of RabbitMQ/AWS SQS/Azure ServiceBus. People often do this without realizing all of the different patterns and concepts they will need to implement, and before long, they have their own homegrown messaging library. Instead, you could use a battle and production-tested messaging library that has all the features & patterns that are commonly required with messaging. Buy, don’t build complexity by over-engineering.

Not Enough Up-Front Design

Not understanding the domain enough will lead to making an overly complex system, likely because of over-engineering. How? This relates to the “what-if” game. Developers will often add layers of abstraction and useless indirection because of “what-ifs.”

“What if we need to do X” or “What if X changes, let’s add Y.”

YAGNI (You aren’t gonna need it). Adding useless layers of abstraction indirection that you’ll never need.

I’m not talking about BIG upfront design in a waterfall-style specification. But exploring the domain, especially in a green-field project by leveraging something lightweight like event-storming. Understand the workflows and the different perspectives of different people within the business.

DRY per Boundary (be explicit)

Don’t repeat yourself has caused a ton of confusion to developers. A great example of DRY gone wrong is when things get overly generic trying to accommodate different use cases, ultimately leading to a ton of complexity.

Entity Services are another example of DRY gone wrong. Entity services are what I call services that operate on a single entity or a set of entities. And that entity only ever exists within that boundary. An example of this is a ProductService in a warehouse system. A ProductService contains all the product information, such as the name, price, cost, quantity on hand, etc. and all the various ways to change that data.

The problem with this is that you do want to repeat yourself. The concept of a product in a warehouse exists in multiple different boundaries. Sales would care about the price; Purchasing would care about the cost; Shipping & Receiving handle the quantity on hand. There is no single Product Entity, but that rather each boundary would have its own concept of a Product exposing the functionality and data that is relevant to it.

Over Abstraction

It seems pretty typical to want to abstract any 3rd party dependency, library or tool. On the surface, this makes sense, and you create your own abstraction so that your application code depends on it rather than the 3rd party. That way, if you ever need to change that dependency, you don’t have to change all your application code, you change the abstraction you created. Sounds reasonable? Not entirely, it can be over-engineering.

Over-Engineering

It comes down to coupling and managing it. The point of the abstraction is to simplify the underlying concepts best suited for your use case. Creating an abstraction will limit your ability to leverage all the dependency has to offer. Manage coupling! How? Vertical slices and focus on features. Grouping functionality by features allows you to narrow your focus and make localized decisions about what you depend on per feature (or feature set).

Join!

Developer-level members of my YouTube channel or Patreon get access to a private Discord server to chat with other developers about Software Architecture and Design and access to source code for any working demo application I post on my blog or YouTube. Check out the YouTube Membership or Patreon for more info.

You also might like

Follow @CodeOpinion on Twitter

Software Architecture & Design

Get all my latest YouTube Vidoes and Blog Posts on Software Architecture & Design