Gotchas! in Event Driven Architecture


Event Driven Architecture has a few gotchas: things that you always need to be aware of or think about. Regardless of whether you’re new to Event Driven Architecture or a seasoned veteran, these just don’t go away. Let me cover 4 aspects of Event Driven Architecture that you need to think about.

At Least Once Delivery

You’ll likely use a broker that supports “at least once delivery”. This means a message will be delivered to a consumer one or more times. Because of this, consumers need to be able to handle a duplicate message: one they’ve already consumed.

There are different reasons why this can occur, and the first is because of the message broker.

When a message broker delivers a message to the consumer, the consumer needs to process the message and then acknowledge back to the broker that it consumed it successfully.

If the consumer fails to acknowledge to the broker, say because an internal exception occurred when processing the message, then after a given period of time (the invisibility timeout) the message broker will re-deliver that message to the consumer. This could also happen if the time it takes to consume a message is greater than the invisibility timeout. If the consumer does not acknowledge before the timeout, the message will be re-delivered, regardless of why.
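
To make the timeout behaviour concrete, here is a minimal in-memory sketch in Python. It is not any real broker’s API: InMemoryQueue, receive, acknowledge, and the explicit now argument are all hypothetical names chosen for illustration, and time is passed in by hand so the example is deterministic.

```python
class InMemoryQueue:
    """Toy queue with an invisibility timeout (hypothetical, not a real broker API)."""

    def __init__(self, invisibility_timeout):
        self.invisibility_timeout = invisibility_timeout
        self._messages = {}         # message id -> body
        self._invisible_until = {}  # message id -> time it becomes visible again

    def send(self, message_id, body):
        self._messages[message_id] = body
        self._invisible_until[message_id] = 0.0

    def receive(self, now):
        """Return the first visible message and hide it until the timeout passes."""
        for message_id, body in self._messages.items():
            if self._invisible_until[message_id] <= now:
                self._invisible_until[message_id] = now + self.invisibility_timeout
                return message_id, body
        return None

    def acknowledge(self, message_id):
        """Remove the message so it is never delivered again."""
        self._messages.pop(message_id, None)
        self._invisible_until.pop(message_id, None)


queue = InMemoryQueue(invisibility_timeout=30.0)
queue.send("m1", "OrderPlaced")

first = queue.receive(now=0.0)         # delivered to the consumer
while_hidden = queue.receive(now=10.0) # nothing: the message is still invisible
redelivered = queue.receive(now=31.0)  # no acknowledgement arrived, so it comes back
queue.acknowledge("m1")
after_ack = queue.receive(now=100.0)   # nothing: the message was acknowledged
```

The consumer sees the same message twice here without anything having failed in the broker itself, which is why the consumer has to be prepared for duplicates.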

The second reason this can happen is the publisher of the message. If the publisher is using the Outbox Pattern, it could send the same message more than once. The publisher might also effectively send a duplicate in other cases, such as sending a different message that carries the same values. In either case, if you’re going to be negatively impacted by processing a duplicate message, then you need to make your consumers idempotent.
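
One common way to make a consumer idempotent is to remember which message IDs it has already handled and skip repeats. The following Python sketch assumes every message carries a unique ID; IdempotentConsumer and its in-memory set are hypothetical, and a real consumer would keep the processed IDs in durable storage, ideally written in the same transaction as the handler’s own state change.

```python
class IdempotentConsumer:
    """Skips messages whose ID has already been processed (illustrative only)."""

    def __init__(self, handler):
        self._handler = handler
        self._processed_ids = set()  # assumption: durable storage in a real system

    def consume(self, message_id, payload):
        """Run the handler once per message ID; return True if it ran."""
        if message_id in self._processed_ids:
            return False  # duplicate delivery, nothing to do
        self._handler(payload)
        self._processed_ids.add(message_id)
        return True


charges = []
consumer = IdempotentConsumer(charges.append)

ran_first = consumer.consume("m1", {"order": 42, "amount": 10})
ran_duplicate = consumer.consume("m1", {"order": 42, "amount": 10})
```

Note that this only catches re-deliveries of the same message ID. A different message that carries the same values needs a check on the business data itself, for example on the order number.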

Consistent Publishing

Because events become first class in your system with Event Driven Architecture, you need to be aware of when and where you’re publishing them. Events are often published to indicate that something has occurred, such as a state change. This means that your code needs to be consistent: if a certain action has occurred, you’re publishing the appropriate event. If you make a state change and do not publish the event, this can have a negative impact on other parts of your system that are expecting it.

As an example, here’s a transaction script that is related to a food delivery app. When the food delivery driver arrives at the restaurant to pick up the food, this is the first action they perform, the “Arrive”.

One of the things we are doing along with making some state changes is publishing an Arrived event.

The second action the delivery driver performs, after they have arrived at the restaurant and picked up the food, is the “Pickup”.

However, some of the logic we have is that if they haven’t yet done the Arrive action, we’ll just let them do the Pickup anyway. The problem is that we’re making the state change for “Arrive”, yet not publishing the Arrived event.

The gotcha here is that events are now first class. When certain state changes occur, you might need to publish an event for other service boundaries. Making this simple state change without publishing the Arrived event could have a negative impact on other service boundaries. You want to narrow down and encapsulate state changes and publishing events into something like an Aggregate.
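
As a rough sketch of that encapsulation, the Python class below puts both the state change and the event in one place, so Pickup cannot mark the driver as arrived without also recording Arrived. This is not the post’s original code: Delivery, Status, the PickedUp event, and the pending_events list are all assumed names, and actually sending the recorded events to a broker is left out.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    DISPATCHED = "dispatched"
    ARRIVED = "arrived"
    PICKED_UP = "picked_up"


@dataclass(frozen=True)
class Arrived:
    delivery_id: str


@dataclass(frozen=True)
class PickedUp:  # hypothetical event, not named in the post
    delivery_id: str


class Delivery:
    """Aggregate that owns the state changes and the events they produce."""

    def __init__(self, delivery_id):
        self.delivery_id = delivery_id
        self.status = Status.DISPATCHED
        self.pending_events = []  # events to publish once the state is saved

    def arrive(self):
        if self.status is not Status.DISPATCHED:
            return  # already arrived or picked up, nothing changes
        self.status = Status.ARRIVED
        self.pending_events.append(Arrived(self.delivery_id))

    def pickup(self):
        if self.status is Status.PICKED_UP:
            return
        # Reuse arrive() so a skipped Arrive still records the Arrived event.
        self.arrive()
        self.status = Status.PICKED_UP
        self.pending_events.append(PickedUp(self.delivery_id))


delivery = Delivery("d1")
delivery.pickup()  # the driver skipped the explicit Arrive action
event_names = [type(event).__name__ for event in delivery.pending_events]
```

Because pickup() goes through arrive() instead of setting the status directly, there is only one code path that performs the “Arrive” state change, and it always produces the event.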

Bypassing API

You cannot bypass your API. Your API is responsible for publishing events when certain state changes are made or behaviors are invoked. Event Driven Architecture makes events first class, so they need to be published reliably and consistently.

As an example, the dotted outlined square is a logical service boundary. When a client makes a request to the App Service to perform some action, it makes some state change to its database.

It also then will publish an event to the message broker for possibly other service boundaries to be aware and consume that event.

If you were to bypass the API, no event would be published. You CANNOT have another logical boundary, or even yourself as a developer, go in and manually change state/data in a database without publishing an event.

In the example above, the event published might be used to invalidate a cache. If data is changed manually or by some other boundary then the cache will not be invalidated and will be inconsistent.

You cannot bypass your API since it’s controlling making state changes and publishing the appropriate event. State changes and Events go hand in hand.
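
The cache scenario can be sketched in a few lines of Python. Everything here is a stand-in: the dict plays the database, Broker is a synchronous in-process publisher, and ProductService, CachedReader, and the PriceChanged event are hypothetical names. A real system would also need the state change and the publish to be reliable together, for example with the Outbox Pattern.

```python
class Broker:
    """Synchronous in-process stand-in for a message broker."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, event):
        for handler in self._subscribers:
            handler(event)


class ProductService:
    """The API: the only place that changes state, and it always publishes."""

    def __init__(self, database, broker):
        self._database = database
        self._broker = broker

    def change_price(self, product_id, price):
        self._database[product_id] = price
        self._broker.publish({"type": "PriceChanged", "product_id": product_id})


class CachedReader:
    """Another boundary that caches prices and invalidates on PriceChanged."""

    def __init__(self, database, broker):
        self._database = database
        self._cache = {}
        broker.subscribe(self._on_event)

    def _on_event(self, event):
        self._cache.pop(event["product_id"], None)

    def get_price(self, product_id):
        if product_id not in self._cache:
            self._cache[product_id] = self._database[product_id]
        return self._cache[product_id]


database = {"p1": 10}
broker = Broker()
service = ProductService(database, broker)
reader = CachedReader(database, broker)

initial = reader.get_price("p1")       # reads 10 and caches it
service.change_price("p1", 12)         # through the API: event invalidates the cache
via_api = reader.get_price("p1")       # fresh value
database["p1"] = 15                    # bypassing the API: no event is published
after_bypass = reader.get_price("p1")  # stale: the cache was never invalidated
```

After the direct write, the database holds 15 while the reader keeps serving 12, which is exactly the inconsistency described above.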

Failures

Failures in Event Driven Architecture can have serious implications for throughput and performance, and can result in cascading failures.

As an example, a consumer is processing a message and when doing so needs to make an HTTP call to an external service.

The issue arises when the external service becomes unavailable or has any abnormal latency to process the HTTP request.

If the normal processing time of a message is 100ms, and a timeout with the external service turns that into 1 minute, this can cause backpressure because you’re now producing more messages than you can consume.
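
A quick back-of-the-envelope calculation shows how fast a backlog can build. The 100ms and 1 minute figures come from the example above; the arrival rate of 5 messages per second, the single consumer, and the 10-minute window are assumptions made purely for illustration.

```python
def backlog_after(seconds, arrival_rate_per_s, processing_time_s, consumers=1):
    """Estimate how many messages pile up when arrivals outpace processing."""
    throughput_per_s = consumers / processing_time_s
    growth_per_s = max(0.0, arrival_rate_per_s - throughput_per_s)
    return growth_per_s * seconds


# Healthy: one consumer at 100ms per message handles 10 messages per second,
# so 5 arrivals per second leave no backlog.
healthy_backlog = backlog_after(600, arrival_rate_per_s=5, processing_time_s=0.1)

# Degraded: at 1 minute per message the consumer handles 1 message per minute,
# so after 10 minutes roughly 2,990 messages are waiting.
degraded_backlog = backlog_after(600, arrival_rate_per_s=5, processing_time_s=60)
```

Even once the external service recovers, that backlog still has to be worked through, so the delay is felt well after the outage itself ends.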

There are many different strategies for handling failures such as immediate retries, backoffs, circuit breakers, poison messages, and dead letter queues.
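
Two of these strategies, retries with a backoff and a dead letter queue, can be combined in a small Python sketch. The function name, its parameters, and the plain list used as the dead letter queue are all hypothetical; messaging libraries usually provide this behaviour for you, and their defaults will differ.

```python
import time


def process_with_retries(message, handler, dead_letters,
                         max_attempts=3, base_delay_s=0.1, sleep=time.sleep):
    """Try the handler up to max_attempts times, doubling the delay each time.

    Returns True on success. After the final failure the message is moved to
    dead_letters (a plain list standing in for a dead letter queue).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            handler(message)
            return True
        except Exception as error:
            if attempt == max_attempts:
                dead_letters.append((message, repr(error)))
                return False
            # Exponential backoff: base, 2x base, 4x base, ...
            sleep(base_delay_s * 2 ** (attempt - 1))


def always_failing(message):
    raise TimeoutError("external service unavailable")


dead_letters = []
delays = []  # record the delays instead of really sleeping
succeeded = process_with_retries("m1", always_failing, dead_letters,
                                 sleep=delays.append)
```

Moving the message aside after a bounded number of attempts keeps one poison message from blocking everything queued behind it. A circuit breaker would go a step further and stop calling the external service entirely while it is known to be down.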

Handling failures when processing messages is a critical concern that constantly needs to be addressed. Not all situations are the same, so you need to evaluate how you handle failures and understand that different consumers have different processing and latency requirements.

Source Code

Developer-level members of my YouTube channel or Patreon get access to the full source for any working demo application that I post on my blog or YouTube. Check out the YouTube Membership or Patreon for more info.
