Gotchas! in Event Driven Architecture

Event Driven Architecture has a few gotchas. Things that you always need to be aware of or think about. Regardless of whether you’re new to Event Driven Architecture or a seasoned veteran, these just don’t go away. Let me cover 4 aspects of event driven architecture that you need to think about.

YouTube

Check out my YouTube channel where I post all kinds of content that accompanies my posts including this video showing everything that is in this post.

At Least Once Delivery

You’ll likely use a broker that supports “at least once delivery”. This means that a message can be delivered to a consumer at least once. Meaning it can happen once or more. Because of this, consumers need to be able to handle getting a duplicate message, or a message they’ve already consumed.

There are different reasons why this can occur, and the first is because of the message broker.

When a message broker delivers a message to the consumer, the consumer needs to process the message and then acknowledge back to the broker that it consumed it successfully.

Gotchas! in Event Driven Architecture

If the consumer fails to acknowledge to the broker, say because an internal exception occurred when processing the message, then the message broker after a given period of time (invisibility timeout) will re-deliver that message to the consumer. This could also happen if the length of time it takes to consume a message is greater than the invisibility timeout. If the consumer does not acknowledge before the timeout, it will get re-delivered, regardless of why.

The second reason why this can happen is because of the publisher of the message. If the publisher is using the Outbox Pattern, this could cause them to send the same message more than once. However, the publisher might also send the same message in different cases such as sending a different message with the same values. In either case, if you’re going to be negatively impacted by processing a duplicate message, then you need to make your consumers idempotent.

Consistent Publishing

Events become first-class within Event Driven Architecture in your system, you need to be aware of when and where you’re publishing them. Often times events are published to indicate something has occurred such as a state change. This means that your code needs to be consistent that if a certain action has occurred, you’re publishing the appropriate event. If you make a state change and do not publish the event, this can have a negative impact on other parts of your system that are expecting it.

As an example, here’s a transaction script that is related to a food delivery app. When the food delivery driver arrives at the restaurant to pick up the food, the is this first action they perform, the “Arrive”.

One of the things we are doing along with making some state changes is publishing an Arrived event.

Now the second action the delivery driver will perform is called the “Pickup” after they have arrived at the restaurant. Once they “pick up” the food, this is the action that is performed.

Some of the logic we have however is that if they haven’t yet done the Arrive action first, we’ll just let them do the Pickup. The problem is that we’re making the state change for “Arrive”, yet not publishing the Arrived event.

The gotcha here is that Events are now first class. When certain state changes occur, you might need to publish an event for other service boundaries. Making this simple state change without publishing the Arrived event could have a negative impact on other service boundaries. You want to narrow down and encapsulate state change sand publishing events into something like an Aggregate.

Bypassing API

You cannot bypass your API. Your API is what is responsible for publishing events when certain state changes or behaviors are invoked. Event Driven Architecture makes Events first class and needs to be reliable and consistently published.

As an example, the dotted outlined square is a logical service boundary. When a client makes a request to the App Service to perform some action, it makes some state change to its database.

It also then will publish an event to the message broker for possibly other service boundaries to be aware and consume that event.

If you were to bypass the API, no event would be published. You CANNOT have another logical boundary, or even yourself as a developer go in and manually change state/data in a database without publishing an event.

Gotchas! in Event Driven Architecture

In the example above, the event published might be used to invalidate a cache. If data is changed manually or by some other boundary then the cache will not be invalidated and will be inconsistent.

You cannot bypass your API since it’s controlling making state changes and publishing the appropriate event. State changes and Events go hand in hand.

Failures

Failures in Event Driven Architecture can have serious implications on throughput, performance and result in cascading failures.

As an example, a consumer is processing a message and when doing so needs to make an HTTP call to an external service.

The issue arises when the external service becomes unavailable or has any abnormal latency to process the HTTP request.

Gotchas! in Event Driven Architecture

When the normal processing time of a message is 100ms, and because a timeout with the external service turns your processing time into 1 minute, this can cause backpressure if you’re producing more messages than you can consume.

There are many different strategies for handling failures such as immediate retries, backoffs, circuit breakers, poison messages, and dead letter queues.

Gotchas! in Event Driven Architecture

Handling failures when processing messages is a critical part that is constantly needing to be addressed. Not all situations are the same but it makes you evaluate how you handle failures and also understand how different consumers have different processing and latency requirements.

Source Code

Developer-level members of my YouTube channel or Patreon get access to the full source for any working demo application that I post on my blog or YouTube. Check out the YouTube Membership or Patreon for more info.

Follow @CodeOpinion on Twitter

Software Architecture & Design

Get all my latest YouTube Vidoes and Blog Posts on Software Architecture & Design

Semantic Versioning is overrated

Package Managers (npm, nuget, pip, composer, maven, etc) are an incredible tool for leveraging other libraries and frameworks into your products or projects. What seems to have been largely lost in the decision-making process on if you should take on a dependency and what the implications are. Once you do take a dependency, how much do you trust the vendor/author and semantic versioning? Before you do a quick npm install or update let me explain some things you should be considering.

YouTube

Check out my YouTube channel where I post all kinds of content that accompanies my posts including this video showing everything that is in this post.

Ownership

The theme of this post is really to emphasize that dependencies are your problem and you need to take ownership over the decision-making process of taking on a dependency.

Dependencies are nothing more than an extension of your own code. They require some serious consideration on what dependencies you’re taking on, why you’re taking the dependency, and should even be taking the dependency, to begin with.

Trust

Trust plays a role in taking on a dependency. If you’re working on a product that is going to be long-lived, you want to align yourself with dependencies that you’ll grow with. There’s a certain level of stability that you’ll want dependencies to have.

Meaning if you develop a product that lasts 10+ years, what dependencies will you take that will also live that long. These can either drag you down over time like an anchor or propel you forward with their own innovations.

As an example, I started a product in 2015 with .NET Framework 4.7 and ASP.NET just as .NET Core (now .NET5+) was emerging but not yet released. While the effort to move to ASP.NET Core took time and knowledge it was possible and was completed. We’ve kept up to date with .NET as it stands today and has been propelled forward by all of the advancements rather than being dragged down and being stuck on the .NET Framework.

Semantic Versioning

So where does Semantic Versioning fit into any of this that I’ve written about so far? Well, Semantic Versioning is nothing more than a way to communicate that something changed in a package that you depend on. That’s it.

While the intent is to identify the degree of change with major, minor, and patch versions, that’s only as good as the author’s intent of following that definition. For example, if the patch version changes because of a bug fix that is done in a backward-compatible manner, what’s your level of trust that it actually still is backward-compatible? Just because the API surface didn’t change doesn’t mean it’s backward-compatible. What happens if you have a work-around for a bug, that then is fixed. That’s not backward-compatible to you if it breaks you now that it’s fixed.

Semantic Versioning tells you something changed. That’s it.

Test

Even if the authors of a package have good intent and understanding of SemVer, that doesn’t mean what should be a non-breaking change won’t be breaking to you. The only way to satisfy this is to have the proper level of testing that validates your usage of a dependency.

You only need to concern yourself with the functionality and API surface that you actually use from that dependency.

Pin Versions & Upgrades

If you don’t have good test coverage or the ability to understand upgrades of the package, pin your versions. Don’t rely on ranges but rather have a cadence for manually dealing with updating packages.

The longer you wait to upgrade to a new version the more you’ll have to either change in the long run. If you don’t want to get on the upgrade train, don’t pick or bet on a dependency that’s constantly releasing. This could be because of feature enhancements that you don’t need or bug fixes due to instability.

It goes back to ownership and making a decision based on your context.

Context is King

If you’re working on a project or product that is going to be short-lived, almost nothing I’m talking about matters. You’ll likely not be even bothering to update any dependencies unless there’s some security issue you need to address.

However, if the project/product is going to be long-lived over many years, you’ll likely be constantly evolving the system. In doing so you’re going to likely need to update dependencies. The more dependencies you have the more work it is to manage them.

Source Code

Developer-level members of my YouTube channel or Patreon get access to the full source for any working demo application that I post on my blog or YouTube. Check out the YouTube Membership or Patreon for more info.

Follow @CodeOpinion on Twitter

Software Architecture & Design

Get all my latest YouTube Vidoes and Blog Posts on Software Architecture & Design

Event Sourcing vs Event Driven Architecture

Event Sourcing is seemingly constantly being confused with Event Driven Architecture. In this blog/video I’m going through a popular blog post that explains various points that are very valid, however, they are conflating Event Sourcing with Event Driven Architecture. Event Sourcing is about using events as the state. Event Driven Architecture is about using events to communicate between service boundaries.

YouTube

Check out my YouTube channel where I post all kinds of content that accompanies my posts including this video showing everything that is in this post.

Event Sourcing vs Event Driven Architecture

There is a blog post that keeps making its round over various news or social media sites every year or so that gets a lot of attention. The issue is it conflates Event Sourcing and Event Driven Architecture.

To be clear, Event Sourcing is about using events to represent state. In Event Driven Architecture, events are used to communicate with other service boundaries.

Event Sourcing

Event Sourcing is a different approach to storing data. Instead of storing the current state, you’re instead going to be storing events. Events represent the state transitions of things that have occurred in your system. If you want more details on exactly what Event Sourcing is, check out another post I’ve written Event Sourcing Example & Explained in plain English

Event Sourcing vs Event Driven Architecture

Event Driven Architecture

Event Driven Architecture is about using events as a way to communicate with other service boundaries. Generally, leveraging a message broker (or event log) to use the Publish/Subscriber pattern. Publish events and consume events asynchronously within other boundaries. When publishing an event, there may be zero or many consumers. The publisher is unaware of who is consuming an event. Consumers are unaware of each other. Event Driven Architecture is a way of loose coupling between service boundaries.

Event Sourcing vs Event Driven Architecture

Purely based on these two definitions, you might already start to see why Event Sourcing vs Event Driven Architecture isn’t even a valid comparison.

Counterpoints

The blog post illustrates having multiple services both publish and consume events from an event log (I’m going to assume Kafka).

The idea of a keeping a central log against which multiple services can subscribe and publish is insane. You wouldn’t let two separate services reach directly into each other’s data storage when not event sourcing

Yes, that’s insane. Don’t do that. If you’re using the event log for communication between service boundaries AND to represent state within boundaries, that’s a terrible idea. But that’s has nothing to do with Event Sourcing but rather mixing two concepts: state persistence and communication.

Under “normal” development flows, you operate within the safe, cozy little walls which make up your service. You’re free to make choices about implementation and storage and then, when you’re ready, deal with how those things get exposed to the outside world.

Exactly! This is exactly what you should be doing. A service boundary owns its data and defines how it exposes it. If you’re conflating Event Sourcing (state) with Event Event Architecture (communication) you can see how this would violate that.

For one, you’re probably going to be building the core components from scratch. Frameworks in this area tend to be heavy weight, overly prescriptive, and inflexible in terms of tech stacks.

Now it depends on which side of the question they were referring to. If we’re talking about Event Sourcing, there really isn’t any “framework” required. For example, if you’re using EventStoreDB, you can use their SDK directly. In many of the examples I use, I’m creating a Repository that is reading from an event stream and replaying the events to build up an Aggregate Root to the current state. From there the Aggregate Root generates new events and those are persisted back to the Event Stream from the Repository. There isn’t any framework or library code required other than the SDK Client from the Event Store.

With Event Driven Architecture, you absolutely want to use a library for messaging. I say this because there has a lot of patterns and concepts that come with using Event Driven Architecture. Messaging libraries provide these patterns and concepts for you so that you don’t have to implement them yourself. In the .NET space, libraries like NServiceBus, MassTransit, Brighter all come to mind that handles things like the Outbox Pattern, Process Managers, Fault Tolerance, Retries, Dead Letter Queues, and more.

If you have a UI, it generally needs to play along with the event driven aspect of the back end. Meaning, it should be task based.

Absolutely. Task-Based UIs will guide the end-user to perform specific tasks usually in some type of workflow. Tasks (or actions, commands) are explicit and allow you to then derive what intend to do where you can then generate the appropriate event from that task. CRUD isn’t explicit. You do not know the intent of the end-user when you provide them with a CRUD-based UI. For more on task-based UIs, check out my post Decomposing CRUD to a Task Based UI.

A super common piece of advice in the ES world is that you don’t event source everywhere *. This is all well and good at the conceptual level, but actually figuring out where and when to draw those architectural boundaries through your system is quite tough in practice.

Defining boundaries is one of the most important things to do when developing and designing a system. Yet it’s one of the hardest things to do. In my experience, the real core of your solution space is where the complexity lies. On the outer edges, there are boundaries that are often in a supporting role. These supporting boundaries may either be very generic and can be something you buy off the shelf and integrate with, while others you may want to develop but can simply be CRUD. Each boundary defines how it persists state best on the requirements and what may fit best. Some boundaries might be best suited for a relational database, others a document database, or some an Event Store. But again that is about persistence and state. Each boundary decides its persistence. For communication, you can still leverage an Event Driven Architecture even though some boundaries are using a relational database. Again, don’t conflate needing to do Event Sourcing for persisting state in order to communicate via events.

We made it about a month before a shift in focus caused us to hit our first “oh, so these events are no longer relevant, at all?” situation. Once you hit this point, you’ve got a decision to make: what to do with the irrelevant / wrong / outdated events.

With Event Sourcing, the concern about versioning events is done within a single boundary that owns and uses those events for state. There are different strategies for versioning with event sourcing and how you want to handle “old” events.

One thing I’ve noticed however is that “no longer relevant” doesn’t actually happen all that often. If you’re using established business concepts, they don’t often become irrelevant. Usually, events that become irrelevant are because the developers defined the events and are generally based more on technical concerns rather than business concepts.

Once your data grows to the point where you can no longer materialize from the ledger in a reasonable amount of time, you’ll be forced to offload the reads to your materialized projections. And with this step comes materialization lag and the loss of read-after-write consistency.

This is a valid concern when you’re using Event Sourcing and creating Projections (a read model) that is generated asynchronously. There are various strategies to handle this that I’ve talked about in a video about Event Consistency. However, to point out, this isn’t specifically about Event Sourcing Projections but any type of system where you don’t have a read-after-write consistency. This can include using a database that is eventually consistent and has replication lag.

Event Sourcing vs Event Driven Architecture

Hopefully, this clarifies the differences of Event Sourcing vs Event Driven Architecture. While the original blog post has some valid points, the issue is it’s pointing those problems at Event Sourcing, when in many cases it’s because it’s conflating the two.

Event Sourcing is about using events as state. Event Driven Architecture is about using events to communicate. That’s not to say that Event Sourcing or Event Driven Architecture don’t have their difficulties, they do. However, if you treat them for what they are, you eliminate a whole set of problems the original post had because you’re not conflating the two. They are completely orthogonal from each other.

Source Code

Developer-level members of my YouTube channel or Patreon get access to the full source for any working demo application that I post on my blog or YouTube. Check out the YouTube Membership or Patreon for more info.

Follow @CodeOpinion on Twitter

Software Architecture & Design

Get all my latest YouTube Vidoes and Blog Posts on Software Architecture & Design