Event Sourcing vs Event Driven Architecture

Event Sourcing is seemingly constantly being confused with Event Driven Architecture. In this blog/video I’m going through a popular blog post that explains various points that are very valid, however, they are conflating Event Sourcing with Event Driven Architecture. Event Sourcing is about using events as the state. Event Driven Architecture is about using events to communicate between service boundaries.

YouTube

Check out my YouTube channel where I post all kinds of content that accompanies my posts including this video showing everything that is in this post.

Event Sourcing vs Event Driven Architecture

There is a blog post that keeps making its round over various news or social media sites every year or so that gets a lot of attention. The issue is it conflates Event Sourcing and Event Driven Architecture.

To be clear, Event Sourcing is about using events to represent state. In Event Driven Architecture, events are used to communicate with other service boundaries.

Event Sourcing

Event Sourcing is a different approach to storing data. Instead of storing the current state, you’re instead going to be storing events. Events represent the state transitions of things that have occurred in your system. If you want more details on exactly what Event Sourcing is, check out another post I’ve written Event Sourcing Example & Explained in plain English

Event Sourcing vs Event Driven Architecture

Event Driven Architecture

Event Driven Architecture is about using events as a way to communicate with other service boundaries. Generally, leveraging a message broker (or event log) to use the Publish/Subscriber pattern. Publish events and consume events asynchronously within other boundaries. When publishing an event, there may be zero or many consumers. The publisher is unaware of who is consuming an event. Consumers are unaware of each other. Event Driven Architecture is a way of loose coupling between service boundaries.

Event Sourcing vs Event Driven Architecture

Purely based on these two definitions, you might already start to see why Event Sourcing vs Event Driven Architecture isn’t even a valid comparison.

Counterpoints

The blog post illustrates having multiple services both publish and consume events from an event log (I’m going to assume Kafka).

The idea of a keeping a central log against which multiple services can subscribe and publish is insane. You wouldn’t let two separate services reach directly into each other’s data storage when not event sourcing

Yes, that’s insane. Don’t do that. If you’re using the event log for communication between service boundaries AND to represent state within boundaries, that’s a terrible idea. But that’s has nothing to do with Event Sourcing but rather mixing two concepts: state persistence and communication.

Under “normal” development flows, you operate within the safe, cozy little walls which make up your service. You’re free to make choices about implementation and storage and then, when you’re ready, deal with how those things get exposed to the outside world.

Exactly! This is exactly what you should be doing. A service boundary owns its data and defines how it exposes it. If you’re conflating Event Sourcing (state) with Event Event Architecture (communication) you can see how this would violate that.

For one, you’re probably going to be building the core components from scratch. Frameworks in this area tend to be heavy weight, overly prescriptive, and inflexible in terms of tech stacks.

Now it depends on which side of the question they were referring to. If we’re talking about Event Sourcing, there really isn’t any “framework” required. For example, if you’re using EventStoreDB, you can use their SDK directly. In many of the examples I use, I’m creating a Repository that is reading from an event stream and replaying the events to build up an Aggregate Root to the current state. From there the Aggregate Root generates new events and those are persisted back to the Event Stream from the Repository. There isn’t any framework or library code required other than the SDK Client from the Event Store.

With Event Driven Architecture, you absolutely want to use a library for messaging. I say this because there has a lot of patterns and concepts that come with using Event Driven Architecture. Messaging libraries provide these patterns and concepts for you so that you don’t have to implement them yourself. In the .NET space, libraries like NServiceBus, MassTransit, Brighter all come to mind that handles things like the Outbox Pattern, Process Managers, Fault Tolerance, Retries, Dead Letter Queues, and more.

If you have a UI, it generally needs to play along with the event driven aspect of the back end. Meaning, it should be task based.

Absolutely. Task-Based UIs will guide the end-user to perform specific tasks usually in some type of workflow. Tasks (or actions, commands) are explicit and allow you to then derive what intend to do where you can then generate the appropriate event from that task. CRUD isn’t explicit. You do knot know the intent of the end-user when you provide them with a CRUD-based UI. For more on task-based UIs, check out my post Decomposing CRUD to a Task Based UI.

A super common piece of advice in the ES world is that you don’t event source everywhere *. This is all well and good at the conceptual level, but actually figuring out where and when to draw those architectural boundaries through your system is quite tough in practice.

Defining boundaries is one of the most important things to do when developing and designing a system. Yet it’s one of the hardest things to do. In my experience, the real core of your solution space is where the complexity lies. On the outer edges, there are boundaries that are often in a supporting role. These supporting boundaries may either be very generic and can be something you buy off the shelf and integrate with, while others you may want to develop but can simply be CRUD. Each boundary defines how it persists state best on the requirements and what may fit best. Some boundaries might be best suited for a relational database, others a document database, or some an Event Store. But again that is about persistence and state. Each boundary decides its persistence. For communication, you can still leverage an Event Driven Architecture even though some boundaries are using a relational database. Again, don’t conflate needing to do Event Sourcing for persisting state in order to communicate via events.

We made it about a month before a shift in focus caused us to hit our first “oh, so these events are no longer relevant, at all?” situation. Once you hit this point, you’ve got a decision to make: what to do with the irrelevant / wrong / outdated events.

With Event Sourcing, the concern about versioning events is done within a single boundary that owns and uses those events for state. There are different strategies for versioning with event sourcing and how you want to handle “old” events.

One thing I’ve noticed however is that “no longer relevant” doesn’t actually happen all that often. If you’re using established business concepts, they don’t often become irrelevant. Usually, events that become irrelevant are because the developers defined the events and are generally based more on technical concerns rather than business concepts.

Once your data grows to the point where you can no longer materialize from the ledger in a reasonable amount of time, you’ll be forced to offload the reads to your materialized projections. And with this step comes materialization lag and the loss of read-after-write consistency.

This is a valid concern when you’re using Event Sourcing and creating Projections (a read model) that is generated asynchronously. There are various strategies to handle this that I’ve talked about in a video about Event Consistency. However, to point out, this isn’t specifically about Event Sourcing Projections but any type of system where you don’t have a read-after-write consistency. This can include using a database that is eventually consistent and has replication lag.

Event Sourcing vs Event Driven Architecture

Hopefully, this clarifies the differences of Event Sourcing vs Event Driven Architecture. While the original blog post has some valid points, the issue is it’s pointing those problems at Event Sourcing, when in many cases it’s because it’s conflating the two.

Event Sourcing is about using events as state. Event Driven Architecture is about using events to communicate. That’s not to say that Event Sourcing or Event Driven Architecture don’t have their difficulties, they do. However, if you treat them for what they are, you eliminate a whole set of problems the original post had because you’re not conflating the two. They are completely orthogonal from each other.

Source Code

Developer-level members of my YouTube channel or Patreon get access to the full source for any working demo application that I post on my blog or YouTube. Check out the YouTube Membership or Patreon for more info.

Follow @CodeOpinion on Twitter

Software Architecture & Design

Get all my latest YouTube Vidoes and Blog Posts on Software Architecture & Design

Handling Failures in Message Driven Architecture

Many great libraries help to add resilience and fault tolerance by handling failures in a message driven architecture. However, it’s not just as simple as adding retries, timeouts, circuit breakers, etc., globally to all network calls. Many implications are specific to the context of the request being processed. And in many cases, it’s not solely a technical concern but rather it’s a business concern.

YouTube

Check out my YouTube channel where I post all kinds of content that accompanies my posts including this video showing everything that is in this post.

Handling Failures

Transient Faults are happening randomly. They are failures that happen at unpredictable times and could be caused by a network issue, availability or latency with the service you’re trying to communicate with. Generally, these aren’t failures you run into very often but when they do happen how do you handle them in a message driven architecture when processing messages from using queues and topics?

The first question isn’t a technical one but rather if there is a failure apart of a long-running business process, what does the business say? What kind of impact would it have if there was such a transient failure? This isn’t just a technical decision of adding a retry and hoping for the best but rather is likely more of a business decision.

But because we’re developers, we love the technical aspect, so let’s jump into some technical concerns, and then you’ll see how this tails back into how it affects the business.

Immediate Retries

If you have a transient failure, the most common solution is to add an immediate retry. For example, if we’re processing a message in a consumer and it needs to make a synchronous call to an external service. This could be a database or some 3rd party service.

Handling Failures in Message Driven Architecture

When we immediately retry, if it was a transient failure, then we make the request again and possibly the request succeeds.

Handling Failures in Message Driven Architecture

This may add a bit of latency to processing our message since we had to make two calls to the external service, assuming the first failure happened immediately.

Exponential Backoff

Immediate retries don’t always solve the problem. If you notice that your immediate retry is also failing then you may want to implement an exponential backoff. This means that after every failed retry, we wait for a period of time and try again. If the failure occurs again, we wait even longer before retrying.

The implication now is your adding more and more latency to the processing of a message. If you have an expectation about how long it takes to process a message, adding an exponential backoff could have a negative impact on the overall throughput.

Another negative implication, depending on the broker and/or messaging library your using, you could be blocking that consumer from processing any other messages if you’re using Competing Consumers Pattern for Scalability with a message driven architecture.

Jumping back to the business, does processing the message need to succeed? As developers, we often think that everything must succeed. But the best answer might be to fail fast. That might actually be the best option. Context matters and talking with the business about failures and if they can or cannot happen is important.

An example of failing quickly is a better option is if you have a recurring message that gets processed every 5 minutes. If you have an exponential backoff and that adds 2 minutes of total processing time and it still may fail overall, when in 3 minutes, you’re going to try to process another message to try the same thing again. In this situation, it may be better to fail immediately to free up your consumer to process other messages.

Dead Letter

If we fail to process a message but don’t want to abandon it, we can use a dead-letter queue to store failed messages which is common in a message driven architecture. For example, if we have an immediate try with exponential backoff, and the external service is still unavailable then we can then send the message to a dead letter queue.

Handling Failures in Message Driven Architecture

We can monitor this queue for reporting and manually try and reprocess these messages later once we know the external service is available.

Circuit Breaker

Once you have a failure, do you want to keep trying all new messages that are being processed? If the external service is unavailable and you don’t want to have every message that’s being processed go through its defined exponential backoff, then you can use a circuit breaker.

For example, this allows you to immediately send the message to the dead letter queue instead of even trying to call the external service.

Handling Failures in Message Driven Architecture

After a timeout period once processing another message, the consumer would try and call the external service again, going through its exponential backoff if a failure.

Handling Failures in Message Driven Architecture

From a technical perspective, there’s a lot to think about when trying to add resilience when processing messages. Do you want to immediately retry? Should you rather fail immediately? Should you move a message to the dead letter queue? Do you want to have an exponential backoff? IF you do, can you tolerate the latency it might cause in processing, which will decrease throughput?

Be aware if you’re increasing processing times from an exponential backoff, this may also backing up your queue if you’re receiving more messages than you’re consuming. You might create a bottleneck. That may be totally fine, that’s the point of a queue, however, if you have an SLA about processing times it may have a serious impact.

Sometimes the right answer is failing fast. You don’t need retries if it doesn’t impact the business. If it does impact the business, maybe you fail fast and move to a dead letter queue. Maybe the process must succeed and it’s low enough volume where you can tolerate a long exponential backoff. The only way to have these answers is to talk to the business. These are not just technical decisions because they will impact the business.

Related Posts

Follow @CodeOpinion on Twitter

Software Architecture & Design

Get all my latest YouTube Vidoes and Blog Posts on Software Architecture & Design

Message-IDs for Handling Concurrency

This post serves as a guide for how you can use a Message identification (Message-IDs) on your messages (events and commands) to handle concurrency.

This post is in a series related to messaging. The overview is available in my Message Properties post.

YouTube

If you haven’t already check out my YouTube channel.

Message-IDs

Each message, regardless of it being an event or a command, should contain a way to identify its specific instance of that message. This is as simple as adding a GUID/UUID to your messages:

No other message (event/command) should ever use this ID (unless you’re also using a message owner). It’s a one time only usage that should be unique.

The producer of the event should be creating this ID before it publishes it. It shouldn’t be hydrated by some middle party.

Concurrency

Most systems support at least once messaging. Meaning, they will deliver a message to the consumer at least once. This means it can be delivered more than once.

Message handlers should be reentrant. You should be able to invoke multiple instances of a message handler concurrently and safely.

Meaning if the exact same instance of an InventoryAdjusted event was invoked twice (with the same ID), at the same time, the outcome would be the inventory on hand would increase by 10, not 20.

We can achieve this by having our message handler use the message ID apart of the same transaction it’s using apart of its state change.

Here’s an example using pseudo C#

The example above is using a unique key on Handler and MessageID columns in the Concurrency table. When we either try to insert the record or commit the transaction, it will fail if the unique constraint fails.

Events

Since an event can have multiple consumers/handlers is the reason why I’ve included the name of the handler as apart of the unique constraint. This is to demonstrate if you were using the same concurrency table for multiple handlers. In the case of event handlers, you want each handler to process the event, but not more than once each.

If you were using this for commands, you wouldn’t need to record the handler, just the MessageID.

Related Links

Follow @CodeOpinion on Twitter

Software Architecture & Design

Get all my latest YouTube Vidoes and Blog Posts on Software Architecture & Design