Handling Failures in Message Driven Architecture

Many great libraries help add resilience and fault tolerance by handling failures in a message driven architecture. However, it’s not as simple as adding retries, timeouts, circuit breakers, etc., globally to all network calls. Many of the implications are specific to the context of the request being processed. And in many cases, it’s not solely a technical concern but a business concern.

YouTube

Check out my YouTube channel where I post all kinds of content that accompanies my posts including this video showing everything that is in this post.

Handling Failures

Transient faults are failures that happen at unpredictable times. They could be caused by a network issue, or by availability or latency problems with the service you’re trying to communicate with. Generally, these aren’t failures you run into very often, but when they do happen, how do you handle them in a message driven architecture when processing messages from queues and topics?

The first question isn’t a technical one. If there is a failure as part of a long-running business process, what does the business say? What kind of impact would a transient failure have? This isn’t just a technical decision of adding a retry and hoping for the best; it’s likely more of a business decision.

But because we’re developers, we love the technical aspect, so let’s jump into some technical concerns, and then you’ll see how this ties back into how it affects the business.

Immediate Retries

If you have a transient failure, the most common solution is to add an immediate retry. For example, say we’re processing a message in a consumer and it needs to make a synchronous call to an external service. This could be a database or some 3rd party service.

When we immediately retry, if it was a transient failure, then we make the request again and possibly the request succeeds.

This may add a bit of latency to processing our message since we had to make two calls to the external service, assuming the first failure happened immediately.
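As a rough illustration of an immediate retry, here is a minimal Python sketch. The names `TransientError`, `handle_with_immediate_retry`, and `flaky_service` are made up for this example and don’t come from any particular messaging library; a real library would likely give you this behavior through configuration.

```python
class TransientError(Exception):
    """Hypothetical error type standing in for a timeout or dropped connection."""


def handle_with_immediate_retry(message, call_service, max_attempts=2):
    """Call the external service, retrying right away on a transient failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call_service(message)
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts, let the caller decide what happens next


# Usage: a fake external service that fails on the first call only.
calls = []


def flaky_service(message):
    calls.append(message)
    if len(calls) == 1:
        raise TransientError("connection reset")
    return f"processed {message}"


result = handle_with_immediate_retry("order-1", flaky_service)
```

The message is processed on the second call, at the cost of the extra round trip to the external service.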

Exponential Backoff

Immediate retries don’t always solve the problem. If you notice that your immediate retry is also failing then you may want to implement an exponential backoff. This means that after every failed retry, we wait for a period of time and try again. If the failure occurs again, we wait even longer before retrying.

The implication now is that you’re adding more and more latency to the processing of a message. If you have an expectation about how long it takes to process a message, adding an exponential backoff could have a negative impact on overall throughput.
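To make the added latency concrete, here is a small Python sketch of an exponential backoff. The 1 second base delay, the doubling factor, and all of the function names are assumptions chosen for illustration, not defaults from any specific library.

```python
import time


class TransientError(Exception):
    """Hypothetical error type standing in for a timeout or dropped connection."""


def backoff_delays(retries, base_delay=1.0, factor=2.0):
    """Wait times before each retry: base, base * factor, base * factor^2, ..."""
    return [base_delay * factor ** i for i in range(retries)]


def handle_with_backoff(message, call_service, retries=3,
                        base_delay=1.0, factor=2.0, sleep=time.sleep):
    """Try once, then retry up to `retries` times, waiting longer each time."""
    delays = backoff_delays(retries, base_delay, factor)
    for attempt in range(retries + 1):
        try:
            return call_service(message)
        except TransientError:
            if attempt == retries:
                raise  # every retry failed
            sleep(delays[attempt])


# Usage: the service never recovers. Waits are recorded instead of slept.
attempts = []
waits = []


def always_down(message):
    attempts.append(message)
    raise TransientError("service unavailable")


try:
    handle_with_backoff("order-1", always_down, retries=3, sleep=waits.append)
except TransientError:
    pass

added_latency = sum(waits)  # 1 + 2 + 4 seconds of waiting for one message
```

With these assumed settings, a single message that never succeeds ties up the consumer for 7 seconds of waiting on top of the four calls themselves.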

Another negative implication: depending on the broker and/or messaging library you’re using, you could be blocking that consumer from processing any other messages while it waits. That matters if you’re using the Competing Consumers pattern for scalability in a message driven architecture, since a blocked consumer is one less consumer pulling work from the queue.

Jumping back to the business, does processing the message need to succeed? As developers, we often think that everything must succeed. But the best option might actually be to fail fast. Context matters, and talking with the business about failures and whether they can or cannot happen is important.

An example where failing quickly is the better option is a recurring message that gets processed every 5 minutes. If an exponential backoff adds 2 minutes of total processing time and the message may still fail overall, then 3 minutes later you’re going to process another message that tries the same thing again. In this situation, it may be better to fail immediately and free up your consumer to process other messages.

Dead Letter

If we fail to process a message but don’t want to abandon it, we can use a dead letter queue to store failed messages, which is common in a message driven architecture. For example, if we have an immediate retry followed by an exponential backoff and the external service is still unavailable, we can then send the message to a dead letter queue.

We can monitor this queue for reporting and manually try and reprocess these messages later once we know the external service is available.
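Here is a minimal in-memory sketch of that flow in Python. A real broker would provide the dead letter queue for you; the `deque`, the `consume` and `reprocess` functions, and the `state` flag are stand-ins invented for this example.

```python
from collections import deque


class TransientError(Exception):
    """Hypothetical error type standing in for an unavailable external service."""


def consume(message, handler, dead_letters):
    """Process a message; on failure, park it with a reason instead of losing it."""
    try:
        handler(message)
        return True
    except TransientError as error:
        dead_letters.append({"message": message, "reason": str(error)})
        return False


def reprocess(dead_letters, handler):
    """Give every parked message one more try; failures go back on the queue."""
    for _ in range(len(dead_letters)):
        entry = dead_letters.popleft()
        consume(entry["message"], handler, dead_letters)


# Usage: the service is down at first, then comes back.
state = {"service_up": False}
processed = []


def handler(message):
    if not state["service_up"]:
        raise TransientError("service unavailable")
    processed.append(message)


dead_letters = deque()
consume("order-1", handler, dead_letters)
parked = len(dead_letters)      # the failed message is waiting here

state["service_up"] = True      # we now know the service is available again
reprocess(dead_letters, handler)
```

Keeping the failure reason next to the message is what makes the queue useful for reporting as well as for reprocessing.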

Circuit Breaker

Once you have a failure, do you want to keep trying all new messages that are being processed? If the external service is unavailable and you don’t want to have every message that’s being processed go through its defined exponential backoff, then you can use a circuit breaker.

For example, once the circuit breaker is open, you can immediately send the message to the dead letter queue instead of even trying to call the external service.

After a timeout period, when the consumer processes another message, it would try to call the external service again, going through its exponential backoff if there is a failure.
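The open, skip, and try-again-after-a-timeout behavior can be sketched in Python like this. The `CircuitBreaker` class, its threshold of one failure, and the 30 second timeout are assumptions for illustration only; the backoff is left out to keep the sketch short, and the clock is faked so the example doesn’t have to wait.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped entirely."""


class CircuitBreaker:
    def __init__(self, failure_threshold=1, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open, skipping call")
            # Timeout elapsed: fall through and allow one trial call.
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result


# Usage with a fake clock and a fake external service.
now = [0.0]
state = {"service_up": False}
service_calls = []
dead_letters = []
breaker = CircuitBreaker(failure_threshold=1, reset_timeout=30.0, clock=lambda: now[0])


def call_service(message):
    service_calls.append(message)
    if not state["service_up"]:
        raise ConnectionError("service unavailable")
    return f"processed {message}"


def consume(message):
    try:
        return breaker.call(call_service, message)
    except (CircuitOpenError, ConnectionError):
        dead_letters.append(message)  # park it rather than keep trying
        return None


consume("order-1")            # fails and opens the circuit
consume("order-2")            # dead lettered without calling the service
now[0] = 31.0                 # the timeout period has passed
state["service_up"] = True
result = consume("order-3")   # trial call succeeds and closes the circuit
```

Note that `order-2` never reaches the external service at all, which is the whole point: new messages skip straight to the dead letter queue while the circuit is open.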

From a technical perspective, there’s a lot to think about when trying to add resilience when processing messages. Do you want to immediately retry? Should you rather fail immediately? Should you move a message to the dead letter queue? Do you want to have an exponential backoff? If you do, can you tolerate the latency it might cause in processing, which will decrease throughput?

Be aware that if you’re increasing processing times with an exponential backoff, you may also be backing up your queue if you’re receiving more messages than you’re consuming. You might create a bottleneck. That may be totally fine, since that’s the point of a queue. However, if you have an SLA about processing times, it may have a serious impact.

Sometimes the right answer is failing fast. You don’t need retries if a failure doesn’t impact the business. If it does impact the business, maybe you fail fast and move the message to a dead letter queue. Maybe the process must succeed and it’s low enough volume that you can tolerate a long exponential backoff. The only way to have these answers is to talk to the business. These are not just technical decisions because they will impact the business.

Follow @CodeOpinion on Twitter

Real-Time Web by leveraging Event Driven Architecture

Event Driven Architecture isn’t just for communicating between services or systems. It’s a characteristic of your architecture. You can develop a monolith and still use event driven architecture. Here’s an example of how using events can drive real-time web and make your clients more interactive.

Simple Event Processing

The basics of Event Driven Architecture are Events, Producers, and Consumers.

To start with a Producer will create an Event. An event is simply a message that identifies that something has occurred within the system.

Then the producer publishes that Event (message) to a Topic on a Message Broker.

Finally, Consumers subscribe to a Topic. This means that they will receive the Events (messages) that are Published to the Topic from the Producer.

The Producer has no idea who the Consumers are. It has no concept that there even are any Consumers. There may be zero Consumers for an Event or many Consumers. The Producer simply publishes Events to the Message Broker about what has occurred within the system. Consumers consume these events usually by performing some type of task. This is the publish/subscribe pattern of messaging.
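To show the shape of publish/subscribe without a real message broker, here is a tiny in-memory sketch in Python. `InMemoryBroker` and the topic and event names are invented for this example; an actual broker adds networking, persistence, and delivery guarantees that this leaves out.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy stand-in for a message broker with topics."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # The producer never sees this list; it only knows the topic name.
        for handler in list(self._subscribers[topic]):
            handler(event)


broker = InMemoryBroker()

# Publishing with zero consumers is perfectly valid.
broker.publish("orders", {"type": "OrderPlaced", "order_id": 1})

# Two independent consumers subscribe to the same topic.
emails = []
audit_log = []
broker.subscribe("orders", lambda event: emails.append(event["order_id"]))
broker.subscribe("orders", lambda event: audit_log.append(event["type"]))

broker.publish("orders", {"type": "OrderPlaced", "order_id": 2})
```

The first event is simply dropped because nobody was listening yet, and the second is delivered to both consumers without the producer changing at all.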

For the example of this post, we can leverage this pattern to push events to a browser for real-time web.

Real-Time Web

In this example, I have an ASP.NET Core HTTP API that’s both Producer and Consumer, meaning it will publish events and consume them as well. Once it consumes an event, it will use SignalR to communicate with Blazor WebAssembly apps. The reason our ASP.NET Core app is both Producer and Consumer is that it allows us to decouple the concerns of using SignalR to communicate with Blazor WebAssembly.

I’m using and modifying the Blazor Pizza Sample App to illustrate this. First, our ASP.NET Core app will create a message when a Pizza Order is going through its state transitions. The transitions are Order is Placed, Preparing, Out for Delivery, and Delivered.

Then it publishes this event to our Message Broker.

Finally, since our ASP.NET Core is also the Consumer, it receives this Event and then uses SignalR to push a message to our connected Blazor WebAssembly Client Apps.
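The flow above can be sketched in a language-neutral way with a few lines of Python. This is not the demo’s code and it doesn’t use CAP, Kafka, or SignalR; `Broker`, `ClientHub`, `OrderStatusChanged`, and `mark_out_for_delivery` are hypothetical stand-ins that only show how the same app can publish an event and, separately, consume it to push an update to connected clients.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderStatusChanged:
    """Hypothetical event describing a pizza order state transition."""
    order_id: int
    status: str


class Broker:
    """Toy in-memory stand-in for the message broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in list(self._subscribers[topic]):
            handler(event)


class ClientHub:
    """Toy stand-in for a real-time push channel to browser clients."""

    def __init__(self):
        self.sent = defaultdict(list)

    def push(self, order_id, payload):
        self.sent[order_id].append(payload)


broker = Broker()
hub = ClientHub()


# Consumer side: the only place that knows about pushing to clients.
def on_status_changed(event):
    hub.push(event.order_id, {"status": event.status})


broker.subscribe("order-status", on_status_changed)


# Producer side: changes state and publishes; it knows nothing about the hub.
def mark_out_for_delivery(order_id):
    # (saving the new order state would happen here)
    broker.publish("order-status", OrderStatusChanged(order_id, "Out for Delivery"))


mark_out_for_delivery(42)
```

The function that changes the order state never references the push channel, which is the decoupling the post is after.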

Source Code

Developer-level members of my CodeOpinion YouTube channel get access to the full source for any working demo application that I post on my blog or YouTube. Check out the membership for more info.

I’ve defined 3 events for the state transitions:

Our ASP.NET Core App will Publish these events. I’m using CAP in these examples along with Kafka as a message broker.

Each Controller Action for changing the state of the order will publish an associated event.

Since our ASP.NET Core app is also the consumer, we are creating a CAP Subscriber that will subscribe and consume all these events, and use SignalR to push down a message to our Blazor WebAssembly Client Apps.

Then from our Blazor WebAssembly App, we can subscribe to the SignalR messages and once received, update our Order Details Page.

Here’s a visual illustration of this working.

Real-Time Web & Event Driven Architecture

Using events to communicate between microservices or systems isn’t the only reason to leverage event driven architecture. You can also use it within a single application and be both the Producer and Consumer. This allows you to decouple the concerns and be reactive about events occurring within your system. In this example, I’m using events to drive real-time web, but hopefully, you can now see other use-cases.

STOP doing dogmatic Domain Driven Design

The mainstream thought on Domain Driven Design is about Entities, Value Objects, Aggregates, Repositories, Services, Factories… all kinds of technical patterns. Because of this, most don’t think they need Domain Driven Design because it’s too complicated for their domain. Why would you need all that “stuff”? Well, maybe you don’t! In a large system, modeling your domain, defining boundaries, and how they relate is far more important than concerning yourself with whether you’re using the Repository pattern correctly.

Domain Driven Design

I often use terms that people associate with Domain Driven Design (DDD). However, I generally don’t call out Domain Driven Design explicitly in many of my videos or blog posts. You probably have noticed that I do talk a lot about boundaries. Boundaries are probably the most important aspect for me that came from Domain Driven Design. However, the vast majority of content or discussions you will find online about DDD revolve around tactical patterns: Entities, Value Objects, Repositories, Services, Aggregates, etc.

While the tactical patterns have value, they must come after understanding the concept and value of defining boundaries with a Bounded Context. While a Bounded Context does get a lot of attention from Domain Driven Design enthusiasts, it’s not what people are introduced to first in tutorials or “sample” applications.

Here are some of the comments and questions I see the most related to Domain Driven Design:

DDD is only powerful when your business logic is complex

While I agree complexity is a driving factor, I also think it’s helped me most when dealing with larger systems. Complexity from the Domain as well as the complexity that comes from a large system.

In DDD you’re supposed to use a repository pattern.

You need to use factories to create entities or value objects.

This is the wrong concern. It focuses on patterns rather than on boundaries and modeling the actual domain.

You have no logic in your entities, so that’s an anemic domain model, which is an anti-pattern.

It’s only an anti-pattern if you think you have a domain model but you really have a data model with transaction scripts. There isn’t anything wrong with starting with that and moving to a richer domain model as your understanding of the problem evolves.

You can’t have dependencies in your Domain Model

People go to incredible lengths to avoid having dependencies in their domain model. While it’s a good practice to avoid dependencies and worth striving for, sometimes you actually need a dependency. Using Double Dispatch isn’t bad just because you’re passing a dependency to a method and someone told you all dependencies must be injected via the constructor.
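For anyone unfamiliar with the term as it’s used here, this is roughly what passing a dependency to a method looks like. The `Order` and `FlatRateShipping` classes and the pricing numbers are made up purely for illustration.

```python
from dataclasses import dataclass


class FlatRateShipping:
    """Hypothetical domain service the entity depends on for one calculation."""

    def quote(self, weight_kg):
        return 5.0 + 1.5 * weight_kg


@dataclass
class Order:
    subtotal: float
    weight_kg: float

    def total(self, shipping_calculator):
        # The dependency arrives as a method argument, not via the constructor,
        # and the entity calls back into it with its own data.
        return self.subtotal + shipping_calculator.quote(self.weight_kg)


order = Order(subtotal=20.0, weight_kg=2.0)
total = order.total(FlatRateShipping())  # 20.0 + (5.0 + 1.5 * 2.0)
```

The entity stays free of constructor-injected services, yet it can still use one at the moment it actually needs it.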

Dogmatic Domain Driven Design

I think all the comments above lose sight of the benefits of Domain Driven Design. I don’t think concerning yourself solely with tactical patterns and whether they are being applied correctly is applying DDD. I think that’s applying dogmatic DDD.

To me doing Domain Driven Design is about understanding your domain, the language, the context of different people within it, the problem, and the solution space and trying to model it. And you’re not going to get it right at first as it will take time to iteratively build upon the insights you gain.

There’s value in the tactical patterns like Aggregates (which are a consistency boundary), but that shouldn’t be the focus. Just because you have repositories, aggregates, entities, doesn’t mean you’re doing Domain Driven Design. You just have a bunch of patterns.

Language

I find language a great way to understand the boundaries and define a bounded context within a system. A great example of this was from Mel Conway on Twitter.

Depending on who you’re talking to in your Domain, they likely have a different context given the subdomain they are in or the role that they have.

Exactly as Mel is pointing out, context matters in how language and intent are used.

As an example, in a Transportation company that transports freight/goods, there are multiple subdomains. Recruitment is responsible for hiring drivers and making sure they have the proper license, compliances, etc. Operations is concerned with the actual shipments and the freight being picked up and delivered.

Both have the concept of a Vehicle. But both have very different concerns and views on what a Vehicle is. They share the same term “Vehicle” but, based on their context, have very different views of what matters to them.

In recruitment, a Vehicle may be owned by a Driver (known as an Owner Operator). They might care about the Vehicle Safety requirements and other compliances. Operations cares about the availability of the vehicle and whether it can do a particular shipment at a given time. Very different concerns.
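In code, that often means two separate models that happen to share a name and an identifier. The following Python sketch is only an assumption of what each context might hold; the fields and methods are invented to mirror the concerns described above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


# Recruitment context: who owns the vehicle and is it compliant?
@dataclass
class RecruitmentVehicle:
    vehicle_id: str
    owner_operator_id: Optional[str]  # set when a driver owns the vehicle
    safety_inspection_expires: date

    def is_compliant(self, today):
        return today <= self.safety_inspection_expires


# Operations context: can the vehicle do this shipment at this time?
@dataclass
class OperationsVehicle:
    vehicle_id: str
    booked: List[Tuple[date, date]] = field(default_factory=list)

    def is_available(self, start, end):
        # Available only if the requested dates overlap no existing booking.
        return all(end < b_start or start > b_end for b_start, b_end in self.booked)


recruitment_view = RecruitmentVehicle("truck-7", "driver-12", date(2024, 6, 30))
operations_view = OperationsVehicle("truck-7", [(date(2024, 1, 10), date(2024, 1, 12))])

compliant = recruitment_view.is_compliant(date(2024, 1, 11))
free_later = operations_view.is_available(date(2024, 1, 13), date(2024, 1, 14))
free_during = operations_view.is_available(date(2024, 1, 11), date(2024, 1, 13))
```

Neither model needs the other’s data, and forcing them into one shared `Vehicle` class would couple two contexts that change for different reasons.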

Discovery

Understanding a Domain, Subdomains, and developing a Bounded Context can be very challenging. I like to use the analogy of walking into a dark room with only a small flashlight.

At first, you have no mental model of what the room actually looks like. As you slowly flash the light around the room, you get a better mental model of how high the ceilings are, what’s the shape of the room, what’s on the floor. Your understanding grows slowly as you shine the light.

Understanding the boundaries of a domain takes effort and is the key to being able to understand the problem and solution space. This is the key to domain modeling.

Domain Driven Design

The title says it all. It’s not pattern-driven design, it’s Domain Driven Design. Don’t get caught up in the dogmatic majority that is focused on the tactical patterns. Again, yes they are valuable. Aggregate Design is a great way to define consistency boundaries. I’m not discounting the patterns, but that’s not the focus. The patterns are a means to an end.
