Avoiding Distributed Transactions with the Reservation Pattern

A long-running business process can last anywhere from seconds to days, so you cannot lock resources within a service using a distributed transaction. So what’s the alternative? The real world has a solution: a reservation. The reservation pattern gives you a time-bound, limited guarantee that lets you coordinate with other services.

YouTube

Check out my YouTube channel where I post all kinds of content that accompanies my posts including this video showing everything that is in this post.

Distributed Transaction

When working with a traditional monolith, you can use a database transaction. Wrap all the relevant database calls within a transaction, and if anything goes wrong while executing the steps of a business process, you simply roll back.

When working with a system that has many distributed services, this isn’t an option unless you have a distributed transaction, which you likely won’t. So what’s the alternative? One solution is the reservation pattern, which creates a time-bound lock on a single resource. That lets you execute your long-running business process with a limited guarantee that the resource will be available when required.

A common example of this is requiring a unique constraint on a username or email address. If multiple services are part of a user registration process, not all of them will enforce that constraint consistently. One service can likely own the constraint; however, if another user is signing up concurrently with the same username, we could end up with duplicates in the other services.

Real-World Example

To better illustrate this, let’s turn to the real world where we use the reservation pattern in many different scenarios.

I recently made an online order for a product at a big box store. This online order was for pickup, not delivery. When I placed the order, the website said there was only one item available at my local store.

Once I placed my order, I received a confirmation email. At the very top, it stated that I’d receive another email once the item I purchased was ready for pickup.

This is because once the order is sent to my local store, an employee has to physically take the item off the shelf so nobody else buys it. Because there was only one item remaining (or so their website said), it’s also possible that the information was stale and no item was actually available.

The employee going to get the item off the shelf is reserving the item for me.

Once they found the item on the shelf, they brought it to a designated area in the store for pickup orders. They also marked my order as ready for pickup, which triggered a second email.

One important aspect of a reservation is that it’s time-limited. The second email states that if I don’t pick up the order within 7 days, they will refund my credit card and put the item back on the shelf.

They are providing a lock/hold on my item and that lock will expire in 7 days.

When I go to the local store and pick up my order, this is confirming my reservation.

Reservation Pattern

We can implement the reservation pattern in code to solve similar types of problems. In the example of user registration, we can create a reservation on a username at the beginning of the registration process. Once the user completes the user registration process, we can confirm the reservation.

If the user never completes the registration process, the reservation will expire after the time we define, letting someone else try to register the same username.

The reservation pattern has three important aspects: reserving, confirming, and expiring.

To illustrate this in code, I’m going to show the synchronous version first, followed by the asynchronous version, which is generally more applicable.

The user registration first checks whether a reservation for the username already exists. If it doesn’t, it creates a new account, saves it to our database, then tells the reservation it’s complete.

The reservation itself can Reserve and Complete. Internally, after 5 seconds, it will Expire the username if it’s still reserved. This is a sample: it doesn’t use a real database or handle concurrency, as you would need to in a real production environment.
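
As a rough illustration of the synchronous version, here is a minimal Python sketch (it is not the original sample). The class and function names, the injectable clock, and the lazy expiry check are all assumptions made for this sketch: instead of a timer that fires after 5 seconds, a reservation is simply treated as expired once its time-to-live has passed. Like the sample described above, it keeps everything in memory and does not handle concurrency.

```python
import time
from typing import Callable, Dict, Set


class UsernameReservation:
    """In-memory reservation with reserve, complete, and time-based expiry."""

    def __init__(self, ttl_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._reserved: Dict[str, float] = {}  # username -> expiry time
        self._confirmed: Set[str] = set()

    def reserve(self, username: str) -> bool:
        """Return True if the caller now holds the reservation."""
        now = self._clock()
        if username in self._confirmed:
            return False
        expires_at = self._reserved.get(username)
        if expires_at is not None and expires_at > now:
            return False  # someone else holds a live reservation
        self._reserved[username] = now + self._ttl
        return True

    def complete(self, username: str) -> bool:
        """Confirm a live reservation; False if it expired or never existed."""
        expires_at = self._reserved.pop(username, None)
        if expires_at is None or expires_at <= self._clock():
            return False
        self._confirmed.add(username)
        return True


def register(username: str, reservation: UsernameReservation,
             accounts: Dict[str, dict]) -> bool:
    """Reserve the username, create the account, then confirm the reservation."""
    if not reservation.reserve(username):
        return False
    accounts[username] = {"username": username}  # stand-in for a database save
    if not reservation.complete(username):
        del accounts[username]  # reservation expired mid-flight, so undo the save
        return False
    return True
```

Calling `register("alice", UsernameReservation(), {})` succeeds the first time; a second registration of the same username against the same reservation object is rejected because the name is already confirmed.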

Now, to illustrate the asynchronous version, I’m going to use an NServiceBus saga to handle all the interactions and coordinate the reservation. This is closer to a real-world application, where a long-running business process likely involves many different steps.

The process is kicked off when the UserRegistrationStarted event is consumed. From there, it sends a ReserveUsername command, whose handler does the actual reserving of the username.

There are two things to point out. First, we send an ExpireReservation command that will be delivered in 10 seconds. This is the expiry that removes the username from the reservation if it’s still there. Second, we publish a UsernameReserved event, which is also handled by UserRegistration. UserRegistration then sends a CreateUserAccount command. When that command is executed (code not shown), it publishes a UserAccountCreated event, which is again handled by UserRegistration, which can then send a ConfirmUsernameReservation command to complete the reservation process.
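
The message flow above can be sketched without NServiceBus. The following Python sketch is not the saga from the demo: it uses a hypothetical in-memory bus with a fake clock, keeps the message names from this post, and collapses the saga and the individual handlers into one class for brevity. The `create_account` flag is an assumption added to simulate a sign-up that is never finished.

```python
import heapq
from typing import Callable, Dict, List, Set, Tuple


class Bus:
    """Toy in-memory bus with delayed delivery, driven by a fake clock."""

    def __init__(self) -> None:
        self.now = 0.0
        self.log: List[str] = []  # message names in delivery order
        self._queue: List[Tuple[float, int, str, str]] = []
        self._seq = 0  # tie-breaker so equal due times stay first-in, first-out

    def send(self, message: str, username: str, delay: float = 0.0) -> None:
        self._seq += 1
        heapq.heappush(self._queue, (self.now + delay, self._seq, message, username))

    def run(self, handlers: Dict[str, Callable[[str], None]], until: float) -> None:
        """Deliver every message due at or before `until`, in time order."""
        while self._queue and self._queue[0][0] <= until:
            due, _, message, username = heapq.heappop(self._queue)
            self.now = max(self.now, due)
            self.log.append(message)
            handlers[message](username)
        self.now = max(self.now, until)


class Registration:
    """Coordinates the reservation; one class stands in for saga and handlers."""

    def __init__(self, bus: Bus, create_account: bool = True) -> None:
        self.bus = bus
        self.create_account = create_account  # False simulates an abandoned sign-up
        self.reserved: Set[str] = set()
        self.confirmed: Set[str] = set()
        self.handlers: Dict[str, Callable[[str], None]] = {
            "UserRegistrationStarted": lambda u: bus.send("ReserveUsername", u),
            "ReserveUsername": self.on_reserve,
            "UsernameReserved": self.on_reserved,
            "CreateUserAccount": lambda u: bus.send("UserAccountCreated", u),
            "UserAccountCreated": lambda u: bus.send("ConfirmUsernameReservation", u),
            "ConfirmUsernameReservation": self.on_confirm,
            "ExpireReservation": self.on_expire,
        }

    def on_reserve(self, username: str) -> None:
        if username in self.reserved or username in self.confirmed:
            return  # already taken; a real system would report the rejection
        self.reserved.add(username)
        self.bus.send("ExpireReservation", username, delay=10.0)
        self.bus.send("UsernameReserved", username)

    def on_reserved(self, username: str) -> None:
        if self.create_account:
            self.bus.send("CreateUserAccount", username)

    def on_confirm(self, username: str) -> None:
        if username in self.reserved:
            self.reserved.discard(username)
            self.confirmed.add(username)

    def on_expire(self, username: str) -> None:
        self.reserved.discard(username)  # no-op once the reservation is confirmed
```

Sending `UserRegistrationStarted` and running the bus past the 10-second mark leaves the username confirmed; with `create_account=False`, the delayed ExpireReservation message releases the username instead.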

Source Code

Developer-level members of my YouTube channel or Patreon get access to the full source for any working demo application that I post on my blog or YouTube. Check out the YouTube Membership or Patreon for more info.

Reservation Pattern

The reservation pattern is used in the real world all the time. We can leverage the pattern in code to provide time-bound limited guarantees between services for a long-running business process. It’s a limited lock that can expire.

In my real-world example, I placed an online order for pickup. The item was reserved for me, which saved me from wasting my time going to the store only to find the item unavailable. Once the employee took the item off the shelf, I had 7 days to pick it up, or the reservation would expire and someone else could purchase it.

Sometimes you just need to look to the real world for a solution.

Follow @CodeOpinion on Twitter

Software Architecture & Design

Get all my latest YouTube Videos and Blog Posts on Software Architecture & Design

Building a Webhooks System with Event Driven Architecture

Do you need to integrate with external systems or services? Leveraging an event driven architecture lets you build a webhooks system that is decoupled from your main application code, so you can call external systems that have subscribed via webhooks in complete isolation from that code.

In-Process

The simplest approach to building a webhooks system is to make calls in-process to the external HTTP APIs that want to be notified when we actually make some type of state change within our system.

For example, we have an e-commerce application where a user places an order. When this occurs, the application must persist the order data to our database.

Once the order is saved to our database, we then make an HTTP call to the third-party external HTTP API.

There are a few issues with this simplistic approach. The first is that because the call is in-process, the latency of calling the external HTTP API is added to the overall request the client made to place the order. This blocks the client from getting the result of placing their order.

If the external HTTP API accepts our HTTP request but takes a long time to handle it, this could have a very negative impact on our own client. Do we have a timeout? What happens if the external HTTP API is unavailable and we can’t connect? Do we retry? All of this is adding latency to the overall call from our client.

Ideally, we separate placing an order from the integration (webhooks) with the external HTTP APIs.

Publish-Subscribe

We can leverage event driven architecture to decouple the concerns of saving the order and doing our webhooks integrations.

After the application (producer) has saved the order to the database, it publishes an OrderPlaced event to a topic on our message broker. At this point, the application can return to the client.

Our webhook system can be a separate logical and/or physical component that is a consumer of that OrderPlaced Event by subscribing to the Topic on the message broker.

When it consumes the OrderPlaced event, it can then make the HTTP call to the external HTTP APIs.

Now, if there is latency with one of the external HTTP APIs, it doesn’t affect our client placing the order. That’s already done and is now completely separated, since we’ve moved to asynchronous messaging.
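
To make the decoupling concrete, here is a minimal Python sketch with a hypothetical in-memory broker standing in for a real message broker. The topic name, the event shape, and the `sent_webhooks` list (a stand-in for real HTTP calls) are assumptions for illustration only.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple

Handler = Callable[[dict], None]


class InMemoryBroker:
    """Toy broker: publish() only enqueues; consumers run later in drain()."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)
        self._pending: Deque[Tuple[Handler, dict]] = deque()

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self._subscribers[topic]:
            self._pending.append((handler, event))

    def drain(self) -> int:
        """Deliver all pending events; returns how many were delivered."""
        delivered = 0
        while self._pending:
            handler, event = self._pending.popleft()
            handler(event)
            delivered += 1
        return delivered


def place_order(order_id: str, db: Dict[str, dict], broker: InMemoryBroker) -> str:
    """Producer: save the order, publish OrderPlaced, return immediately."""
    db[order_id] = {"id": order_id, "status": "placed"}
    broker.publish("orders", {"type": "OrderPlaced", "order_id": order_id})
    return order_id


sent_webhooks: List[dict] = []


def webhook_consumer(event: dict) -> None:
    """Consumer: this is where the external HTTP calls would be made."""
    sent_webhooks.append(event)
```

The point of the sketch is the ordering: `place_order` returns before any webhook work happens, and the consumer only runs when the broker delivers the event (here, when `drain()` is called).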

Example

An example of this can be found in the eShopOnContainers sample application. It does exactly what I have outlined above.

It has a consumer that handles the OrderStatusChangedToPaidIntegrationEvent. When this event is consumed, it gets all the webhook subscriptions and then sends all the HTTP requests out.

The implementation of IWebhooksSender makes the HTTP calls to the external HTTP APIs and sends them the webhook data.

More isolation

There’s one issue with the eShopOnContainers example above.

There would likely be multiple webhook subscriptions that we need to handle. The issue is that if we make all the HTTP requests in the same process, each HTTP request adds latency to the overall processing of the event. What happens if one of the HTTP APIs fails to connect? What happens if one takes a really long time to complete? We’re in a similar situation as before, where we want to isolate work into a very specific task.

To accomplish this, when we receive the OrderPlaced event, we can look at all the webhook subscriptions we have, create a command, and send it to a queue on our message broker.

As an example, if there were three different webhook subscriptions, we would create three SendOrderPlacedWebhookCommand commands, one per subscription.

Each command would then also be processed by our webhook system asynchronously where it would pull each message off the queue and then send the HTTP request to the appropriate external HTTP API.

After it finished processing the first command, it would then pick up the second command and perform the HTTP call.

What this now does is separate each individual webhook into its own unit of execution.

Now if one HTTP call fails or takes a long time, it is totally independent of any other.
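
Here is a minimal Python sketch of that fan-out, under a few assumptions: a plain `deque` stands in for the queue on the message broker, `post` is a hypothetical function that performs the HTTP call, and the command fields are made up for illustration. Only the command name comes from the example above.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass(frozen=True)
class SendOrderPlacedWebhookCommand:
    order_id: str
    url: str  # the subscriber's endpoint


def on_order_placed(order_id: str, subscription_urls: List[str],
                    queue: Deque[SendOrderPlacedWebhookCommand]) -> None:
    """Event handler: enqueue one command per webhook subscription."""
    for url in subscription_urls:
        queue.append(SendOrderPlacedWebhookCommand(order_id, url))


def process_queue(queue: Deque[SendOrderPlacedWebhookCommand],
                  post: Callable[[str, dict], None]
                  ) -> List[SendOrderPlacedWebhookCommand]:
    """Handle each command on its own; one failure never blocks the others."""
    failed: List[SendOrderPlacedWebhookCommand] = []
    while queue:
        command = queue.popleft()
        try:
            post(command.url, {"event": "OrderPlaced", "order_id": command.order_id})
        except Exception:
            failed.append(command)  # a real system would retry or dead-letter it
    return failed
```

With three subscriptions where one endpoint times out, the other two webhooks are still delivered and only the failing command is returned for follow-up.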

Webhooks

Using an event driven architecture and messaging can facilitate building a webhooks system that can be very robust, fault-tolerant, resilient, and decoupled from your primary application code.

Gotchas! in Event Driven Architecture

Event Driven Architecture has a few gotchas. Things that you always need to be aware of or think about. Regardless of whether you’re new to Event Driven Architecture or a seasoned veteran, these just don’t go away. Let me cover 4 aspects of event driven architecture that you need to think about.

At Least Once Delivery

You’ll likely use a broker that supports “at least once delivery”. This means a message will be delivered to a consumer one or more times. Because of this, consumers need to be able to handle a duplicate message, that is, a message they’ve already consumed.

There are different reasons why this can occur, and the first is because of the message broker.

When a message broker delivers a message to the consumer, the consumer needs to process the message and then acknowledge back to the broker that it consumed it successfully.

If the consumer fails to acknowledge to the broker, say because an internal exception occurred when processing the message, then the message broker after a given period of time (invisibility timeout) will re-deliver that message to the consumer. This could also happen if the length of time it takes to consume a message is greater than the invisibility timeout. If the consumer does not acknowledge before the timeout, it will get re-delivered, regardless of why.

The second reason this can happen is the publisher of the message. If the publisher is using the Outbox Pattern, it may send the same message more than once. The publisher might also send what is effectively a duplicate in other cases, such as sending a different message with the same values. In either case, if processing a duplicate message would negatively impact you, you need to make your consumers idempotent.
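
One common way to make a consumer idempotent is to remember which messages it has already handled. The Python sketch below assumes every message carries a unique `message_id` field (an assumption, not something this post specifies) and keeps the processed IDs in memory; in a real system you would typically store them in the same database transaction as the handler’s side effects.

```python
from typing import Callable, Set


class IdempotentConsumer:
    """Wraps a handler so a redelivered message is processed only once."""

    def __init__(self, handler: Callable[[dict], None]) -> None:
        self._handler = handler
        self._processed: Set[str] = set()

    def consume(self, message: dict) -> bool:
        """Return True if the message was handled, False if it was a duplicate."""
        message_id = message["message_id"]
        if message_id in self._processed:
            return False  # already handled; the caller can just acknowledge it
        self._handler(message)
        # Record the ID only after success so a failed attempt can be retried.
        self._processed.add(message_id)
        return True
```

Note that this only catches redelivery of the same message. A different message with the same values has a different ID, so detecting that case would need a key derived from the business data instead.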

Consistent Publishing

Events become first-class in a system built on Event Driven Architecture, so you need to be aware of when and where you’re publishing them. Oftentimes, events are published to indicate that something has occurred, such as a state change. This means your code needs to be consistent: if a certain action has occurred, it publishes the appropriate event. If you make a state change and do not publish the event, this can have a negative impact on other parts of your system that are expecting it.

As an example, consider a transaction script from a food delivery app. When the food delivery driver arrives at the restaurant to pick up the food, this is the first action they perform: the “Arrive”.

Along with making some state changes, the Arrive action publishes an Arrived event.

The second action the delivery driver performs, after they have arrived at the restaurant, is the “Pickup”. This is the action performed once they pick up the food.

However, part of the logic is that if they haven’t done the Arrive action first, we just let them do the Pickup anyway. The problem is that we’re making the state change for “Arrive”, yet not publishing the Arrived event.

The gotcha here is that events are now first-class. When certain state changes occur, you might need to publish an event for other service boundaries. Making this simple state change without publishing the Arrived event could have a negative impact on those boundaries. You want to narrow down and encapsulate state changes and event publishing into something like an Aggregate.
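
As a rough sketch of what that encapsulation could look like, here is a hypothetical `Delivery` aggregate in Python. The class, its fields, and the `PickedUp` event name are assumptions for illustration; only the Arrive and Pickup actions and the Arrived event come from the example above. The key idea is that Pickup reuses the Arrive method rather than setting the state directly, so the Arrived event can never be skipped.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Delivery:
    """Hypothetical aggregate: every state change records its matching event."""
    arrived: bool = False
    picked_up: bool = False
    pending_events: List[str] = field(default_factory=list)  # to publish after saving

    def arrive(self) -> None:
        if self.arrived:
            return  # nothing changed, so there is nothing to publish
        self.arrived = True
        self.pending_events.append("Arrived")

    def pickup(self) -> None:
        if self.picked_up:
            return
        # If the driver skipped Arrive, go through arrive() instead of setting
        # the flag directly, so the Arrived event is still recorded.
        self.arrive()
        self.picked_up = True
        self.pending_events.append("PickedUp")
```

Calling `pickup()` on a fresh `Delivery` leaves both flags set and both events queued, in order.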

Bypassing API

You cannot bypass your API. Your API is responsible for publishing events when certain state changes occur or behaviors are invoked. Event Driven Architecture makes events first-class, so they need to be published reliably and consistently.

As an example, picture a logical service boundary that contains an App Service and its database. When a client makes a request to the App Service to perform some action, it makes some state change to its database.

It then also publishes an event to the message broker so that other service boundaries can be aware of and consume that event.

If you were to bypass the API, no event would be published. You CANNOT have another logical boundary, or even yourself as a developer, go in and manually change state/data in a database without publishing an event.

In the example above, the event published might be used to invalidate a cache. If data is changed manually or by some other boundary then the cache will not be invalidated and will be inconsistent.

You cannot bypass your API, since it controls both making state changes and publishing the appropriate events. State changes and events go hand in hand.
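
The cache scenario can be shown in a few lines of Python. Everything here is hypothetical (the `ProductService` API, the `PriceChanged` event, and the dictionary used as a database); the sketch only demonstrates that a change made through the API invalidates the cache, while a direct write to the database leaves it stale.

```python
from typing import Callable, Dict


class PriceCache:
    """Read-through cache that relies on events for invalidation."""

    def __init__(self, db: Dict[str, float]) -> None:
        self._db = db
        self._cache: Dict[str, float] = {}

    def get(self, product_id: str) -> float:
        if product_id not in self._cache:
            self._cache[product_id] = self._db[product_id]
        return self._cache[product_id]

    def on_price_changed(self, event: dict) -> None:
        self._cache.pop(event["product_id"], None)


class ProductService:
    """Hypothetical API: the one place that changes state AND publishes the event."""

    def __init__(self, db: Dict[str, float], publish: Callable[[dict], None]) -> None:
        self._db = db
        self._publish = publish

    def change_price(self, product_id: str, price: float) -> None:
        self._db[product_id] = price
        self._publish({"type": "PriceChanged", "product_id": product_id})
```

If a price is changed with `change_price`, the next `get` returns the new value. If someone writes to the database dictionary directly, `get` keeps returning the old cached value because no event was ever published.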

Failures

Failures in Event Driven Architecture can have serious implications for throughput and performance, and can result in cascading failures.

As an example, a consumer is processing a message and when doing so needs to make an HTTP call to an external service.

The issue arises when the external service becomes unavailable or has any abnormal latency to process the HTTP request.

If the normal processing time of a message is 100ms, and a timeout with the external service turns that into 1 minute, this can cause backpressure if you’re producing more messages than you can consume.

There are many different strategies for handling failures, such as immediate retries, backoffs, circuit breakers, poison message handling, and dead letter queues.
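
To illustrate two of these strategies together, here is a minimal Python sketch of retries with exponential backoff that falls back to a dead letter queue. The function name, its parameters, and the plain list used as the dead letter queue are assumptions for this sketch; messaging libraries usually provide their own recoverability features, which you would configure rather than hand-roll.

```python
import time
from typing import Any, Callable, List


def consume_with_retries(message: Any,
                         handler: Callable[[Any], None],
                         dead_letters: List[Any],
                         max_attempts: int = 3,
                         base_delay: float = 0.1,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try the handler up to max_attempts times, doubling the wait each time.

    Returns True on success. After the final failure the message is moved to
    dead_letters so it stops blocking the messages queued behind it.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            handler(message)
            return True
        except Exception:
            if attempt == max_attempts:
                break
            sleep(base_delay * 2 ** (attempt - 1))  # base, 2x base, 4x base, ...
    dead_letters.append(message)
    return False
```

The `sleep` parameter is injectable so the backoff can be exercised in a test without actually waiting.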

Handling failures when processing messages is a critical concern that constantly needs to be addressed. Not all situations are the same, so you have to evaluate how you handle failures and understand that different consumers have different processing and latency requirements.
