What’s the Outbox Pattern? A reliable way of saving state to your database and publishing a message/event to a message broker.
Why do you need it? When you get into messaging, you will often find code that needs to save state to your database, and then publish a message/event to a message broker. Unfortunately, because they are two different resources, you need a distributed transaction to make these atomic.
The alternative is the Outbox Pattern, which does not require a distributed transaction and is supported by most messaging libraries.
To illustrate the issue, the first step is saving/changing the state in your primary database. It doesn’t matter if this is an RDBMS or a NoSQL store.
Subsequently, since you have changed state, you need to now publish a message/event to a queue or message broker to let other systems know of the state change.
Since each step is independent, if there is a failure in publishing the event to the queue, then you’ve made a state change without letting other systems know of that change.
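Here’s a minimal sketch of that failure in Python, using an in-memory SQLite database as the primary database. The FlakyBroker class, the orders table, and the order.placed topic are all hypothetical stand-ins for illustration, not part of any real messaging library.

```python
import sqlite3


class BrokerDown(Exception):
    """Raised by the stand-in broker to simulate a failed publish."""


class FlakyBroker:
    """Hypothetical in-memory broker that can be told to fail."""

    def __init__(self, fail=False):
        self.fail = fail
        self.messages = []

    def publish(self, topic, payload):
        if self.fail:
            raise BrokerDown("could not reach the broker")
        self.messages.append((topic, payload))


def place_order_naive(conn, broker, order_id):
    # Step 1: commit the state change to the primary database.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, 'placed')", (order_id,)
        )
    # Step 2: publish the event. Nothing ties this to the commit above.
    broker.publish("order.placed", {"order_id": order_id})


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
broker = FlakyBroker(fail=True)

try:
    place_order_naive(conn, broker, 1)
except BrokerDown:
    pass  # the order is already committed, but no event was published

saved_orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
published_events = len(broker.messages)
```

After this runs, the order row exists in the database but the broker never received the event, which is exactly the inconsistency described above.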
Why is this a problem? If you’re using messaging in an event-driven architecture, you likely rely on events being published. An event not being published could have all sorts of implications.
For example, if you’re using events to invalidate a cache, you’ll now have stale data. Or worse, if you’re using events as part of a saga (long-running process), the next step of the saga may never occur.
You need atomicity. All or nothing.
The Outbox pattern solves this by using the transaction from your primary database to store your state changes along with the messages/events you’re publishing.
This means that the messages you’re publishing are initially stored in the same database alongside your other application data. Each messaging library may implement this slightly differently in terms of the structure of the data and messages, as well as how its API looks, but this is the overall idea.
The order of saving state or publishing an event doesn’t really matter anymore as they are saved to the database in the same transaction.
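As a rough sketch of the idea (not how any particular library does it), here is the same-transaction write in Python with SQLite. The orders and outbox tables, their columns, and the fail_before_commit flag are assumptions made up for this example.

```python
import json
import sqlite3


def create_schema(conn):
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
    conn.execute(
        "CREATE TABLE outbox ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " topic TEXT NOT NULL,"
        " payload TEXT NOT NULL,"
        " published INTEGER NOT NULL DEFAULT 0)"
    )


def place_order(conn, order_id, fail_before_commit=False):
    # "with conn" wraps both inserts in one transaction:
    # commit on success, rollback if an exception escapes.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, 'placed')", (order_id,)
        )
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("order.placed", json.dumps({"order_id": order_id})),
        )
        if fail_before_commit:
            raise RuntimeError("simulated crash before commit")


def counts(conn):
    orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    outbox = conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
    return orders, outbox


conn = sqlite3.connect(":memory:")
create_schema(conn)

place_order(conn, 1)
after_success = counts(conn)  # one order row and one outbox row

try:
    place_order(conn, 2, fail_before_commit=True)
except RuntimeError:
    pass
after_failure = counts(conn)  # unchanged: both inserts were rolled back
```

Either both rows are written or neither is, so the state change and the pending event can never disagree.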
Once we commit our transaction, a secondary process/thread (Publisher) will pull the unpublished events from the primary database.
Then the Publisher will publish the events it pulled to the queue or message broker.
Finally, the Publisher will update or delete those records in the database, so it knows that the events have been successfully published to the queue.
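The three Publisher steps above (pull, publish, mark) can be sketched as a small polling function. This is only an illustration under assumptions: the outbox table layout, the published flag, and the publish callback are hypothetical, and a real library would also handle retries, batching, and concurrency between multiple publishers.

```python
import json
import sqlite3


def publish_pending(conn, publish, batch_size=100):
    """Publish unpublished outbox rows, then mark each one as published."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox"
        " WHERE published = 0 ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for row_id, topic, payload in rows:
        # If publish raises, the row stays unpublished and is retried later.
        publish(topic, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " topic TEXT NOT NULL,"
    " payload TEXT NOT NULL,"
    " published INTEGER NOT NULL DEFAULT 0)"
)
with conn:
    for order_id in (1, 2):
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("order.placed", json.dumps({"order_id": order_id})),
        )

sent = []  # stand-in for the real queue or message broker
first_pass = publish_pending(conn, lambda topic, body: sent.append((topic, body)))
second_pass = publish_pending(conn, lambda topic, body: sent.append((topic, body)))
```

The first pass publishes both pending rows and the second pass finds nothing left to do. In practice this function would run on a timer or a background thread.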
In this example, I’m using CAP, but as mentioned, different messaging libraries will have different ways they implement this in their API.
CAP’s API is really straightforward. It provides an extension method on the Entity Framework Core DatabaseFacade to begin a database transaction. You simply pass along the ICapPublisher when starting the transaction. This tells CAP to save the published event to the database, rather than sending it directly to the message broker, which in my case is RabbitMQ.
It’s really that straightforward. I’ve left out the configuration of CAP, which is also dead simple. It’s just a matter of using the correct data storage provider and specifying the connection string in the ASP.NET Core Startup.ConfigureServices() method.
What this looks like in our primary database is a table called cap.Published, which CAP created automatically. This is where it stores the published messages. Initially, the StatusName is null, and after a successful publish to the queue, CAP updates it to Succeeded.
At Least Once Messaging
You may have noticed in the original diagrams that the Publisher, which pulls from the database, publishes to the queue, and then marks the event as published in the database, ultimately has the same problem we started out with.
Really we just moved the problem but have a different outcome. If we publish the message to the queue, but for some reason, CAP cannot update the StatusName of the record, then we will ultimately publish the same event again.
In a lot of situations, this is a better problem to have. We are now in an “At Least Once” scenario, meaning events will get published once or possibly more. In the original scenario, we were in an “At Most Once” scenario, meaning events would get published once or possibly not at all.
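To make the duplicate concrete, here is a small Python sketch of a Publisher that succeeds in publishing but fails before it can mark the row. The outbox table, the MarkFailed exception, and the fail_on_mark flag are hypothetical and exist only to simulate that failure. This is not CAP’s implementation.

```python
import sqlite3


class MarkFailed(Exception):
    """Simulates losing the database right after a successful publish."""


def run_publisher(conn, delivered, fail_on_mark=False):
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        delivered.append(payload)  # the broker has accepted the message
        if fail_on_mark:
            raise MarkFailed("could not update the outbox row")
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox ("
    " id INTEGER PRIMARY KEY,"
    " payload TEXT,"
    " published INTEGER DEFAULT 0)"
)
with conn:
    conn.execute("INSERT INTO outbox (payload) VALUES ('order 1 placed')")

delivered = []  # everything the stand-in broker has received
try:
    run_publisher(conn, delivered, fail_on_mark=True)  # published, never marked
except MarkFailed:
    pass
run_publisher(conn, delivered)  # the retry publishes the same event again

still_pending = conn.execute(
    "SELECT COUNT(*) FROM outbox WHERE published = 0"
).fetchone()[0]
```

The event is never lost, but the broker receives it twice, so consumers have to tolerate duplicates.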
Learn how to create Idempotent Consumers to handle duplicate messages safely.