Building resilient software systems can be difficult, especially when they are distributed. Executing a long-running business process or workflow that spans multiple service boundaries requires a lot of resiliency. Everything needs to be available and functioning because if something fails midway through, you can be left in an inconsistent state. What’s a solution? Workflow orchestration, along with removing direct service-to-service communication and temporal coupling.
Workflow
When working with a monolith and a single database, all database interactions can be wrapped in a single transaction. Things are pretty straightforward.
If there is any failure while executing, you simply roll back the transaction. If everything works correctly, the transaction is committed.
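To make the single-transaction case concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, `new_db`, `place_order`, and the `fail` flag are all hypothetical and only exist for illustration; the point is that one failure undoes every write made in the same transaction.

```python
import sqlite3


def new_db() -> sqlite3.Connection:
    """Create an in-memory database with three illustrative tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE orders (id INTEGER);"
        "CREATE TABLE payments (order_id INTEGER);"
        "CREATE TABLE shipments (order_id INTEGER);"
    )
    return conn


def place_order(conn: sqlite3.Connection, order_id: int, fail: bool = False) -> None:
    """Write the order, payment, and shipment rows in one transaction."""
    # Used as a context manager, the connection commits when the block
    # succeeds and rolls back if an exception escapes it.
    with conn:
        conn.execute("INSERT INTO orders (id) VALUES (?)", (order_id,))
        conn.execute("INSERT INTO payments (order_id) VALUES (?)", (order_id,))
        if fail:
            # Simulated failure midway through the workflow.
            raise RuntimeError("shipment step failed")
        conn.execute("INSERT INTO shipments (order_id) VALUES (?)", (order_id,))


def count(conn: sqlite3.Connection, table: str) -> int:
    """Return the number of rows in one of the illustrative tables."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

If `place_order` is called with `fail=True`, the order and payment rows written before the failure are rolled back along with everything else, so nothing is left half-done.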
The problem arises when you have a set of distributed services involved in the workflow. You won’t likely have a distributed transaction, so there is no “rollback.” We can’t use the same mindset.
If there are service-to-service RPC calls, what happens when service D, far down the RPC call stack, fails? If any other services up the call stack made some type of state change or had some side effect, how do they roll back? Each service needs to be aware of potential failures; otherwise, it will be left in an inconsistent state.
Using RPC for service-to-service communication can have many more issues, including latency. Check out my post REST APIs for Microservices? Beware!
Synchronous Orchestration
The first step is to remove the rat’s nest of service-to-service communication. This allows us to reason about a service; if it has any failures, they aren’t because of other services.
The first attempt might be to create an orchestrator that handles all the interactions, procedurally calling one service after the other.
Once the first call to the Order service is made, the client (orchestrator) makes the next call to the Payment service.
After the payment service returns, the client/orchestrator then makes a call to the Warehouse service. But what happens if the warehouse service call fails?
We’re in a similar situation where we need to go back and call the payment service to try to revert or undo the previous call. In the case of the payment service, this may be to refund or void a transaction.
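A rough sketch of that synchronous orchestrator could look like the following. The service classes are in-process stand-ins for what would really be remote calls, and every name here (`ServiceUnavailable`, `charge`, `refund`, `create_shipping_label`, `place_order_sync`) is an assumption for illustration, not an API from any real library.

```python
class ServiceUnavailable(Exception):
    """Raised by a stand-in service to simulate a failed remote call."""


class OrderService:
    def __init__(self) -> None:
        self.orders: set[str] = set()

    def create(self, order_id: str) -> None:
        self.orders.add(order_id)


class PaymentService:
    def __init__(self) -> None:
        self.charged: set[str] = set()

    def charge(self, order_id: str) -> None:
        self.charged.add(order_id)

    def refund(self, order_id: str) -> None:
        self.charged.discard(order_id)


class WarehouseService:
    def __init__(self, available: bool = True) -> None:
        self.available = available
        self.labels: set[str] = set()

    def create_shipping_label(self, order_id: str) -> None:
        if not self.available:
            raise ServiceUnavailable("warehouse is down")
        self.labels.add(order_id)


def place_order_sync(order_id: str, orders: OrderService,
                     payments: PaymentService, warehouse: WarehouseService) -> bool:
    """Call each service in turn; undo the payment if the warehouse call fails."""
    orders.create(order_id)
    payments.charge(order_id)
    try:
        warehouse.create_shipping_label(order_id)
    except ServiceUnavailable:
        # Compensating call. In a real system this is another remote call,
        # which can fail too and leave the payment in place.
        payments.refund(order_id)
        return False
    return True
```

Notice that the compensation only works if the payment service is still reachable at the moment the warehouse fails, which is exactly the availability problem described here.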
We haven’t accomplished much in making a synchronous orchestrator over service-to-service communication because, ultimately, the problem is that we are relying on every service to be available and operating without any failures for the entire workflow or business process.
We want to remove this temporal coupling by making each service independent, so they do not rely on each other and don’t need to be available simultaneously.
Asynchronous Workflow
You can accomplish asynchronous workflow orchestration by using messaging. Commands and events allow the workflow to execute asynchronously. An orchestrator can consume events published by other services and then react to them by sending commands to the appropriate boundary to continue the workflow. The orchestrator directs the workflow.
To illustrate, when the client makes the initial request to the Ordering service, the service makes a state change to create an Order.
Ordering will then publish an OrderPlaced event that will kick off the workflow. Generally, I keep the orchestrator owned by the boundary that kicks it off, but this isn’t always the case.
Once the workflow kicks off, it will create a MakePayment command and send it to the message broker.
At this point, the Payment service doesn’t have to be available. If it’s down for whatever reason, the workflow doesn’t break; the command simply waits in the broker. Once the service is available again, it will process messages from the broker.
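Here is a toy sketch of why the sender doesn't care whether the Payment service is up. `InMemoryBroker`, the `"payment"` queue name, and the `online` flag are hypothetical stand-ins; a real broker would be a separate, durable process, but the buffering idea is the same.

```python
from collections import deque


class InMemoryBroker:
    """Toy stand-in for a message broker: one FIFO queue per destination."""

    def __init__(self) -> None:
        self.queues: dict[str, deque] = {}

    def send(self, destination: str, message: dict) -> None:
        self.queues.setdefault(destination, deque()).append(message)

    def receive(self, destination: str):
        """Return the oldest message for the destination, or None if empty."""
        queue = self.queues.get(destination)
        return queue.popleft() if queue else None

    def pending(self, destination: str) -> int:
        return len(self.queues.get(destination, ()))


class PaymentService:
    def __init__(self, broker: InMemoryBroker) -> None:
        self.broker = broker
        self.online = False  # simulate the service being down
        self.processed: list[dict] = []

    def poll(self) -> None:
        """Process all waiting commands; does nothing while offline."""
        if not self.online:
            return
        while (message := self.broker.receive("payment")) is not None:
            self.processed.append(message)
```

Sending a MakePayment command succeeds even while `online` is `False`. The message sits in the queue until the service comes back and polls.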
Once the Payment service processes the command, it can publish an event or a reply. You can use the request-response pattern with asynchronous messaging. How? Check out my post Asynchronous Request-Response Pattern for Non-Blocking Workflows.
The Payment service sends a PaymentCompleted event back to the broker.
The Ordering service, which contains the orchestrator, will consume the PaymentCompleted message.
All the orchestrator is doing is consuming the PaymentCompleted event to know that it must now create a CreateShippingLabel command and send it to the broker. There are no database calls or lookups in the orchestrator; it’s simply consuming events and sending commands.
Now the Warehouse service consumes the CreateShippingLabel command and then sends a ShippingLabelCreated event back to the broker.
At this point, our workflow is complete. The orchestrator consumes the ShippingLabelCreated event and marks the workflow as done.
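Because this step involves no lookups, it can be sketched as a plain function from an event to a command. The message classes and the `order_id` field below are assumptions for illustration; the post doesn't specify what data the messages carry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaymentCompleted:
    """Event published by the Payment service (fields are hypothetical)."""
    order_id: str


@dataclass(frozen=True)
class CreateShippingLabel:
    """Command destined for the Warehouse service (fields are hypothetical)."""
    order_id: str


def handle_payment_completed(event: PaymentCompleted) -> CreateShippingLabel:
    """Turn the consumed event into the next command in the workflow."""
    # No database access here: everything the command needs comes from the event.
    return CreateShippingLabel(order_id=event.order_id)
```

The caller would send the returned command to the broker, which keeps the handler itself trivial to unit test.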
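Putting the whole flow together, here is a small in-memory simulation of the message exchange. It is only a sketch: `Broker`, `Orchestrator`, `SERVICE_REPLIES`, and `run` are hypothetical names, messages are simple `(type, order_id)` tuples, and the Payment and Warehouse services are reduced to "consume a command, publish an event".

```python
from collections import deque


class Broker:
    """Toy broker: a single FIFO queue of (message_type, order_id) tuples."""

    def __init__(self) -> None:
        self.messages: deque = deque()

    def send(self, message_type: str, order_id: str) -> None:
        self.messages.append((message_type, order_id))


class Orchestrator:
    """Owned by Ordering: consumes events, sends commands, tracks completion."""

    NEXT_COMMAND = {
        "OrderPlaced": "MakePayment",
        "PaymentCompleted": "CreateShippingLabel",
    }

    def __init__(self, broker: Broker) -> None:
        self.broker = broker
        self.done: set[str] = set()

    def handle(self, message_type: str, order_id: str) -> None:
        if message_type == "ShippingLabelCreated":
            self.done.add(order_id)  # last event: mark the workflow as done
        elif message_type in self.NEXT_COMMAND:
            self.broker.send(self.NEXT_COMMAND[message_type], order_id)


# Stand-ins for Payment and Warehouse: each command produces one event.
SERVICE_REPLIES = {
    "MakePayment": "PaymentCompleted",
    "CreateShippingLabel": "ShippingLabelCreated",
}


def run(broker: Broker, orchestrator: Orchestrator) -> list[str]:
    """Drain the broker, routing each message, and return the types seen in order."""
    log = []
    while broker.messages:
        message_type, order_id = broker.messages.popleft()
        log.append(message_type)
        if message_type in SERVICE_REPLIES:
            # A service consumes the command and publishes its event.
            broker.send(SERVICE_REPLIES[message_type], order_id)
        else:
            # Everything else is an event for the orchestrator.
            orchestrator.handle(message_type, order_id)
    return log
```

Sending an OrderPlaced event and calling `run` walks through the same sequence described above: OrderPlaced, MakePayment, PaymentCompleted, CreateShippingLabel, ShippingLabelCreated.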
Workflow Orchestration
Workflow orchestration can be accomplished by asynchronous messaging with commands and events, allowing you to execute long-running business processes and workflows and remove temporal coupling between services.
Each service can operate independently by consuming messages and sending events/replies once completed.
If a service isn’t available, it doesn’t break the entire workflow or business process. Once it comes back online, the workflow continues as the service starts consuming messages.