Workflow Orchestration for Resilient Systems

Building resilient software systems can be difficult, especially when they are distributed. Executing a long-running business process or workflow that spans multiple service boundaries requires a lot of resiliency. Everything needs to be available and functioning because if something fails midway through, you can be left in an inconsistent state. So what’s a solution? Workflow orchestration, along with removing direct service-to-service communication and temporal coupling.

YouTube

Check out my YouTube channel, where I post all kinds of content accompanying my posts, including this video showing everything in this post.

Workflow

When working with a monolith and a single database, all database interactions are wrapped in a single transaction. Things are pretty straightforward.

Monolith

If there is any failure while executing, you simply roll back the transaction. If everything works correctly, the transaction is committed.
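Here’s a minimal sketch of that all-or-nothing behavior, using Python’s built-in sqlite3 module. The table names, the `place_order` function, and the `fail_at_payment` flag are hypothetical and only there to illustrate the rollback.

```python
import sqlite3


def new_db():
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
    conn.execute("CREATE TABLE payments (order_id INTEGER, amount INTEGER)")
    return conn


def place_order(conn, fail_at_payment=False):
    """Write an order and a payment in one transaction."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception escapes the block.
        with conn:
            conn.execute("INSERT INTO orders (id, status) VALUES (1, 'placed')")
            if fail_at_payment:
                raise RuntimeError("payment step failed")
            conn.execute("INSERT INTO payments (order_id, amount) VALUES (1, 50)")
        return True
    except RuntimeError:
        return False


def count(conn, table):
    """Return the number of rows in the given table."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

When `place_order` fails partway through, the order row is rolled back along with everything else, so no partial state is left behind.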

The problem arises when you have a set of distributed services involved in the workflow. You likely won’t have a distributed transaction, so there is no “rollback.” We can’t use the same mindset.

Service to Service

If there are service-to-service RPC calls, what happens when service D, far down the RPC call stack, fails? If any other services up the call stack made some type of state change or had some side effect, how do they roll back? Each service needs to be aware of potential failures downstream; otherwise, it will be left in an inconsistent state.

Using RPC for service-to-service communication can have many more issues, including latency. Check out my post REST APIs for Microservices? Beware!

Synchronous Orchestration

The first step is to remove the rat’s nest of service-to-service communication. This allows us to reason about a service; if it has any failures, they aren’t because of other services.

The first attempt might be to create an orchestrator that handles all the interactions, procedurally calling one service after the other.

Sync Orchestration

Once the first call to the Order service is made, the client (orchestrator) makes the next call to the Payment service.

Sync Orchestration

After the payment service returns, the client/orchestrator then makes a call to the Warehouse service. But what happens if the warehouse service call fails?

Sync Orchestration Failure

We’re in a similar situation where we need to go back and call the payment service to try to revert or undo the previous call. In the case of the payment service, this may be to refund or void a transaction.
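To make the failure case concrete, here is a rough sketch of a synchronous orchestrator that has to undo earlier calls by hand. `FakeService`, `ServiceUnavailable`, and the action names are hypothetical stand-ins for real remote calls, not any particular library’s API.

```python
class ServiceUnavailable(Exception):
    """Raised when a (fake) remote service cannot be reached."""


class FakeService:
    """Hypothetical stand-in for a remote service; records the calls it receives."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.calls = []

    def call(self, action):
        if not self.available:
            raise ServiceUnavailable(self.name)
        self.calls.append(action)


def place_order_sync(order, payment, warehouse):
    """Call each service in turn; undo earlier steps if the last one fails."""
    order.call("create_order")
    payment.call("charge")
    try:
        warehouse.call("create_shipping_label")
    except ServiceUnavailable:
        # There is no transaction to roll back, so the orchestrator must
        # issue compensating calls itself. These calls can fail as well.
        payment.call("refund")
        order.call("cancel_order")
        return "compensated"
    return "completed"
```

Note that the compensating calls depend on the Payment and Order services still being reachable at that moment, which is the same availability problem all over again.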

We haven’t accomplished much in making a synchronous orchestrator over service-to-service communication because, ultimately, the problem is that we are relying on every service to be available and operating without any failures for the entire workflow or business process.

We want to remove this temporal coupling by making each service independent, so they do not rely on each other and don’t need to be available simultaneously.

Asynchronous Workflow

You can accomplish asynchronous workflow orchestration by using messaging. Commands and events allow the workflow to execute asynchronously. An orchestrator can consume events published by other services and then react to them by sending commands to the appropriate boundary to continue the workflow. The orchestrator directs the workflow.

To illustrate, when the Client makes its initial request to the Ordering service, the service will make a state change to create an Order.

Workflow Orchestration

Ordering will then publish an OrderPlaced event that will kick off the workflow. Generally, I keep the orchestrator owned by the boundary that kicks it off, but this isn’t always the case.

Workflow Orchestration

Once the workflow kicks off, it will create a MakePayment command and send it to the message broker.

Workflow Orchestration

At this point, the Payment service doesn’t have to be available. This doesn’t break the workflow if it’s down for whatever reason. Once it’s available, it will process messages from the broker.
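A toy illustration of why the consumer can be offline: the command simply waits in the queue. `InMemoryBroker` and this `PaymentService` class are hypothetical; a real broker would persist messages durably rather than hold them in memory.

```python
from collections import deque


class InMemoryBroker:
    """Hypothetical broker: one FIFO queue per destination name."""

    def __init__(self):
        self.queues = {}

    def send(self, queue, message):
        self.queues.setdefault(queue, deque()).append(message)

    def receive(self, queue):
        q = self.queues.get(queue)
        return q.popleft() if q else None


class PaymentService:
    """Hypothetical consumer that only processes messages while online."""

    def __init__(self, broker):
        self.broker = broker
        self.online = False
        self.processed = []

    def poll(self):
        """Drain the 'payment' queue if online; return how many messages were handled."""
        if not self.online:
            return 0
        handled = 0
        while (message := self.broker.receive("payment")) is not None:
            self.processed.append(message)
            handled += 1
        return handled
```

Sending a `MakePayment` command while `online` is `False` leaves it sitting in the queue; flipping `online` to `True` and polling again picks it up, and the sender never had to wait or retry.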

Workflow Orchestration

Once the Payment service processes the command, it can publish an event or a reply. You can use the request-response pattern with asynchronous messaging. How? Check out my post Asynchronous Request-Response Pattern for Non-Blocking Workflows.

The Payment service sends a PaymentCompleted event back to the broker.

The Ordering service, which contains the orchestrator, will consume the PaymentCompleted message.

Workflow Orchestration

All the orchestrator is doing is consuming the PaymentCompleted event to know that it must now create a CreateShippingLabel command and send it to the broker. There are no database calls or lookups in the orchestrator; it’s simply consuming events and sending commands.

Now the Warehouse service consumes the CreateShippingLabel command and then sends a ShippingLabelCreated event back to the broker.

At this point, the orchestrator will consume the ShippingLabelCreated event and mark the workflow as done. Our workflow is complete.
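The whole flow can be sketched as a small state machine that consumes events and emits commands. The event and command names come from the walkthrough; the `OrderWorkflow` class, its `state` field, and the `outbox` list are assumptions made for illustration and are not how NServiceBus or any specific library models this.

```python
from dataclasses import dataclass, field


@dataclass
class OrderWorkflow:
    """Hypothetical orchestrator for a single order."""

    order_id: str
    state: str = "not_started"
    # Commands the orchestrator wants sent to the broker, in order.
    outbox: list = field(default_factory=list)

    def handle(self, event):
        """Consume an event and queue the next command, if any."""
        if event == "OrderPlaced" and self.state == "not_started":
            self._send("MakePayment", "awaiting_payment")
        elif event == "PaymentCompleted" and self.state == "awaiting_payment":
            self._send("CreateShippingLabel", "awaiting_label")
        elif event == "ShippingLabelCreated" and self.state == "awaiting_label":
            self.state = "done"
        # Any other event/state combination is ignored,
        # for example a duplicate delivery of the same event.

    def _send(self, command, next_state):
        self.outbox.append((command, self.order_id))
        self.state = next_state
```

Feeding it `OrderPlaced`, `PaymentCompleted`, and `ShippingLabelCreated` in that order leaves the workflow in the `done` state with two commands in the outbox. Tracking the current state is what lets the sketch shrug off a redelivered event.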

Workflow Orchestration

Workflow orchestration can be accomplished by asynchronous messaging with commands and events, allowing you to execute long-running business processes and workflows and remove temporal coupling between services.

Each service can operate independently by consuming messages and sending events/replies once completed.

If a service isn’t available, it doesn’t break the entire workflow or business process. Once it comes back online, the workflow continues as the service starts consuming messages.

Join!

Developer-level members of my YouTube channel or Patreon get access to a private Discord server to chat with other developers about Software Architecture and Design and access to source code for any working demo application I post on my blog or YouTube. Check out the YouTube Membership or Patreon for more info.

Distributed Tracing to discover a Distributed BIG BALL of MUD

Distributed tracing is great for observing how a request flows through a set of distributed services. However, it can also be used as a band-aid to mask a problem that you shouldn’t have.

Distributed Tracing

So why is distributed tracing helpful? When you’re working within a Monolith, you generally have an entire request processed within the same process. This means if there is a failure of any sort, you can capture the entire stack trace.

When you’re in a large system that is decomposed into a set of distributed services that interact, it can be very difficult to track where an exception is occurring. It can also be difficult to know the latency or processing time for the entire request and where the bottleneck might be among the service-to-service calls.

As an example, if there is a request from a client to Service A, it may need to make calls to other services, which in turn might make calls to yet other services.

Distributed Monolith

With distributed tracing, you could see the flow of a request that passes through multiple distributed services. To illustrate, here’s a timeline of the diagram above.

Distributed Monolith Timeline

So it’s great that distributed tracing can give us a way of observing a request’s flow. The problem is that having service-to-service communication can lead to another set of challenges beyond tracing.

Distributed tracing in this service-to-service system style is a band-aid to a problem you shouldn’t have. Blocking synchronous calls, such as HTTP, from service to service can cause issues with latency, fault tolerance, and availability, all because of temporal coupling. Check out my post REST APIs for Microservices? Beware!, which dives deeper into this topic.

Blocking Synchronous Calls

However, not all blocking synchronous calls can be avoided. Specifically, any type of query, such as a request from a client, will naturally be a blocking synchronous call. If you’re doing any type of UI composition, you may choose to use a BFF (Backend for frontend) or API gateway to do this composition. The BFF makes synchronous calls to all services to get data from each to compose a result for the client.

UI Composition

Distributed tracing in this situation is great! We’ll be able to see which services have the longest response time because, ultimately, if we are making all calls from the BFF to backing services concurrently, the slowest response will determine the length of the total execution time from the client.
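A quick way to see why the slowest call dominates is to simulate it. In this sketch the service names and delays are made up, and `asyncio.sleep` stands in for a real HTTP call from the BFF.

```python
import asyncio
import time


async def call_service(name, delay):
    """Pretend to call a backing service that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return name, delay


async def compose():
    """Call all backing services concurrently and time the whole composition."""
    start = time.perf_counter()
    results = await asyncio.gather(
        call_service("catalog", 0.1),
        call_service("pricing", 0.3),
        call_service("inventory", 0.2),
    )
    elapsed = time.perf_counter() - start
    slowest = max(results, key=lambda r: r[1])[0]
    return elapsed, slowest


if __name__ == "__main__":
    elapsed, slowest = asyncio.run(compose())
    print(f"total {elapsed:.2f}s, slowest service: {slowest}")
```

The total comes out close to the slowest call’s 0.3 seconds rather than the 0.6 second sum, which is the pattern a trace timeline for a concurrent BFF makes visible.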

UI Composition Distributed Tracing Timeline

Workflow

Another great place for distributed tracing is with asynchronous workflows. It has always been very challenging to see the flow of a request executed asynchronously by passing messages via a message broker. Distributed tracing solves that and allows us to visualize that flow.

As an example, the client requests the initial service to perform a command/action.

The service will then create a message and send it to the message broker for the next service to continue the workflow.

Another service will pick up this message and perform whatever action it needs to take to complete its part of the entire workflow.

Consume Message from another service

Once the second service has completed processing the message, it may send another message to the broker.

Continue workflow by sending another message to the broker

A third service (ServiceC) might pick up that message from the broker and perform some action that is a part of this long-running workflow. And just like the others, it may send a message to the broker once it’s complete.

Consume message from a final service

At this point, ServiceA, which started the entire workflow, may consume the last message sent by ServiceC to do some finalization of the entire workflow.

Initial service completes workflow

Because this entire workflow was executed asynchronously and has removed the temporal coupling, the services don’t all have to be online and available at the same time. Each service will consume and produce messages at its own rate and availability without causing the entire workflow to fail.
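For a trace to follow a workflow across the broker, each message has to carry the trace’s identity with it. The sketch below shows the general idea only: the header names (`trace_id`, `parent_span_id`) and the functions are hypothetical and simplified, not the actual wire format used by OpenTelemetry or NServiceBus.

```python
import uuid


def start_trace():
    """Create the headers for the first message of a new trace."""
    return {"trace_id": uuid.uuid4().hex, "parent_span_id": None}


def handle(service, incoming_headers, spans):
    """Record a span for this hop and return headers for the next outgoing message."""
    span_id = uuid.uuid4().hex[:16]
    spans.append({
        "service": service,
        "trace_id": incoming_headers["trace_id"],
        "span_id": span_id,
        "parent_span_id": incoming_headers["parent_span_id"],
    })
    # The trace id is copied through unchanged; this hop becomes the parent
    # of whatever service handles the next message.
    return {"trace_id": incoming_headers["trace_id"], "parent_span_id": span_id}


def run_workflow():
    """Simulate the ServiceA -> ServiceB -> ServiceC -> ServiceA message flow."""
    spans = []
    headers = start_trace()
    for service in ["ServiceA", "ServiceB", "ServiceC", "ServiceA"]:
        headers = handle(service, headers, spans)
    return spans
```

Because every span shares one trace id and points at its parent, a tracing backend can stitch the hops into a single timeline even though they ran at different times in different processes.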

Distributed Tracing timeline

OpenTelemetry & Zipkin

I’ve created a sample app that uses OpenTelemetry with NServiceBus for an asynchronous workflow that can then be visualized with Zipkin. If you want access to the full source code example, check out the YouTube Membership or Patreon for more info.

As an example with ASP.NET Core, I’ve added the OpenTelemetry packages and registered them in ConfigureServices of the Startup class. This will add tracing for NServiceBus, any calls using HttpClient, and ASP.NET Core itself.

With NServiceBus I have a saga that is orchestrating sending commands to various logical boundaries to complete the workflow.

After running the sample app, I can open up Zipkin and see the entire trace that spans my ASP.NET Core app as it goes through the various logical boundaries, including the database calls to SQL Express and the HTTP call to Fedex.com.

Distributed Tracing

Distributed tracing is great for collecting data and observing the flow of a long-running business process or if you’re doing UI Composition using a synchronous request/response involving many different services. However, don’t use it as a crutch because you have a pile of service-to-service synchronous requests/responses that is proving difficult to manage. If anything, use distributed tracing to realize you have a highly coupled distributed monolith so you can remove some of the temporal coupling, making your system more loosely coupled and resilient.

Fintech Mindset to Software Design

If you’re creating a line of business or enterprise-type software, I think one of the most valuable skills you can have isn’t technical. Rather it’s understanding how the business domain you are in works. One way is following how money flows through a system by having a fintech mindset.

Revenue & Cost

I was on the Azure DevOps Podcast, where I mentioned that a big influence on my career was working with an accountant. No surprise, this has affected how I look at line of business and enterprise-type systems, how they can be decomposed, and how money flows through the system.

At a high level, I’m thinking about Revenue and Cost.

For context, I’m going to talk about project-based work to illustrate.

How does the company generate revenue? How does the software system we are building fit into how the company generates revenue?

How does the company incur costs to generate that revenue? How do we keep track of those costs in the system, just as we do for revenue?

We generally have customers, sales, and invoicing on the revenue side.

Revenue Side

There is some type of sales process involved with a CRM that ultimately leads to a sale and to invoicing a customer to get paid.

On the cost side, we may have employees with a salary or work per hour. We pay them at some interval (weekly, bi-weekly, monthly) based on their salary or hours worked (timesheet).

Employee Cost Side

Other costs can occur, such as materials or services from outside vendors. We will create purchase orders and get receipts.

Vendor Cost Side

Follow the Money

When thinking about decomposing a system and understanding a domain, it helps to follow the money. This can lead to understanding the workflows and processes that generate revenue and cost.

In the case of project-based work, the heart is in the execution: the actual project. There are many ways to “sell” a project, such as Time & Materials, Fixed Price, etc. The same goes for costs; some employees might be salaried (fixed), and some may be hourly.

Regardless, the point is the execution of the project is where the complexity lies. This is likely to be the heart of the system you’re building.

Execution

For example, if you’ve worked on a software project before or done any project management, you can relate that this is the heart of complexity.

So if you were writing a system to manage the full lifecycle of project-based work, the execution of the project would be the core. Many boundaries would act purely in a supporting role, such as CRM, Employee, Payroll, Vendors, and Invoicing. You don’t necessarily need to write your own; these are usually pretty generic, and you can buy them off the shelf or as software as a service.

And this makes sense because you’re not about to write a CRM or Accounting system. You can integrate with those to push/pull the relevant data required.

On the other hand, Timesheets, Sales, and Purchasing may be something you choose to write because there isn’t anything specific enough for the context/niche of what you’re building.

Core and Supporting Boundaries

Fintech Mindset

I’ve written software for Distribution, Accounting, Manufacturing, and various forms of Transportation, and they all have this in common. Have a fintech mindset by understanding revenue and costs and how each is generated.

Here’s a screenshot of QuickBooks, an accounting system typically used for small businesses. It gives a good example of how everything is related and how money flows.

QuickBooks

In writing line of business and enterprise-type systems, I’ve always found the best developers are the ones who pair great technical ability with equally great domain knowledge. Having a fintech mindset and understanding how money flows helps.
