Event Choreography for Loosely Coupled Workflow

What is an Event Choreography workflow? Let’s back up a bit to answer that. Event Driven Architecture is a way to make your system more extensible and loosely coupled by using events to communicate between service boundaries. But how do you handle long-running business processes and workflows that involve multiple services? Using RPC takes you back to the tight coupling we’re trying to avoid with Event Driven Architecture. So what’s a solution? Event Choreography.

YouTube

Check out my YouTube channel, where I post all kinds of content accompanying my posts, including this video showing everything in this post.

RPC

So why not just use RPC via an HTTP API or gRPC? Well, the point of using asynchronous messaging and an event driven architecture is to be loosely coupled and remove temporal coupling. Temporal coupling matters because a long-running business process or workflow that involves many different services means that all services need to be available, and there can be no failures. You can add retries for transient failures, but if a significant issue with a service or data causes the workflow to fail, you have no recourse to resolve the issue. You can be stuck in an inconsistent state.

For example, if we had many service-to-service RPC calls, and far down the call stack, there is a failure, how do you handle that?

RPC Service to Service

If Service A and Service C made state changes, they would need to have some type of compensating action to reverse the state change. There is no distributed transaction; you can’t just roll back the state of each service database. You need to deal with failures in each service that makes a service-to-service RPC call.

When all services are working correctly, there are no issues. But the moment you have a service that becomes unavailable, any workflow that involves that service will be immediately affected, likely causing the entire workflow to fail and leaving your system in an inconsistent state.
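To make that failure mode concrete, here is a minimal Python sketch of a synchronous call chain. The `Service` class, the `ServiceUnavailable` exception, the order id, and the A to B to C call order are all hypothetical stand-ins for real HTTP or gRPC calls, not anything from a specific framework.

```python
class ServiceUnavailable(Exception):
    """Raised when a (hypothetical) service cannot be reached."""


class Service:
    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.committed = []  # state changes this service has persisted

    def handle(self, order_id, downstream=()):
        if not self.available:
            raise ServiceUnavailable(self.name)
        # The local state change commits before the next call is made.
        self.committed.append(order_id)
        if downstream:
            next_service, *rest = downstream
            # Synchronous call: the caller blocks until the whole chain returns.
            next_service.handle(order_id, rest)


def run_chain():
    a = Service("A")
    b = Service("B")
    c = Service("C", available=False)  # the failure is deep in the call stack
    try:
        a.handle("order-1", [b, c])
        outcome = "completed"
    except ServiceUnavailable as error:
        outcome = f"failed at Service {error}"
    return outcome, a.committed, b.committed, c.committed
```

Calling `run_chain()` reports the failure at Service C while A and B keep the change they already committed. Nothing rolls those changes back, which is the inconsistent state described above.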

There is a whole other set of issues with RPC between services that I’m not even mentioning here, one of them being latency. Check out my post REST APIs for Microservices? Beware!

RPC Orchestration

Now you might think that, instead of making service-to-service RPC calls, you could introduce some type of orchestrator that makes the RPC calls.

RPC Orchestration: Call Service A

After the orchestrator makes the RPC call to the first service, it makes the subsequent calls to the other services in order.

RPC Orchestration: Call Service B

This would address transient failures, as the orchestrator could retry failed calls. If there is a more extended failure/outage, it would instead call a previous service and request a compensating action to void/undo the earlier call.

RPC Orchestration: Failure

While this orchestrator sounds better than service-to-service RPC, we haven’t solved much. We still have a high degree of coupling, and our workflow will fail if a service is down or unavailable. If we have a failure and we need to make a call back to a previous service to perform some type of compensating action, what happens if that call fails? Again, we’re back to being in an inconsistent state.
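The problem with a failing compensating call can be sketched in a few lines of Python. Everything here is hypothetical: `orchestrate`, the `Unavailable` exception, and the `ok`/`fail` callables stand in for real RPC calls, and retries are left out to keep it short.

```python
class Unavailable(Exception):
    """Raised by a (hypothetical) service call that cannot be completed."""


def orchestrate(steps):
    """Run steps in order; on failure, compensate completed steps in reverse.

    `steps` is a list of (name, action, compensation) tuples of callables.
    """
    completed = []
    for name, action, compensation in steps:
        try:
            action()
            completed.append((name, compensation))
        except Unavailable:
            for done_name, undo in reversed(completed):
                try:
                    undo()
                except Unavailable:
                    # The compensating call failed too: nothing left to fall back on.
                    return f"inconsistent: could not compensate {done_name}"
            return "compensated"
    return "completed"


def fail():
    raise Unavailable()


def ok():
    pass


# Service B is down, and Service A is unreachable when we try to compensate it.
result = orchestrate([("A", ok, fail), ("B", fail, ok)])
```

Here `result` ends up as `"inconsistent: could not compensate A"`. The orchestrator centralizes the failure handling, but it still depends on every service being reachable at the moment it calls them.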

Event Choreography

With an event driven architecture and the publish-subscribe pattern, we can loosely couple services and remove temporal coupling by publishing and consuming events.

For example, Service A receives a request from the client that kicks off the workflow.

After making its state change to its database, Service A would publish an event to our message broker.

At this point, the request from the Client to Service A is complete. Service B then consumes the event published by Service A and does its part of the workflow.

Once Service B has successfully consumed the message, it will publish an event to the broker, which will kick off the next service that is a part of the workflow.

Service C is the next service involved in the workflow, and it consumes the event published by Service B.

Because we aren’t temporally coupled, each service consumes a message and processes it independently.

This means if Service C is unavailable or has a backlog of messages to process, that doesn’t break the workflow. Once the service becomes available again, and the message is processed, the workflow continues.
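The flow above can be sketched with a tiny in-memory broker. This is only an illustration under stated assumptions: the `Broker` class, the event names `AStepCompleted` and `BStepCompleted`, and the order id are made up, and a real message broker would add durability, acknowledgements, and retries.

```python
from collections import defaultdict, deque


class Broker:
    """Toy publish-subscribe broker with one queue per subscriber."""

    def __init__(self):
        self.queues = defaultdict(deque)          # subscriber -> pending messages
        self.subscriptions = defaultdict(list)    # event type -> subscriber names
        self.handlers = {}                        # (subscriber, event type) -> handler

    def subscribe(self, subscriber, event_type, handler):
        self.subscriptions[event_type].append(subscriber)
        self.handlers[(subscriber, event_type)] = handler

    def publish(self, event_type, payload):
        # Publishing only enqueues; no subscriber has to be running right now.
        for subscriber in self.subscriptions[event_type]:
            self.queues[subscriber].append((event_type, payload))

    def deliver(self, subscriber):
        """Drain one subscriber's queue; call this only while that service is up."""
        queue = self.queues[subscriber]
        while queue:
            event_type, payload = queue.popleft()
            self.handlers[(subscriber, event_type)](payload)


broker = Broker()
log = []


def service_a_handle_request(order_id):
    log.append(f"A saved {order_id}")            # state change first
    broker.publish("AStepCompleted", order_id)   # then publish the event


def service_b(order_id):
    log.append(f"B processed {order_id}")
    broker.publish("BStepCompleted", order_id)


def service_c(order_id):
    log.append(f"C processed {order_id}")


broker.subscribe("B", "AStepCompleted", service_b)
broker.subscribe("C", "BStepCompleted", service_c)

service_a_handle_request("order-1")   # the client request is complete here
broker.deliver("B")                   # B is up and does its part
backlog = len(broker.queues["C"])     # C is down: its message simply waits
broker.deliver("C")                   # C comes back and the workflow continues
```

Note that no service calls another directly. Each one only knows which events it consumes and which it publishes, so while Service C is down the message just waits in its queue.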

Service Unavailable

Is Event Choreography a silver bullet for workflow involving multiple services? No. Of course not. The challenge with event choreography for workflow is when you have more extended or complex workflows that involve more than just a few services. It is hard to visualize a workflow because there is no centralized orchestration to understand the entire workflow.

If you have complex workflows that involve many different services, check out my post on Workflow Orchestration for Resilient Systems

Join!

Developer-level members of my YouTube channel or Patreon get access to a private Discord server to chat with other developers about Software Architecture and Design and access to source code for any working demo application I post on my blog or YouTube. Check out the YouTube Membership or Patreon for more info.

You also might like

Follow @CodeOpinion on Twitter

Software Architecture & Design

Get all my latest YouTube Videos and Blog Posts on Software Architecture & Design

What is Software Architecture?

Software architecture is about making critical decisions that will impact how you can make decisions in the future. It’s about giving yourself options at a relatively low cost early on so your system can evolve without a high cost. Software architecture is about options.

Cost of Options

Software should be malleable. It shouldn’t be so rigid that you can’t change it because of new requirements or insights about the domain or the model you’ve built. You want to be able to evolve your system over time as all these emerge. You don’t want to have to pay a high price (complete rewrite) because your system is hard to change. Giving yourself low-cost options means making decisions that allow you to evolve your system over time but don’t come at a high cost (time/effort/etc.).

This means you have to pay the price initially to give yourself these options, usually early on in a project/product. These are critical decisions in choosing which options are low-cost and high-value.

I’m not talking about “what if” scenarios. Developers tend to make assumptions, often related to technical and business requirements. I’m not referring to all kinds of edge cases or technical concerns like scaling, which developers love to focus on.

The options I’m talking about are fundamental to your architecture. How you develop the system allows you to evolve it over time.

Coupling & Cohesion

A lot of software design comes down to understanding and making decisions based on coupling & cohesion.

To me, coupling & cohesion are the yin and yang of software design. They push and pull against each other. You’re trying to increase functional cohesion and lower coupling.

There are different forms of coupling, but to give a definition:

“degree of interdependence between software modules”

ISO/IEC/IEEE 24765:2010 Systems and software engineering — Vocabulary

And for cohesion:

“degree to which the elements inside a module belong together”

Structured Design: Fundamentals of a Discipline of Computer Program and Systems Design

Why do coupling & cohesion matter? Because ultimately, a lot of the decisions made are rooted in one or both of them, even if you don’t realize it.

If you do understand coupling and cohesion, it can help you make better decisions that provide options.

The 3 concrete examples I will provide in this post are all rooted in coupling & cohesion.

For more on coupling & cohesion, check out my post: SOLID? Nope, just Coupling and Cohesion

Logical Boundaries

The first way to give yourself options within your architecture is to define logical boundaries: group the related behaviors and functionality (capabilities) your system provides so that each grouping is functionally cohesive.

The alternative is a system that is a free for all of functionality without any boundaries.

A piece of functionality shouldn’t be intertwined with other unrelated functionality. In other words, the dependencies for one piece of functionality shouldn’t affect another. An example of this is a database. A set/grouping of features should own and be responsible for the underlying data for that feature set.

Define logical boundaries where you’re grouping functionality that works together on a set of underlying data. Focus on the capabilities and behaviors of your system. Group those capabilities into logical boundaries.

Within a logical boundary, you can make decisions that are isolated within it. How should you perform data access? Which type of database is best for the data set of that logical boundary? How should we model within the given context? Different boundaries will have different models. By defining logical boundaries (cohesion), you can make all kinds of decisions that are best for the feature set within that boundary. This gives you options.

Boundaries are one of the hardest things to define correctly, yet one of the most important to get right. Check out a whole series and talk: Context is King: Finding Service Boundaries

Loose Coupling

If you’re defining boundaries, how do you communicate between them?

In a free for all, there is coupling everywhere. You have different parts of the system that are directly coupled to other parts of the system. This could be coupling between classes/modules or, generally, at its worst, via the database.

I often refer to this as a turd pile. But it’s just a system that’s lost control of coupling.

If you’ve defined logical boundaries, as explained earlier, you’ll likely need to communicate between them. Any system will have long-running business processes and workflows that span many logical boundaries.

To remove tight coupling, we can leverage asynchronous messaging. Removing direct communication between boundaries means we are also removing temporal coupling. In other words, you’re not bound by time.

This means that one boundary can send a message to a queue for another boundary, and that message can be processed independently.
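As a rough Python sketch of two boundaries that each own their data and only communicate through a queue: the `SalesBoundary` and `ShippingBoundary` names, the `OrderPlaced` message shape, and the use of a `deque` in place of a durable broker queue are all assumptions for illustration.

```python
from collections import deque


class SalesBoundary:
    """Owns its own data; other boundaries never read it directly."""

    def __init__(self, outbox):
        self._orders = {}
        self._outbox = outbox

    def place_order(self, order_id, sku):
        self._orders[order_id] = sku
        # Fire and forget: no call into Shipping, so Shipping need not be running.
        self._outbox.append({"type": "OrderPlaced", "order_id": order_id, "sku": sku})
        return "accepted"


class ShippingBoundary:
    """Keeps its own model of the same order, built from messages."""

    def __init__(self):
        self._shipments = {}

    def process(self, inbox):
        # Runs whenever Shipping is available, independently of Sales.
        while inbox:
            message = inbox.popleft()
            if message["type"] == "OrderPlaced":
                self._shipments[message["order_id"]] = "label created"
        return dict(self._shipments)


queue = deque()  # stand-in for a durable queue on a real message broker
sales = SalesBoundary(queue)
status = sales.place_order("order-1", "sku-42")   # returns immediately
pending = len(queue)                               # the message waits in the queue
shipments = ShippingBoundary().process(queue)      # processed later, on Shipping's schedule
```

Sales finishes its work without knowing whether Shipping is up, which is what removing temporal coupling means in practice.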

Because you have logical boundaries (cohesion), this works best with long-running business processes or workflows. So often, we model our workflows as being synchronous requests/responses when in reality, we could be building a much more resilient system by making them asynchronous.

Asynchronous messaging and event-driven architectures give you options through loose coupling! Check out my Real-World Event Driven Architecture! 4 Practical Examples

CQRS

Unfortunately, CQRS is a buzzword (acronym) that is widely misunderstood.

Command Query Responsibility Segregation is often conflated with Event Sourcing, Asynchronous Communication, Domain Driven Design, Multiple Databases, and more. If you search and read enough posts, you’re bound to find a similar diagram.

CQRS Confusion

Sadly, while this is CQRS, as mentioned, it’s also conflating a bunch of other patterns or concepts. CQRS is nothing more than separating reads and writes even from a service layer.

Yes, really. It’s that simple. Still don’t believe me? Check out my post CQRS Myths: 3 Most Common Misconceptions, where I reference many of the early blog posts from Greg Young.

So why is it important? Because it gives you options. Defining separate paths for reads and writes allows you to make decisions for each path and each occurrence.
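A minimal Python sketch of that separation, with hypothetical names throughout (`RenameProduct`, `GetProductName`, and the in-memory `_products` dictionary are invented for illustration). There is no event sourcing, message bus, or second database here, just a write path and a read path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RenameProduct:
    """Command: asks for a state change and returns no data."""
    product_id: str
    new_name: str


@dataclass(frozen=True)
class GetProductName:
    """Query: returns data and changes nothing."""
    product_id: str


_products = {"p1": "Widget"}  # stand-in for whatever storage the system uses


def handle_command(command: RenameProduct) -> None:
    # The write path is where validation and business rules live.
    if not command.new_name.strip():
        raise ValueError("name must not be empty")
    _products[command.product_id] = command.new_name


def handle_query(query: GetProductName) -> str:
    # The read path can be optimized on its own (caching, a read model, etc.).
    return _products[query.product_id]


handle_command(RenameProduct("p1", "Gadget"))
name = handle_query(GetProductName("p1"))
```

Because the two paths are separate functions with separate inputs, each can later be changed independently, for example by pointing the query at a different data store.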

If you look at the first diagram, it illustrates a Command Bus, a Domain model, Event Sourcing, and a projection (multiple databases). All of that is facilitated by the decision to separate commands and queries.

CQRS is a gateway to other patterns and concepts. At its core, it’s pretty trivial, but it gives you options!

What is Software Architecture?

To me, software architecture is about critical decisions, usually early on within a product/project, that give you future options. Options that allow you to evolve your system over time. The cost is usually relatively low when making decisions and giving yourself options at the very beginning. The three examples in this post are defining logical boundaries, loosely coupling those boundaries with a message/event-driven architecture, and CQRS.

As always, it’s rooted in coupling and cohesion.

Should you Soft Delete?

Should you delete records from your database or instead use a soft delete? I was recently asked my view on this question by a follower on Twitter. So what’s my answer? Well, I’m not usually thinking about “deleting” anything. Instead, I’m thinking about adding.

Hard Delete

So we’re all on the same page, let me take a step back and discuss deleting data in general. Hard deletes refer to removing the data from the database.

For example, say the Employee table in a relational database has two records.

Table of Rows

Doing a hard delete would mean executing a delete statement that would physically remove the record from the table. Only one record would be remaining.

Hard Delete

Give yourself a high five if you got the movie reference to the remaining employee!

Soft Delete

Typically, if you’re using a relational database, you might have foreign key constraints that enforce data integrity by preventing you from deleting rows that are still referenced. You don’t want to have a reference to EmployeID123 in another table when it no longer exists.

Soft deletes often solve this issue by adding a column to indicate if the record/row is “deleted” or inactive.

For example, the table has an IsDeleted column that defaults to false.

Soft Delete Table

Instead of hard deleting, records are updated to have IsDeleted set to true.
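Here is a small sketch of that using Python’s built-in `sqlite3` module. The column names other than `IsDeleted`, the placeholder employee names, and the ids are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Employee (
        EmployeeId INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL,
        IsDeleted  INTEGER NOT NULL DEFAULT 0  -- 0 = false, 1 = true
    )
    """
)
conn.executemany(
    "INSERT INTO Employee (EmployeeId, Name) VALUES (?, ?)",
    [(123, "Employee One"), (456, "Employee Two")],
)

# A hard delete would be: DELETE FROM Employee WHERE EmployeeId = 123
# A soft delete updates the flag and leaves the row in place.
conn.execute("UPDATE Employee SET IsDeleted = 1 WHERE EmployeeId = ?", (123,))

# Every read now has to remember to filter on the flag.
active = conn.execute(
    "SELECT EmployeeId FROM Employee WHERE IsDeleted = 0"
).fetchall()
total = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
```

Both rows are still physically in the table, so foreign keys pointing at employee 123 remain valid, but only employee 456 shows up in queries that filter on `IsDeleted`.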

Similarly, if you were working with a document database, a property would indicate whether the document/object is deleted.

Soft Delete Document

Business Concepts

In line-of-business and enterprise-style systems, when working in the core part of the system, I’m often not thinking about deleting or soft deleting anything. This is because there are business concepts and business processes that don’t involve deleting data.

People within the domain don’t generally think about “deleting” data, nor would they use the term “delete” when talking about a business process or workflow.

Jumping back to my earlier example of employees, let’s say we were working within an HR system. An employee is a key aspect of the system, and there are different workflows around them. One of those concepts would be their employment history.

When an employee is terminated or quits, we wouldn’t just do a soft delete and mark them as “IsDeleted”. That doesn’t make sense. Employees have a lifecycle: they are hired, and later their employment is terminated.

Events

When an employee’s employment is terminated, other data is likely relevant to that event. When was the decision made, when was it effective, and what was the reason?

The key is that this is a business event that occurred, and we want to capture all the relevant data with that event.

Document Events

Maybe we represent this in our document/object as a collection of hiring and terminations with the relevant data. It’s possible that an employee is re-hired at a later date and has multiple periods of employment.
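One way to picture such a document in Python, as a sketch only: the `Hired` and `Terminated` types, their fields, and the `is_employed` helper are hypothetical names chosen for this example, and the history is assumed to be ordered by effective date.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Hired:
    effective: date


@dataclass(frozen=True)
class Terminated:
    decided_on: date   # when the termination was decided
    effective: date    # when it took effect
    reason: str        # why it happened


@dataclass
class Employee:
    employee_id: str
    history: list = field(default_factory=list)  # assumed ordered by effective date

    def is_employed(self, on: date) -> bool:
        employed = False
        for event in self.history:
            if event.effective <= on:
                # The latest event in effect on that date decides the status.
                employed = isinstance(event, Hired)
        return employed


employee = Employee(
    "123",
    [
        Hired(effective=date(2020, 1, 6)),
        Terminated(decided_on=date(2021, 3, 1), effective=date(2021, 3, 31), reason="resigned"),
        Hired(effective=date(2022, 9, 1)),  # re-hired for a second period
    ],
)
```

Compared with an `IsDeleted` flag, this keeps the when and why of each change, and a question like `employee.is_employed(date(2021, 6, 1))` can be answered for any point in time.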

Again, the key is focusing on the business events that occur. If you think about events as a primary driver of your system, you’ll likely land on persisting the events themselves as your state. This is Event Sourcing. If you’re unfamiliar with the concept of Event Sourcing, check out my post Event Sourcing Example & Explained in plain English.

CRUD

Now I can hear people screaming already; it’s just CRUD! This can be true on the outer edge of a system, in the parts that play more of a supporting role. In a large system, you’ll have various boundaries that act purely in supporting roles and are mostly reference data. And yes, this can be CRUD and would likely just soft delete.

However, at the core of your domain, as mentioned, you won’t hear people who understand the domain talk about “deleting.” Almost all events have some type of compensating action that ends the life cycle of a process or “undoes” or voids a previous action. Capturing those business events and concepts is key to building workflow within your system. If you’re focusing on CRUD, all business processes, workflows and understanding live entirely in the end user’s head. Your system, at that point, is nothing more than a UI to a database with no real capabilities.

GDPR

I can also hear people screaming, but… GDPR! I have to delete the data!

You’re missing the point.

If you need to delete data, delete it. The point isn’t about not deleting data; the point is that if you’re “soft deleting” data, you’re losing information about business concepts/events that have likely occurred as part of a business process or workflow. The events that occurred, and why they occurred, can be incredibly valuable in building a robust system that can evolve as requirements change.
