
Event Carried State Transfer: Keep a local cache!

What’s Event Carried State Transfer, and what problem does it solve? Do you have a service that requires data from another service, but you’re trying to avoid making a blocking synchronous call to get it because that introduces temporal coupling and availability concerns? One solution is Event Carried State Transfer.

YouTube

Check out my YouTube channel, where I post all kinds of content accompanying my posts, including this video showing everything in this post.

Temporal Coupling

If you have a service that needs to get data from another service, you might just think to make an RPC call. There can be many reasons for needing data from another service. Most often, it’s for query purposes to generate a ViewModel/UI/Reporting. If you need data to perform a command because you need data for business logic, then check out my post on Data Consistency Between Services.

The issue with making an RPC call and the temporal coupling that comes with it is availability. If we need to make an RPC call from ServiceA to ServiceB, and ServiceB is unavailable, how do we handle that failure, and what do we return to the client?

Service to Service

We want ServiceA to be available even when ServiceB is unavailable. To do this, we need to remove the temporal coupling so we don’t need to make this RPC call.

This means that ServiceA needs all the data to fulfill the request from the client.

Service has all the data within its own boundary

Services should be independent. If a client makes a request to any service, that service should not need to make a call to any other service. It has to have all the data required.

Local Cache

One way to accomplish this is to be notified via an event asynchronously when data changes within a service boundary. This allows you to call back the service to get the latest data/state from the service and then update your database, which is acting as a local cache.

For example, if a Customer were changed in ServiceB, it would publish a CustomerChanged event containing the Customer ID that was changed.

Publish Event

When ServiceA consumes that event, it would then do a callback to ServiceB to get the latest state of the Customer.

Consume and Callback Publisher

This allows us to keep a local cache of data from other services. We’re leveraging events to notify other service boundaries that the state has changed within a service boundary. Other services can then call the service to update their local cache.
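To make this concrete, here’s a minimal Python sketch of the consume-and-callback approach (the event shape and names like fetch_customer are illustrative assumptions, not from any particular framework; in a real system the callback would be an HTTP request to ServiceB):

```python
local_cache = {}  # ServiceA's local copy of ServiceB's customer data

def fetch_customer(customer_id):
    # Stand-in for an HTTP callback to ServiceB, e.g. GET /customers/{id}.
    return {"id": customer_id, "name": "Jane Doe"}

def handle_customer_changed(event):
    # The event only tells us *which* customer changed...
    customer_id = event["customerId"]
    # ...so we call back to the publisher for the latest state
    # and update our local cache (our database).
    local_cache[customer_id] = fetch_customer(customer_id)

handle_customer_changed({"customerId": 123})
```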

The downside to this approach is that you could be receiving/accepting a lot of requests for data from other services if you’re publishing many events. From the example, ServiceB could have an increased load handling the requests for data.

You’re still dealing with availability, however. If you consume an event and then make an RPC call to get the latest data, what happens when the service isn’t available or responding? And as with any cache, it’s going to be stale.

Callback Failure/Availability

Event Carried State Transfer

Instead of having these callbacks to the event’s producer, the event contains the state. This is called Event Carried State Transfer.

If all the relevant data related to the state change is in the event, then ServiceA can simply use the data in the event to update its local cache.

Event Carried State Transfer
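The difference in the consumer is small but important. A sketch of the same handler with Event Carried State Transfer (the event payload shape is an illustrative assumption): the event itself carries the state, so no callback to the producer is needed.

```python
local_cache = {}  # ServiceA's local copy of ServiceB's customer data

def handle_customer_changed(event):
    # The event carries the full relevant state; no callback to ServiceB.
    local_cache[event["customerId"]] = {
        "name": event["name"],
        "email": event["email"],
    }

handle_customer_changed(
    {"customerId": 123, "name": "Jane Doe", "email": "jane@example.com"}
)
```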

There are three key aspects to Event Carried State Transfer: Immutable, Stale, and Versioned.

Events are immutable. When they were published, they represented the state at that moment in time. You can think of them as integration events. They are immutable because you don’t own any of the data. Data ownership belongs to the service that’s publishing the event. You just have a local cache. And as mentioned earlier, you need to expect it to be stale because it’s a cache.

Versioning

There must be a version that increments within the event that represents the point in time when the state was changed. For example, if a CustomerChanged event was published for CustomerID 123 multiple times, even if you’re using FIFO (first-in-first-out) queues, that does not mean you’ll process them in order if you’re using the Competing Consumers Pattern.

Competing Consumers

When you consume an event, you need to know that you haven’t processed a more recent version already. You don’t want to overwrite with older data.
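A minimal sketch of that version check (field names are illustrative assumptions): the consumer ignores any event whose version is not newer than what it has already cached, so out-of-order delivery can’t overwrite newer data with older data.

```python
local_cache = {}  # customerId -> {"version": int, "name": str}

def handle_customer_changed(event):
    cached = local_cache.get(event["customerId"])
    # With competing consumers, events can be processed out of order.
    # Ignore the event if we've already cached a newer (or equal) version.
    if cached and cached["version"] >= event["version"]:
        return
    local_cache[event["customerId"]] = {
        "version": event["version"],
        "name": event["name"],
    }

# Version 2 happens to be processed first, then a stale version 1 arrives.
handle_customer_changed({"customerId": 123, "version": 2, "name": "Jane D."})
handle_customer_changed({"customerId": 123, "version": 1, "name": "Jane"})
```

The second (stale) event is discarded, leaving version 2 in the cache.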

Check out my post Message Ordering in Pub/Sub or Queues and Competing Consumers Pattern for Scalability.

Data Ownership

So what type of data would you want to keep as a local cache updated via Event Carried State Transfer? Generally, reference data from supporting boundaries. Not transactional data.

Because reference data is non-volatile, it fits well for a local cache. This type of data isn’t changing often, so you’re not as likely to be concerned with staleness.

Transactional data, however, I do not see as a good fit. Generally, transactional data should be owned and contained within a service boundary.

Consider an online checkout process as an example. When the client starts the checkout process, it makes requests to the Ordering service.

Start Checkout Process

The client then needs to enter their billing and credit card information. This information isn’t sent to the Ordering service but directly to the Payment service. The Payment service would store the billing and credit card information against the ShoppingCartID.

Payment Information

Finally, the order is reviewed, and to complete the order, the client then requests the Ordering service to place the order. At this point, the Ordering service would publish an OrderPlaced event containing only the OrderID and ShoppingCartID.

Place Order

The Payment service would consume the OrderPlaced event and use the ShoppingCartID within the event to look up the credit card information within its database so it can then call the credit card gateway to make the credit card transaction.

Consume OrderPlaced and Process Payment
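A sketch of the Payment service’s handler (the in-memory dictionaries and charge_gateway are hypothetical stand-ins for its database and the real credit card gateway): the ShoppingCartID in the event is enough to find the card data Payment already owns.

```python
# Stand-in for the Payment service's database, keyed by ShoppingCartID.
payment_info = {"cart-42": {"cardToken": "tok_abc", "amount": 99.00}}
charges = []  # records calls made to the (fake) credit card gateway

def charge_gateway(card_token, amount):
    # Stand-in for the real credit card gateway call.
    charges.append((card_token, amount))

def handle_order_placed(event):
    # Use the ShoppingCartID from the event to look up the card info
    # Payment already owns -- no callback to the Ordering service needed.
    info = payment_info[event["shoppingCartId"]]
    charge_gateway(info["cardToken"], info["amount"])

handle_order_placed({"orderId": "order-1", "shoppingCartId": "cart-42"})
```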

Event Carried State Transfer

Event Carried State Transfer is a way to keep a local cache of data from other service boundaries. This works well for reference data from supporting boundaries that isn’t changing often. However, be careful about using it with transactional data, and don’t force Event Carried State Transfer where you should instead be directing data to the appropriate boundary.

Join!

Developer-level members of my YouTube channel or Patreon get access to a private Discord server to chat with other developers about Software Architecture and Design and access to source code for any working demo application I post on my blog or YouTube. Check out the YouTube Membership or Patreon for more info.

Follow @CodeOpinion on Twitter

Software Architecture & Design

Get all my latest YouTube Videos and Blog Posts on Software Architecture & Design

SOLID? Nope, just Coupling and Cohesion

How do we avoid writing spaghetti code so our systems don’t turn into a hot mess? For me Coupling and Cohesion. Some people focus on things like SOLID principles and Clean Architecture. While I don’t necessarily have a problem with that if you’re pragmatic, I don’t ever really think about either of those explicitly.

YouTube

Check out my YouTube channel where I post all kinds of content that accompanies my posts including this video showing everything in this post.

Coupling and Cohesion

With over 20 years of professional software development experience, I’m mostly thinking about coupling and cohesion as a guide to software design. I’m not explicitly thinking about SOLID principles or clean architecture.

Coupling and cohesion are like the yin-yang of software design. They push and pull against each other. You’re striving for high cohesion and low coupling, however, you’re trying to find a balance. We’re always fighting against coupling.

There are many different forms of coupling but to narrow it down for simplicity’s sake:

“degree of interdependence between software modules”

ISO/IEC/IEEE 24765:2010 Systems and software engineering — Vocabulary

At a micro level, this could be the interdependence between classes or functions, and at the macro level, this could be the interdependence between services.

When you’re thinking about spaghetti code, you’re likely actually referring to a system that has a high degree of coupling at both the macro and micro levels. That high degree of coupling makes changes difficult to make without breaking other parts of the system. Check out my post on Write Stable Code using Coupling Metrics for more on coupling.

Cohesion refers to:

“degree to which the elements inside a module belong together”

Structured Design: Fundamentals of a Discipline of Computer Program and Systems Design

You can look at cohesion both from a micro and macro level. From a micro level, this can be viewed as how do all the methods relate in a class? Or how do all the functions of a module relate? From a macro level, how do all the features relate to a service? Check out my post on Highly COHESIVE Software Design to tame Complexity for more on cohesion.

But what does “belong together” mean? For me, this is about functional cohesion: grouping the related operations of a task, not grouping based on data (which is informational cohesion). Yes, data is important; however, data is required by the functionality that is exposed.

Business Capabilities

Speaking of functionality, let’s jump to the Single Responsibility Principle for a second as this might clarify why functional cohesion is important.

When you write a software module, you want to make sure that when changes are requested, those changes can only originate from a single person, or rather, a single tightly coupled group of people representing a single narrowly defined business function. You want to isolate your modules from the complexities of the organization as a whole, and design your systems such that each module is responsible (responds to) the needs of just that one business function.

https://blog.cleancoder.com/uncle-bob/2014/05/08/SingleReponsibilityPrinciple.html

This might make it a bit clearer how the Single Responsibility Principle at its root addresses coupling and cohesion.

If we focus on business capabilities and group them together, we’ll end up with a service. That’s why I always define a service as:

A Service is the authority of a set of business capabilities.

At a macro level, we’re trying to have a high degree of cohesion by grouping by business capabilities. Behind those capabilities is data ownership.

Unfortunately, many are still focused purely on data and what I call “entity services”. This was a screenshot from a post on Reddit. The question was if this data model diagram should be implemented as a service per “entity”.

This type of thinking is based on informational cohesion, not functional cohesion. The focus solely on data and not thinking at all about functionality and behavior leads to design generally around CRUD.

As an example, you’ll often have “Manager” and “Repository” classes that look similar to this.

The Product “entity” is really just a data model with a bunch of properties. There is no behavior exposed, they are just data buckets.
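To illustrate, here’s a minimal sketch of that entity-service style in Python (the names are hypothetical; the original screenshots would have been C#). Product is just a data bucket, and the Manager/Repository merely shuttle it in and out of storage, with no real business behavior anywhere.

```python
class Product:
    # Just a bag of properties -- no behavior, purely a data bucket.
    def __init__(self, product_id, name, price, quantity):
        self.product_id = product_id
        self.name = name
        self.price = price
        self.quantity = quantity

class ProductRepository:
    # CRUD-style data access keyed by id.
    def __init__(self):
        self._products = {}

    def get(self, product_id):
        return self._products.get(product_id)

    def save(self, product):
        self._products[product.product_id] = product

class ProductManager:
    # The "Manager" only moves data between caller and repository;
    # there's no functional cohesion here, only informational cohesion.
    def __init__(self, repository):
        self._repository = repository

    def update_price(self, product_id, price):
        product = self._repository.get(product_id)
        product.price = price
        self._repository.save(product)
```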

So where is the actual functionality of a product? In a large system, for example in a distribution domain, a product isn’t just a product.

A product means different things to different people within that domain.

Service Boundaries by Functional Cohesion

Sales is thinking about the sale price of a product and is customer-centric. Purchasing/Procurement is thinking about the cost and is vendor-centric. The warehouse is concerned with shipping and receiving. They all have different business capabilities, and because those capabilities are different, they care about different data behind those capabilities.

A product isn’t just a product that has to live within a single boundary. It’s a concept that can live within multiple different service boundaries.

Organizing related business capabilities (features) into services allows us to decide how we want to handle technical concerns within each service, or even for a specific feature: dependencies and shared concerns such as data access, validation, etc. All of these decisions can be localized and made per service.

Services with features with high functional cohesion

Loose Coupling

So you’ve defined boundaries based on functional cohesion, but they will still have some interdependence between each other.

Tight Coupling between Services

Having a free-for-all where any service can be coupled to any other service is still a hot mess of spaghetti. In other words, tight coupling. The system will still be hard to change and fragile.

You want to remove the tight coupling by having service boundaries be independent and not directly coupled to other services. One way of achieving this is through loose coupling provided by messaging.

Loose Coupling between Services

Because you’re grouping by business capabilities, and the data behind those capabilities, each service should have all the data it needs. This means that you aren’t coupling services because you need to fetch data from them to perform an action.

Services still need to work together to create workflows and exchange information. As an example from the diagram above, if the Warehouse has a “Quantity on Hand”, you might think that Sales would need it to know whether it can sell a given product. However, Sales actually has its own concept called ATP (Available to Promise), a business function that is the projected amount of inventory it can sell. This consists of what’s in stock in the Warehouse, minus what’s allocated to existing orders (Invoicing), plus purchase orders and expected receipts (Purchasing).

Sales can maintain its ATP by consuming events from other services. It does not need to be directly coupled to another service’s API, making calls at runtime to calculate ATP for a given product. It maintains and owns the ATP for all products based on the events it consumes from other services. When Invoicing publishes an OrderInvoiced event, Sales can subtract that amount from ATP. If the Warehouse does a stock count and publishes an InventoryAdjusted event, Sales will update the ATP accordingly.
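A minimal sketch of Sales maintaining its own ATP from consumed events (event shapes and field names are illustrative assumptions):

```python
# Sales owns ATP per product, built up from events it has consumed.
atp = {"sku-1": 100}

def handle_order_invoiced(event):
    # An invoiced order reduces what Sales can still promise.
    atp[event["sku"]] -= event["quantity"]

def handle_inventory_adjusted(event):
    # A warehouse stock count adjusts ATP up or down.
    atp[event["sku"]] += event["adjustment"]

handle_order_invoiced({"sku": "sku-1", "quantity": 10})
handle_inventory_adjusted({"sku": "sku-1", "adjustment": -3})
```

No runtime calls to Invoicing or the Warehouse are needed; Sales answers ATP queries entirely from data it owns.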

Responsibility

From a lower-level code perspective, the coupling can be challenging when we have to deal with many different technical concerns. Web, Authorization, Validation, Data Access, and Business Logic. But as mentioned earlier, each feature/capability can define how each of these concerns is handled.

While you can share between features, such as a domain model, this naturally starts limiting the coupling between a set of features.

One approach to handling these common concerns is the pipes & filters pattern, specifically the Russian doll model, in which the request passes through a filter that can call (or not call) the next filter. This allows you to separate various concerns and create a pipeline for a request.

Pipes & Filters

For more check out my post on Separating Concerns with Pipes & Filters
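As a rough sketch of the Russian doll model (hypothetical filter names; real implementations would be framework middleware), each filter receives the request plus a `next_filter` callable that it may invoke, or short-circuit:

```python
def validation(request, next_filter):
    # Short-circuits: never calls the next filter on invalid input.
    if "user" not in request:
        return "400 Bad Request"
    return next_filter(request)

def logging_filter(request, next_filter):
    print(f"handling {request}")
    return next_filter(request)

def handler(request, next_filter):
    # Innermost "doll": the actual business logic.
    return f"hello {request['user']}"

def build_pipeline(*filters):
    # Wrap filters outside-in so the first filter runs first.
    def terminal(request):
        return None
    pipeline = terminal
    for f in reversed(filters):
        pipeline = (lambda flt, nxt: lambda req: flt(req, nxt))(f, pipeline)
    return pipeline

pipeline = build_pipeline(validation, logging_filter, handler)
```

Calling `pipeline({"user": "sam"})` flows through validation and logging into the handler, while `pipeline({})` is rejected by validation before reaching either inner filter.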

Coupling and Cohesion

Building a large system without any boundaries (low cohesion) and a high degree of coupling is ultimately building a big turd pile of a system.

Big Turd Pile

It will be hard to add new functionality, and hard to change existing functionality without breaking and causing regressions.

Focusing on having highly cohesive services that provide specific business capabilities and loosely coupling between those services to provide workflow and business processes will allow you to build a system that is more resilient to change.

Smaller Turd Piles

Decomposing a large system into smaller independent logical service boundaries allows you to make different decisions that are localized to an individual service. No system will be perfect, especially over time; it will need to evolve. Focus on coupling and cohesion as a guide.


Blocking or Non-Blocking API calls?

When should you use blocking or non-blocking API calls? Blocking synchronous communication vs non-blocking asynchronous communication? Should commands fire and forget? What’s the big deal of making blocking synchronous calls between services?

YouTube

Check out my YouTube channel where I post all kinds of content that accompanies my posts including this video showing everything in this post.

Communication

There are 3 different forms of communication that I’m generally thinking of. Commands, Queries, and Events. I’m going to cover all three and the pros/cons of using blocking synchronous or non-blocking asynchronous with each of them.

Commands

With commands, it’s all about users’/clients’ expectations. When the command is performed, do they expect to get a result immediately, with the execution fully complete? Because of this expectation, commands are often blocking synchronous calls.

Blocking Command

If the client was requesting a state change to the service, the expectation is that if they want to then perform a query afterward, they would expect to read their write. There would be no eventual consistency involved. Blocking synchronous calls can make sense here.

Where asynchronous commands can be useful is when the execution time is going to be longer than the user might want to wait. This is an issue with blocking calls, as the client could be left waiting for a response to know if the command has completed. As an example, if the command had to fetch and mutate a lot of data in a database, that could take a long time. Or perhaps it has to send 1000s of emails, which could take a lot of time to execute.

Another situation where async commands are helpful is when you want to capture the command and just tell the user it has been accepted. You can then process the command asynchronously and let the user know, also asynchronously, when it has been completed. An example of this is placing an order on an e-commerce site. When the user places an order, it doesn’t immediately create the order and charge their credit card; it simply creates a command that is placed on a queue.

Non-Blocking Command

Once the message has been sent to the broker, the client is told the order has been accepted (not created). Then, asynchronously, the process of creating the order and charging the customer’s credit card can occur.

Consume Asynchronously

Once the asynchronous work has been completed, typically you’d receive an email letting you know the order has been created. Depending on your context and system, something like WebSockets can also be helpful for pushing notifications to the client to let them know when asynchronous work has been completed. Check out my post on Real-Time Web by leveraging Event Driven Architecture
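The accept-then-process flow above can be sketched in-process (hypothetical names; a real system would use an actual broker such as RabbitMQ or Azure Service Bus, with the consumer running as a separate background process):

```python
import queue

# In-process stand-ins for the message broker and the order store.
broker = queue.Queue()
orders = {}

def place_order_endpoint(cart_id):
    # Capture the command and acknowledge immediately -- the order
    # hasn't been created yet, it's only been accepted.
    broker.put({"type": "PlaceOrder", "cartId": cart_id})
    return {"status": 202, "body": "Order accepted"}

def process_next_command():
    # Runs asynchronously: creates the order (and, in a real system,
    # would charge the customer's credit card here).
    command = broker.get()
    orders[command["cartId"]] = "created"

response = place_order_endpoint("cart-42")
```

The endpoint returns before the order exists; only once the consumer runs does the order get created, at which point the customer can be notified.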

Queries

Almost all queries are naturally request-response. They will be blocking calls by default because the purpose of a query is to get a result.

Blocking Query

This is pretty straightforward when a service is interacting with a database or any of its own infrastructure within a logical boundary, such as a cache.

Where this totally falls apart is when you have services that are physically deployed independently and they call each other using blocking synchronous communication (e.g., HTTP, gRPC).

Blocking Query Service to Service

There are many different issues with service-to-service direct RPC style communication. I cover a bunch of them in my post REST APIs for Microservices? Beware!

One issue is latency. Because you don’t really have any visibility into the entire call stack without something like distributed tracing, you can end up with very slow queries because so many RPC calls are being made between services to build up the result of a query.

The second issue, and the worst offender, is tight coupling. If a service is failing or unavailable, your entire system could be down if you aren’t handling all these failures appropriately. If Service D has availability issues, this will affect Service A indirectly because it’s calling Service C, which requires Service D.

Blocking Query Service to Service Failures

Congratulations, you’ve built a distributed monolith! Or as I like to call it, a distributed turd pile.

So how do you handle queries since they are naturally blocking and synchronous? Well, what you’re trying to do is view composition.

There are different ways to tackle this problem, one of which is to have a gateway or BFF (Backend-for-Frontend) that does the composition. Meaning it is the one responsible for making the calls to all the services and then composing the results back out to the client.

Query Composition

This isn’t the only way to do UI/ViewModel composition. There are other approaches such as moving the logic to the client or having each logical boundary own various UI components. Regardless, each service is independent and does not make calls to other services.

As for availability, if Service C is unavailable, that portion of the ViewModel/UI can be handled explicitly.
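A rough sketch of a BFF composing a ViewModel with an explicit fallback (the service-call functions are hypothetical stand-ins for HTTP calls; here the shipping service is simulated as unavailable):

```python
def get_order(order_id):
    # Stand-in for a call to the Ordering service.
    return {"orderId": order_id, "total": 59.99}

def get_shipping_status(order_id):
    # Stand-in for a call to a service that happens to be down.
    raise ConnectionError("shipping service unavailable")

def compose_order_view(order_id):
    view = {"order": get_order(order_id)}
    try:
        view["shipping"] = get_shipping_status(order_id)
    except ConnectionError:
        # One service being down degrades only its portion of the
        # ViewModel instead of failing the whole request.
        view["shipping"] = {"status": "unavailable"}
    return view
```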

There is one option that I won’t go into in this post, which is asynchronous request-reply. This allows you to leverage a message broker (queues) but still have the client/caller block until it gets a reply message. Check out my post Asynchronous Request-Response Pattern for Non-Blocking Workflows.

Events

It may seem obvious (or not), but events generally should be processed asynchronously. The biggest reason is isolation.

If you publish an event in a blocking synchronous way, the service invoking the consumers when the event occurs has to deal with the failures of those consumers.

Blocking Events

If there were two consumers, and the first consumer handled the event correctly but the second consumer failed, the service/caller would need to handle that failure so it doesn’t bubble back up to the client. Also, without any isolation, if a consumer takes a long time to process the event, it affects the entire blocking call from the client. The more consumers, the longer the entire call will take from the client. You want each consumer to execute/handle the event in isolation.

The publisher should be unaware of the consumers.

Publish Event

It doesn’t care if there are any consumers or if there are 100. It simply publishes events.

Subscribe and Consume Events
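A minimal in-process sketch of that publish/subscribe relationship (a real broker would deliver to each consumer on its own queue; here that per-consumer isolation is simulated with try/except so one failing consumer doesn’t affect the others or the publisher):

```python
subscriptions = {}  # event type -> list of consumer callables

def subscribe(event_type, consumer):
    subscriptions.setdefault(event_type, []).append(consumer)

def publish(event_type, event):
    # The publisher doesn't know or care who the consumers are.
    for consumer in subscriptions.get(event_type, []):
        try:
            consumer(event)
        except Exception:
            # In a real broker: retry / dead-letter this consumer only.
            pass

results = []
subscribe("OrderPlaced", lambda e: results.append(("billing", e["orderId"])))
subscribe("OrderPlaced", lambda e: 1 / 0)  # a failing consumer
subscribe("OrderPlaced", lambda e: results.append(("email", e["orderId"])))
publish("OrderPlaced", {"orderId": "o-1"})
```

The failing consumer doesn’t prevent the others from handling the event, and the publisher never sees the failure.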

Blocking or Non-Blocking API calls?

Hopefully, this illustrated the various ways that blocking synchronous and non-blocking asynchronous communication can be applied to Commands, Queries, and Events.
