Software Architecture Q&A: Microservices, CQRS & More!

You have questions, I have answers… well mainly opinions! Here are the top questions I was asked for this Software Architecture Q&A. As you can expect they are around Microservices, Messaging, CQRS & Event Sourcing.

YouTube

Check out my YouTube channel where I post all kinds of content that accompanies my posts including this video showing everything that is in this post.

What is the difference between SOA and Microservices?

I get this question often and I was expecting it for this Software Architecture Q&A. I believe it’s because of how I present Event Driven Architecture in relation to Microservices. The answer to this question really depends on your definition of both SOA and Microservices. The definition of Microservices that I generally use is from Adrian Cockcroft:

Loosely coupled service oriented architecture with bounded contexts.

https://www.slideshare.net/adriancockcroft/dockercon-state-of-the-art-in-microservices

If I dissect that definition, it’s a Service Oriented Architecture that’s loosely coupled, with loosely coupled implying Event Driven Architecture.

In my mind, SOA was always about being loosely coupled, however that may not have been how the general developer community understood it at the time.

The Bounded Context portion of the definition is from Domain Driven Design. Bounded Contexts are about defining boundaries. Services should own a set of business capabilities and the data behind those capabilities. You aren’t sharing behaviors or data between services. Services are independent. And because they are independent, communication is done in a loosely coupled fashion via an Event or Message Driven Architecture.

So my answer to this question is that, given these definitions, nothing is different between the two.

Is the saga an anti-pattern since you have a transaction spanning multiple bounded contexts?

No, a saga is not an anti-pattern. It’s responsible for coordinating a long-running business process between multiple services where there are no distributed transactions. Since there can be failures, the saga is responsible for handling failures by sending compensating actions to the appropriate services. The saga is a centralized place to handle the workflow of a long-running business process. And when I say long-running business process, that could mean milliseconds to weeks.
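To make the idea of compensating actions concrete, here is a minimal sketch of an orchestrating saga. It’s not how any particular library does it. The names `SagaStep` and `Saga` are hypothetical, and plain function calls stand in for the commands that would really be sent to other services through a message broker. A real saga would also persist its state between steps.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]       # stand-in for sending a command to a service
    compensate: Callable[[], None]   # stand-in for sending the compensating action


@dataclass
class Saga:
    steps: List[SagaStep]
    log: List[str] = field(default_factory=list)

    def run(self) -> bool:
        """Run each step in order; on failure, compensate completed steps."""
        completed: List[SagaStep] = []
        for step in self.steps:
            try:
                step.action()
            except Exception:
                self.log.append(f"failed:{step.name}")
                # Undo the steps that already succeeded, most recent first.
                for done in reversed(completed):
                    done.compensate()
                    self.log.append(f"compensated:{done.name}")
                return False
            self.log.append(f"done:{step.name}")
            completed.append(step)
        return True
```

If the second of two steps fails, the log would read `done`, `failed`, `compensated` in that order, and `run()` returns `False`.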

An alternative to a Saga is using Event Choreography, which removes the centralized logic of orchestration. Event Choreography is having all services consume and produce events that ultimately fulfill the long-running business process.

Check out my post on Event Choreography and Orchestration for more.

When should I choose CQRS over CRUD based “RESTful” endpoints?

Choose CQRS over CRUD when you want to be explicit.

With CRUD, when you’re making state changes via Create, Update, Delete, you aren’t explicitly capturing why they’re occurring. For example, if you’re performing an Update to a customer, why is that happening? Did their address change? Did their discount rate change? With a CRUD based approach, you don’t know exactly. You’d have to infer it based on the change being made.

With CQRS, all your commands are making those state changes explicit. You’re not “updating a customer” but rather you’re using explicit commands such as ChangeCustomerAddress or IncreaseDiscountRate. Because commands are explicit, you can then derive the events to be published from those commands.

If you wanted to publish an event based on CRUD, you would have very generic events such as CustomerUpdated. If you’re using CQRS, you’d have events such as CustomerAddressChanged and CustomerDiscountRateIncreased.
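Here’s a rough sketch of what explicit commands and their resulting events could look like. The command and event names come from the examples above. Everything else (the `Customer` class, the field names, the `handle` function, and the rule that a discount increase must actually be higher) is an assumption made for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Union


@dataclass(frozen=True)
class ChangeCustomerAddress:
    customer_id: str
    new_address: str


@dataclass(frozen=True)
class IncreaseDiscountRate:
    customer_id: str
    new_rate: Decimal


@dataclass(frozen=True)
class CustomerAddressChanged:
    customer_id: str
    address: str


@dataclass(frozen=True)
class CustomerDiscountRateIncreased:
    customer_id: str
    old_rate: Decimal
    new_rate: Decimal


@dataclass
class Customer:  # hypothetical aggregate, not from the original post
    customer_id: str
    address: str
    discount_rate: Decimal


Command = Union[ChangeCustomerAddress, IncreaseDiscountRate]
Event = Union[CustomerAddressChanged, CustomerDiscountRateIncreased]


def handle(customer: Customer, command: Command) -> List[Event]:
    """Apply an explicit command and return the events it produced."""
    if isinstance(command, ChangeCustomerAddress):
        customer.address = command.new_address
        return [CustomerAddressChanged(customer.customer_id, command.new_address)]
    if isinstance(command, IncreaseDiscountRate):
        if command.new_rate <= customer.discount_rate:
            raise ValueError("new rate must be higher than the current rate")
        old_rate = customer.discount_rate
        customer.discount_rate = command.new_rate
        return [CustomerDiscountRateIncreased(customer.customer_id, old_rate, command.new_rate)]
    raise TypeError(f"unknown command: {command!r}")
```

Compare this to a single `UpdateCustomer` call: each command here carries the intent, so the event that comes out of it does too.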

How to handle versioning when Event Sourcing?

No Software Architecture Q&A on this Blog/YouTube could happen without a few questions about Event Sourcing.

The definitive answer to versioning is Greg Young’s Versioning in an Event Sourced System eBook.

I often like to compare this to a relational database. If you were adding a new column to a table, you would either need to make the column nullable or give it a default value and then possibly backfill existing data.

With event sourcing, making a new property on an existing event nullable is exactly the same. You’d need to deal with the value being null. If it’s not nullable and has a default value, the situation is still the same as with a relational database. The difference is that you cannot backfill and update old events in an existing stream. What you can do in this situation is make sure any old event can be upconverted at runtime to the new event.
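As a small sketch of upconverting at runtime: the stored event is never modified, and a newer shape is built when the event is read. The event type `CustomerRegistered`, the `version` field, and the new `country` property are all hypothetical. How you store the version of an event depends on your event store and serializer.

```python
from typing import Any, Dict


def upconvert(event: Dict[str, Any]) -> Dict[str, Any]:
    """Return the event in its latest shape without touching the stored one."""
    version = event.get("version", 1)
    if event["type"] == "CustomerRegistered" and version == 1:
        # Version 1 had no 'country'. Old events get an explicit None,
        # much like a nullable column added to an existing table.
        data = dict(event["data"], country=None)
        return {"type": event["type"], "version": 2, "data": data}
    return event
```

The code that consumes events then only has to understand the latest version, as long as it deals with `country` being `None`.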

Best Practices and Pitfalls when Event Sourcing?

The biggest pitfall is how different the concept can be to developers who have only ever done CRUD and only ever recorded the current state.

While I don’t think the concept of Event Sourcing is difficult, how you deal with an immutable log (event store) is very different than just recording the current state.

As an example, if you needed to “fix” data in a relational database, you’d simply write an UPDATE statement to update records in your database. With Event Sourcing, since the events are in an immutable log (event stream), you can’t just “update” an existing event. Nor would that even make much sense. You generally have to create compensating events to “undo” something.

In the stereotypical example, you deposited $10 into a bank account, but that deposit was recorded as $100 by mistake. The bank would create a withdrawal of $100 and then a deposit of $10, meaning they would do a full reversal of your original incorrect deposit.
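The bank example can be sketched with an append-only list standing in for the event stream. The `Deposited` and `Withdrawn` event names, the `reason` field, and the helper functions are hypothetical. The point is only that the fix appends new events and never edits the incorrect one.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Deposited:
    amount: int
    reason: str = ""


@dataclass(frozen=True)
class Withdrawn:
    amount: int
    reason: str = ""


Event = Union[Deposited, Withdrawn]


def balance(stream: List[Event]) -> int:
    """Current state is derived by folding over every event in the stream."""
    total = 0
    for event in stream:
        if isinstance(event, Deposited):
            total += event.amount
        else:
            total -= event.amount
    return total


def correct_deposit(stream: List[Event], wrong: Deposited, correct_amount: int) -> None:
    """Append a full reversal plus the corrected deposit; never edit history."""
    stream.append(Withdrawn(wrong.amount, reason="reversal of incorrect deposit"))
    stream.append(Deposited(correct_amount, reason="corrected deposit"))
```

After the correction, the stream holds three events and still shows what went wrong and how it was fixed.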

It’s just different from what I believe most developers are used to. As mentioned in the previous question, versioning is another topic that’s very different from what most people are used to.

How to handle eventual consistency from eventually consistent projections (read models)?

In most event-sourced systems, projections are used to create different read models used oftentimes for specific use cases. If you’re unfamiliar with Projections, check out my post on Projections in Event Sourcing: Build ANY model you want!

However, this question doesn’t need to be specific to event sourcing. It applies to any data source you’re querying that is eventually consistent. For example, a read replica that has lagged behind the primary.

I think the pitfall here is how you set users’ expectations. If a user performs an action that they know is changing the state of the system, but afterwards you don’t show them data that reflects the updated state, they will not have a good experience. They have an expectation of consistency.

From a technical perspective, one option is using versions (numbers). When you perform a query, return a version number in the response to the caller. When the caller needs to re-query, they will know if the update has occurred because the version will have changed. If it hasn’t, you can wait and retry to get the data again.
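A minimal sketch of that wait-and-retry loop, assuming the caller knows the lowest version it needs to see (for example, the last version it saw plus one). The `query_until_version` helper and its parameters are hypothetical. `query` is any function returning a `(version, data)` pair from your read model.

```python
import time
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def query_until_version(
    query: Callable[[], Tuple[int, T]],
    min_version: int,
    attempts: int = 5,
    delay_seconds: float = 0.05,
) -> Optional[T]:
    """Re-query until the read model has caught up, or give up and return None."""
    for attempt in range(attempts):
        version, data = query()
        if version >= min_version:
            return data
        if attempt < attempts - 1:
            time.sleep(delay_seconds)  # give the projection time to catch up
    return None
```

Returning `None` leaves it to the caller to decide what to show the user when the projection hasn’t caught up in time.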

How and when did you start using DDD & CQRS?

I can’t remember the exact year, but I bought the Domain-Driven Design book sometime around 2009. I can’t remember exactly how I was introduced to it but I do know at the time I stumbled upon some blog posts and conference talks by Greg Young and Udi Dahan.

Both Greg and Udi were/are a large influence on how I think about designing and developing software.

Source Code

Developer-level members of my CodeOpinion YouTube channel get access to the full source for any working demo application that I post on my blog or YouTube. Check out the membership for more info.

Related Posts

Follow @CodeOpinion on Twitter

Enjoy this post? Subscribe!

Subscribe to our weekly Newsletter and stay tuned.

Architecture Decision Records (ADR) as a LOG that answers “WHY?”

Architecture Decision Records will save you from guessing. Have you ever worked on a codebase and wondered why you were using a particular library/framework, applying a pattern, or deploying a particular way? You ask a teammate, and they also have no idea because the decision was made before their time as well. Is the decision still relevant for the current context? What was the context when the decision was made? A lot of this is institutional knowledge that is lost, leaving you guessing. Instead, capture Architecture Decision Records along with your code in the same repository. It’s a log of all the decisions made along the course of a project/product.

Institutional Knowledge

There are generally two phases that I personally go through when looking at code whose purpose I don’t understand.

The first phase is the “I’m an idiot” phase where I don’t understand something and assume it to be correct. This could be how something is modeled, how a library is being used, or just anything that I see that is intriguing. I assume the reason why something exists makes sense. It must be correct. The benefit of the doubt is given to the original author of whatever code I’m looking at.

The second phase is the “This is terrible” phase where I believe something is wrong and that my assessment of what it should be is correct. My plan at this point is to change whatever it is I think is incorrect.

Once I come out of this second phase, I come to my senses and realize that I have no context on why something is the way it is. Without context about how decisions were made at the time, both phases I described above are incorrect.

In phase one, I can’t assume the context now is the same as the context was when the decision was made. Doing so would be accepting the status quo. In phase two, if I decide to change something, there could be a very good reason that isn’t trivial for why the decision was made.

In either situation, I don’t understand the context of the decision when it was made at the time.

Software evolves, especially software products that can have a very long lifespan. Decisions are made by developers and/or architects who may no longer be involved in the product/project. Or perhaps they no longer even work at the company. Institutional knowledge will be lost and your ability to understand the context can be very limited.

Architecture Decision Records

A solution to losing this institutional knowledge is to record decisions in a lightweight fashion. Michael Nygard posted about documenting architecture decisions and described a format for what to capture. The basics are a Title, Context, Decision, Status, and Consequences.

Here’s a template using the proposed format.

When you create an ADR, you store it alongside your source code, within the same repository. This could be in the /docs/adr/ directory of your repository. If you want to reference a specific ADR in code comments, have at it!

Be pragmatic about these records. If you want to record other information not listed in the template, do so. If you don’t need status, don’t use it.

When a decision is made, create an ADR with enough information that someone not involved in the decision would understand. Write ADRs for future readers.

You can simply create new ADR markdown files manually, but if you prefer using a tool you can use the adr-tools scripts to create new ADR files and supersede existing ADRs.
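If you’d rather script it yourself, here’s a small sketch that creates a numbered ADR markdown file with the Title, Context, Decision, Status, and Consequences sections. This is not adr-tools and doesn’t try to mimic its output. The `new_adr` function, the `0001-title.md` file naming, the prompt text under each heading, and the default status of “Proposed” are all my own assumptions.

```python
import re
from pathlib import Path

ADR_TEMPLATE = """# {number}. {title}

## Context

What is the situation that is motivating this decision?

## Decision

What are we going to do?

## Status

{status}

## Consequences

What becomes easier or harder because of this decision?
"""


def new_adr(title: str, adr_dir: Path = Path("docs/adr"), status: str = "Proposed") -> Path:
    """Create the next numbered ADR file in adr_dir and return its path."""
    adr_dir.mkdir(parents=True, exist_ok=True)
    # Find the highest existing number from file names like 0003-some-title.md.
    numbers = [
        int(match.group(1))
        for path in adr_dir.glob("*.md")
        if (match := re.match(r"(\d+)-", path.name))
    ]
    number = max(numbers, default=0) + 1
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    path = adr_dir / f"{number:04d}-{slug}.md"
    path.write_text(
        ADR_TEMPLATE.format(number=number, title=title, status=status),
        encoding="utf-8",
    )
    return path
```

Calling `new_adr("Use a messaging library")` from the root of a repository would write the file under `docs/adr/`, ready to be filled in and committed with the code.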

Example

Here’s a sample Architecture Decision Record that I’ve created about using a messaging library versus using a message broker’s SDK directly.

Architecture Decision Record Log

Hopefully, you can see how Architecture Decision Records provide a log of decisions. A log living alongside your code in the same repository can give context to decisions that were made throughout the life of a codebase.

This isn’t documentation; this is a historical log. It allows you to go back in time and read your ADRs to understand how you got to where you are.

Synchronous vs Messaging: When to use which?

Not all communication will be synchronous request/response with HTTP/RPC or asynchronous messaging within a system. But how do you choose between Synchronous vs Messaging? Well, it depends on if it’s a command and/or a query as well as where the request is originating from. If you want reliability and resiliency, then use messaging where it’s appropriate.

Synchronous

The most common places you’ll encounter making synchronous Request/Response calls are to 3rd party services, infrastructure (like a database, cache, etc), or from a UI/Client.

A typical example of this would be a JavaScript frontend application making an HTTP call to a Web API. Or perhaps your backend making a call to a Cloud Service such as Blob Storage. Integration involving newer B2B services is usually done as synchronous request/response calls using HTTP.

Asynchronous

Communicating asynchronously most often happens between internal services that your team or another team owns within an organization. Another good approach is to use asynchronous messaging within your own service or monolith. Check out my post on using Message Driven Architecture to decouple a monolith for more.

In B2B, a common use case for asynchronous communication is EDI where you’re exchanging files via mailboxes.

Commands & Queries

One way to distinguish where to use Synchronous vs Messaging is if you’re performing a command or a query. A command being a request to change state and a query being a request to return state.

If you’re using a JavaScript application in the browser that’s communicating with an HTTP API backend service, you’re going to be using HTTP for synchronous request/response. If the backend service is communicating with a database, that will also be synchronous. This is to be expected and is how most applications work.

However, when you’re communicating between internal services, I recommend communicating asynchronously using a Message Driven Architecture via Events and Commands.

This means your services are communicating via a Message Broker to send commands to a queue or publish messages to a topic.

The reason I do not recommend HTTP to communicate between services is the complexity of dealing with latency issues, availability concerns, resilience, difficulty debugging, and most importantly coupling.

Check out my post on REST APIs for Microservices? Beware! for more info on why you should avoid it as the primary way to communicate between services.

Origination

An important distinction besides commands and queries in the Synchronous vs Messaging debate is where the request originated from. For example, a client UI/browser sends an HTTP Request to our Web API Backend service, which then makes a synchronous call to a 3rd party service.

But what happens when that 3rd party service is unavailable or is timing out? What do we return to the client UI? Do we just send the Client UI an error message? If the 3rd party service is critical to your application/service, does that mean as long as it’s unavailable, your service is also unavailable?

Take this exact same situation, but change the originator to be a message broker, and the implications are very different.

If the synchronous call from our app/service was caused by a message from a message broker and the 3rd party service is unavailable, then we have many different courses of action.

We can do an immediate retry to resolve any transient errors. We can implement an exponential backoff, where we wait an increasingly longer period of time before each retry. We can also move the message to a dead letter queue, which allows us to investigate and manually retry the messages once we know the 3rd party service is available again.
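Those three options can be combined, roughly like this. Most messaging libraries and brokers give you retries and dead lettering through configuration, so treat this as a sketch of the behavior rather than something you’d write yourself. The `handle_with_retries` function, its parameters, and the plain list used as a dead letter queue are all hypothetical.

```python
import time
from typing import Any, Callable, List


def handle_with_retries(
    message: Any,
    handler: Callable[[Any], None],
    dead_letters: List[Any],
    max_attempts: int = 4,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try the handler; retry immediately, then back off, then dead-letter."""
    for attempt in range(1, max_attempts + 1):
        try:
            handler(message)
            return True
        except Exception:
            if attempt == max_attempts:
                break
            # First retry is immediate (transient errors); after that the
            # wait doubles each time: base_delay, 2x, 4x, ...
            delay = 0.0 if attempt == 1 else base_delay * 2 ** (attempt - 2)
            sleep(delay)
    # Out of attempts: park the message so it can be investigated and retried.
    dead_letters.append(message)
    return False
```

The `sleep` parameter exists only so the waiting can be swapped out, for example in a test.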

You have many different options. Check out my post on Handling Failures in a Message Driven Architecture for more info.

The point is that you aren’t losing work that needs to be completed, nor are any users going to be aware that there is potentially an issue.

To accomplish this for the above example, we simply need to change from synchronous to asynchronous at some point in the call path.

This means that our Client UI will still make a synchronous call to our app/service. However, instead of calling the 3rd party immediately, we’ll enqueue a message to the message broker and return an immediate response to our client UI.

Then we will consume that same message asynchronously from the message broker and complete the work that needs to communicate with the 3rd party. If there are any failures, we now have the ability to handle those failures with different resiliency and fault tolerance and we do not lose any work that needs to be done.
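Here’s a sketch of that switch from synchronous to asynchronous, using Python’s in-process `queue.Queue` as a stand-in for a real message broker. The function names, the 202 status code, the state names, and the `status` dictionary are assumptions for illustration. A real broker would give you durability and redelivery that this does not.

```python
import queue
import uuid
from typing import Callable, Dict

work_queue: queue.Queue = queue.Queue()  # stand-in for a message broker
status: Dict[str, str] = {}              # stand-in for state the UI can query


def handle_http_request(payload: dict) -> dict:
    """Synchronous part: enqueue the work and respond right away."""
    request_id = str(uuid.uuid4())
    status[request_id] = "Pending"
    work_queue.put({"id": request_id, "payload": payload})
    return {"status_code": 202, "request_id": request_id, "state": "Pending"}


def consume_one(call_third_party: Callable[[dict], None]) -> None:
    """Asynchronous part: do the work that needs the 3rd party service."""
    message = work_queue.get_nowait()  # raises queue.Empty if nothing is queued
    try:
        call_third_party(message["payload"])
        status[message["id"]] = "Completed"
    except Exception:
        # The work is not lost: put the message back to be retried later.
        work_queue.put(message)
        status[message["id"]] = "Retrying"
```

The client gets its response before the 3rd party is ever called, and a failure in `consume_one` only changes the state of the request, not the response the user already received.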

Use Case

A good example of how this applies in real applications is when you’re in the AWS EC2 Console. This would apply to many different services with any cloud provider. I’ll use AWS EC2, which is an AWS Virtual Machine service.

If you have a running instance and you choose to stop the instance, it doesn’t immediately stop.

When you click the “Stop Instance” menu option, the browser doesn’t sit and wait for the HTTP request to finish. The request is fairly quick and then your browser displays that the Instance state is “Stopping”.

This is because of the exact example I illustrated earlier where the request from the browser was synchronous, but the actual work being performed is asynchronous.

Not all interactions can be done this way. If the end-user expects to immediately see their changes, then forcing an asynchronous workflow will prove challenging. If you can provide the user with the correct expectation about long-running processes then you can leverage events to drive real-time web.

Check out my post on using Real-Time Web by leveraging Event Driven Architecture.
