Untangling the many aspects of EDA

With the popularity of Microservices, Kafka, and Event Sourcing, the term “Event” has become pretty overloaded and has caused much confusion about what EDA (Event-Driven Architecture) is. This confusion has led people to conflate different concepts, which adds unneeded technical complexity. I will shed some light on different aspects of EDA, such as Event Sourcing, Event-Carried State Transfer, and Events for Workflow.

YouTube

Check out my YouTube channel, where I post all kinds of content accompanying my posts, including this video showing everything in this post.

“Event”

If you ask people how they apply event-driven architecture, you’ll likely get many different answers. You need specifics about how they use events, as events serve many different purposes.

The terms events and EDA can refer to using events for state persistence, data distribution, or notifications. If you break these down further, they relate to Event Sourcing, Event-Carried State Transfer, Domain Events, Integration Events, and Workflow events.

Let’s dig into all this to answer the question: “What do you mean by event?”

Event Sourcing

Event sourcing is about using events as a way of persisting state. Full stop. It has nothing to do with communication between service boundaries. It’s about state.
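To make “events as state” concrete, here is a minimal sketch. The event names (StockReceived, StockShipped) and the current_quantity function are made up for illustration; the point is only that the current state is derived by replaying the stored events rather than read from a row that gets overwritten.

```python
from dataclasses import dataclass


# Hypothetical events for a product's stock level (illustrative names only).
@dataclass(frozen=True)
class StockReceived:
    quantity: int


@dataclass(frozen=True)
class StockShipped:
    quantity: int


def current_quantity(events):
    """Derive the current state by replaying every event in order."""
    quantity = 0
    for event in events:
        if isinstance(event, StockReceived):
            quantity += event.quantity
        elif isinstance(event, StockShipped):
            quantity -= event.quantity
    return quantity


# The stream of events is what gets persisted; the quantity is computed from it.
stream = [StockReceived(10), StockShipped(3), StockReceived(5)]
on_hand = current_quantity(stream)
```

Nothing in this sketch publishes anything to another service. It is purely about how state is stored and rebuilt.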

Check out my post Event Sourcing Example & Explained in plain English for more of a primer, even if you think you understand Event Sourcing.

Greg Young posted a snippet of a book he’s working on, in which he wanted a simple and clear definition to explain Event Sourcing.

Event Sourcing

Event Sourcing is often confused with Event Streaming, or people assume you must be using Event Sourcing in order to use events to communicate with other service boundaries. That assumption often leads to using the events from event sourcing as a form of data distribution (more on that below).

Don't integrate at the DB

How you persist state is an internal implementation detail of your service boundary. You provide public APIs or contracts to expose any state within your service boundary. Other services should not reach directly into your service boundary’s database to query or write data. We don’t do this. We expose APIs as contracts for this and version them accordingly. State persistence is an internal implementation detail. So if you use event sourcing to persist state, the same rules still apply. You cannot have other service boundaries querying your event store directly.

Data Distribution

Events are often used as a way to distribute data to other services. This is called Event-Carried State Transfer, as the event payload contains entity-related data. While this can have utility, I’m often very concerned about distributing data.

Why do you need data from another service boundary? Most often, the answer is because of query or UI Composition purposes. If that’s the case, check out my post The Challenge of Microservices: UI Composition

If you need data from another service boundary to perform a command/operation, realize that with a local cache you’re working with stale data and have no consistency guarantees.

Why would we want a local cache to begin with? The route most people take to land here is that they start by publishing events to notify other service boundaries that some entity state has changed. Other service boundaries (consumers) then process these events by making a synchronous RPC call back to the publisher to get all the current data related to the entity that changed.

Because this callback to the publisher can have a lot of implications, such as latency, increased traffic, and availability, the next logical step is then to include the data in the event itself, so no callback to the publisher is required. This is why this is termed Event-Carried State Transfer.

You may have noticed that I used “Entity” a few times. This is because these types of events are often more entity-centric. As an example, they might be ProductChanged or ProductPriceChanged.
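As a rough sketch of what a consumer does with these events, the code below keeps a local cache built only from event payloads. The payload fields and the version number are assumptions for illustration; the post only names the event. The version check is one common way to ignore redelivered or out-of-order events, not the only one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductPriceChanged:
    # Field names are assumed; the event carries the entity data itself.
    product_id: str
    name: str
    price_cents: int
    version: int  # assumed: assigned by the publisher, increases per change


class ProductCache:
    """Consumer-side copy of product data, built only from event payloads."""

    def __init__(self):
        self._products = {}

    def handle(self, event):
        cached = self._products.get(event.product_id)
        # Skip events that are older than (or the same as) what we hold.
        if cached is not None and cached["version"] >= event.version:
            return
        self._products[event.product_id] = {
            "name": event.name,
            "price_cents": event.price_cents,
            "version": event.version,
        }

    def get(self, product_id):
        # No callback to the publisher; the data may be stale.
        return self._products.get(product_id)
```

The cache removes the callback to the publisher, but whatever it returns is only as fresh as the last event the consumer processed.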

Event Carried State Transfer

This is often caused by the service itself being CRUD-driven and not task-based. If you are consuming these CRUD/Entity type events, you do not really know why something changed, just that data related to an entity changed. If you consume a ProductUpdated event, why was it updated? Was there a price increase? You would need to infer the reason for the change without any certainty.

Check out my post Event Carried State Transfer: Keep a local cache! for more on where this is applicable and where you should avoid it.

Notifications

Events used for notifications within EDA are generally more explicit. They notify other service boundaries (or your own) that a business event has occurred. Events as notifications generally do not contain much data other than identifiers. They are used in Event Choreography or Orchestration to execute long-running business processes or workflows.

These are the types of events that relate to business concepts and are often driven by a task-based UI. Examples are ProductPriceIncreased, ProductDiscontinued, and FlashSaleScheduled. By looking at these event names, you can tell explicitly what they are and what occurred. With ProductChanged, you cannot. These events explicitly define what has occurred and why, as they are directly related to the business capabilities your system provides.
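A small sketch of what that looks like for a consumer: each event type names the business fact and carries little more than an identifier. The product_id field and the reactions returned by react are hypothetical, chosen only to show that the consumer can branch on the event type without inferring anything from changed data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductPriceIncreased:
    product_id: str  # identifier only; no entity data in the payload


@dataclass(frozen=True)
class ProductDiscontinued:
    product_id: str


def react(event):
    """Return the follow-up step a hypothetical consumer would start."""
    if isinstance(event, ProductPriceIncreased):
        return f"refresh open quotes for {event.product_id}"
    if isinstance(event, ProductDiscontinued):
        return f"remove {event.product_id} from the catalog"
    return "ignore"
```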

Events used as notifications can come in a couple of forms: Domain Events and Integration Events. Personally, I prefer to call these Inside Events and Outside Events.

Inside Events (Domain) are within your service boundary. Outside Events (Integration) are for other service boundaries to consume. Why the distinction? Because inside events are internal implementation details about how you may communicate within a boundary. They can be versioned much differently than outside (integration) events. Once you publish an event for other service boundaries to consume, and they rely on that event and its schema, you have to deal with versioning. With inside events, your versioning strategy is much different because you control the consumers. With outside (integration) events, you may have little control over the consumers.
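One way to keep the two apart is to translate an inside event into a separate, versioned outside contract before publishing it. The sketch below assumes a hypothetical PriceIncreased inside event and a schema_version field on the outside message; neither comes from a specific library.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceIncreased:
    """Inside (domain) event: free to change, since we own every consumer."""
    product_id: str
    old_price_cents: int
    new_price_cents: int
    approved_by: str  # internal detail that should not leak to other boundaries


def to_outside_event(event):
    """Map the inside event onto a versioned outside (integration) contract."""
    return {
        "type": "ProductPriceIncreased",
        "schema_version": 1,  # assumed versioning field for outside consumers
        "product_id": event.product_id,
    }


inside = PriceIncreased("p-1", 1000, 1200, approved_by="alice")
message = json.dumps(to_outside_event(inside))
```

Because the mapping is explicit, the inside event can gain or lose fields without touching the published schema.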

EDA Tooling

What you’re using events for in EDA will determine what type of tooling you need. Are you event sourcing? Then you’ll want to use a database such as Event Store that is based around event streams and includes optimistic concurrency, subscription models for projections, and more.
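To illustrate what optimistic concurrency on an event stream means, here is an in-memory sketch. It is not the API of Event Store or any other product; InMemoryEventStream, ConcurrencyError, and the expected_version parameter are names invented for this example.

```python
class ConcurrencyError(Exception):
    """Raised when the stream changed since the writer last read it."""


class InMemoryEventStream:
    def __init__(self):
        self._events = []

    @property
    def version(self):
        # The version is simply the number of events appended so far.
        return len(self._events)

    def append(self, event, expected_version):
        # Reject the write if another writer appended in the meantime.
        if expected_version != self.version:
            raise ConcurrencyError(
                f"expected version {expected_version}, stream is at {self.version}"
            )
        self._events.append(event)
        return self.version
```

A writer reads the stream, notes the version, decides what to append, and passes that version back. If someone else got there first, the append fails and the writer has to reload and retry.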

Are you using events as Notifications for workflows and long-running business processes? You likely want to use a queue-based broker like RabbitMQ and a messaging library like NServiceBus that facilitates many messaging patterns used when using events as notifications.

Are you using events to distribute data? For this purpose, you might want to look at event streaming platforms like Kafka.

While some tools claim they can provide all the functionality outlined here, forcing one tool to mimic functionality it wasn’t built for is often a square peg, round hole situation.

None of these uses for events are either-or. You could be using Event Sourcing without anything else. You could be using events as notifications without Event Sourcing. You could be doing both. They all have different purposes, and understanding that will help you avoid shooting yourself in the foot by conflating different concepts.

Join!

Developer-level members of my YouTube channel or Patreon get access to a private Discord server to chat with other developers about Software Architecture and Design and access to source code for any working demo application I post on my blog or YouTube. Check out my Patreon or YouTube Membership for more info.

Follow @CodeOpinion on Twitter

Software Architecture & Design

Get all my latest YouTube Videos and Blog Posts on Software Architecture & Design

Scaling a Monolith with 5 Different Patterns

Want strategies for scaling a monolith application? You have many options if you have a monolith that you need to scale. If you’re thinking of moving to microservices specifically for scaling, hang on. Here are a few things you can do to make your existing monolith scale. You’d be surprised how far you can take this.

Scaling Up

When referring to scaling, we’re talking about being able to do more work in a given period. For example, process more requests per second.

The most common approach to scaling is scaling up by increasing the resources related to our system. For simplicity’s sake, let’s say we have two aspects to our system that are physically independent, our application (compute) and database (data storage).

Depending on where the bottleneck is, we could scale up by increasing the resources (CPU and memory) of our app compute.

Scaling Up Compute

However, with any bottleneck, once we alleviate it, we may affect downstream resources, in this case our database. If we can now handle more requests in our app, we might now be overwhelming our database, in which case we would need to scale it up as well.

Scaling Up Database

All of this depends on the context of your system. Maybe it’s CPU intensive, and you need to scale up your App and your database will be fine with fewer resources. Maybe you’re more database driven and it is what needs to be scaled up. It really depends on the type of system you have and where its needs are.

Scaling Out

Another common approach is to scale out by adding more instances of your app behind a load balancer. Typically you’d scale out your app compute, because scaling out your database is usually more difficult, depending on which type of database you’re using.

Scaling Out Compute

This means you have multiple instances of the same app running, and the load balancer, typically via round robin, will distribute incoming requests to different instances. While this helps to scale, it also helps availability.
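Round robin itself is simple: hand each incoming request to the next instance in a fixed rotation. The sketch below shows only the selection logic, with made-up instance names; a real load balancer also handles health checks, connection draining, and more.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Pick instances in a fixed rotation, one per incoming request."""

    def __init__(self, instances):
        self._rotation = cycle(instances)

    def next_instance(self):
        return next(self._rotation)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
picked = [balancer.next_instance() for _ in range(5)]
```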

Another aspect of scaling out that isn’t mentioned as much, and that also relates to availability, is directing specific traffic to specific instances of your app.

Scaling Compute Out Per Grouping

You may choose to have different resources (CPU & memory) for different app segments. In the diagram above, the top two instances of the app may handle specific inbound requests defined by rules within the load balancer. Those instances may have different CPU & memory requirements than the instance at the bottom, which handles a different set of requests.
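As a sketch of such rules, the router below sends requests whose path matches a prefix to a dedicated pool of instances and everything else to a default pool. The path prefixes and instance names are assumptions for illustration; real load balancers can express rules on hosts, headers, and other attributes as well.

```python
from itertools import cycle


class RuleBasedRouter:
    """Route by path prefix to a pool, then round robin within that pool."""

    def __init__(self, rules, default_pool):
        # rules: list of (path_prefix, [instances]); the first match wins.
        self._rules = [(prefix, cycle(pool)) for prefix, pool in rules]
        self._default = cycle(default_pool)

    def route(self, path):
        for prefix, pool in self._rules:
            if path.startswith(prefix):
                return next(pool)
        return next(self._default)


# Hypothetical setup: reporting requests go to two larger instances.
router = RuleBasedRouter([("/reports", ["big-1", "big-2"])], ["small-1"])
```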

Queues

Needing to do more work often doesn’t mean the work has to be done immediately as the request comes in. Often we can perform the work asynchronously. A good method for doing this is leveraging queues. There are probably many places within an existing system where you can move work out of process and perform it asynchronously using a queue.

When an inbound request comes in, we can place a message on our queue and return to the client. The enqueued message would contain all the relevant information from the initial request.

Queues

Asynchronously, either the same process or a separate process can then pull that message from the queue and perform the work based on the contents of the message.

Workers

A really common example of this is anywhere you generate and send an email in your system. Instead of sending the email at the moment some action occurs, enqueue a message and send it asynchronously. Most often, you can return the response to the client without the email having been sent as part of that same request.
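Here is a minimal sketch of that email example using Python’s in-process queue.Queue and a worker thread. This is an assumption made to keep the example runnable on its own; in a real system the queue would usually be a durable broker and the worker possibly a separate process. The send_email function and the “202 Accepted” response are placeholders.

```python
import queue
import threading

email_queue = queue.Queue()
sent = []  # stands in for a real mail server


def send_email(to, subject):
    sent.append((to, subject))


def handle_request(customer_email):
    """Request handler: enqueue the work and return right away."""
    email_queue.put({"to": customer_email, "subject": "Order received"})
    return "202 Accepted"


def worker():
    """Pull messages off the queue and do the slow work outside the request."""
    while True:
        message = email_queue.get()
        send_email(message["to"], message["subject"])
        email_queue.task_done()


threading.Thread(target=worker, daemon=True).start()
response = handle_request("customer@example.com")
email_queue.join()  # demo only: wait until the worker has drained the queue
```

The handler returns before the email is sent. The join call exists only so the demo can observe the result.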

Read Replica

Depending on the type of database you use, scaling may be more difficult. A common approach is to add read replicas that you can then run queries against. Since most applications perform more reads than writes, this allows you to scale out the reads against your database.

Read Replicas

Read replicas are often eventually consistent, and there can be a lag in replicating the data from the primary database to your replicas. You need to be aware of this and handle it appropriately in code if you expect to read your own writes.
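One possible way to handle reading your own writes is to send reads for recently written keys to the primary and everything else to a replica. The sketch below assumes you know an upper bound on replication lag (max_lag_seconds), which is a simplification; plain dictionaries stand in for the primary and the replica, and ReplicaAwareReader is a made-up name.

```python
import time


class ReplicaAwareReader:
    """Read from the primary for keys written within the assumed lag window."""

    def __init__(self, primary, replica, max_lag_seconds, clock=time.monotonic):
        self._primary = primary      # dict standing in for the primary database
        self._replica = replica      # dict standing in for a read replica
        self._max_lag = max_lag_seconds
        self._clock = clock
        self._last_write = {}        # key -> time of our most recent write

    def write(self, key, value):
        self._primary[key] = value
        self._last_write[key] = self._clock()

    def read(self, key):
        written_at = self._last_write.get(key)
        recently_written = (
            written_at is not None and self._clock() - written_at < self._max_lag
        )
        source = self._primary if recently_written else self._replica
        return source.get(key)
```

Some databases expose a replication position you can compare instead of a time window, which avoids guessing the lag.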

Materialized Views

Similar to read replicas is generating separate read models that are specialized for queries. This involves pre-computing values and persisting them to a specialized read model. If you’re familiar with Event Sourcing, this is what Projections are.

Materialized views, since they are pre-computed, allow you to have specialized views specific for queries. As mentioned, since most systems are more read-intensive than they are write-intensive, this allows you to optimize complex data composition ahead of time rather than doing it at the runtime of a query.
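A small sketch of the idea: the view below is updated whenever an order line is written, so the query becomes a lookup instead of an aggregation at read time. The SalesByProductView name and the order line fields are hypothetical.

```python
from collections import defaultdict


class SalesByProductView:
    """Read model kept up to date at write time so queries are a lookup."""

    def __init__(self):
        self._totals_cents = defaultdict(int)

    def apply(self, order_line):
        # Pre-compute the value when the data changes, not when it is queried.
        amount = order_line["quantity"] * order_line["unit_price_cents"]
        self._totals_cents[order_line["product_id"]] += amount

    def total_for(self, product_id):
        return self._totals_cents.get(product_id, 0)


view = SalesByProductView()
view.apply({"product_id": "p-1", "quantity": 2, "unit_price_cents": 500})
view.apply({"product_id": "p-1", "quantity": 1, "unit_price_cents": 700})
```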

Materialized Views

Caching

First, caching isn’t easy. Check out my post The Complexity of Caching before you go down this path. Caching is useful for reducing the load on your read replicas or primary database from those pesky queries that I keep mentioning. Similar to materialized views, you can choose to cache values that are pre-computed or in a shape that is more appropriate for a query in a given context.
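As one illustration, here is a cache-aside sketch with a time-to-live: look in the cache first, fall back to the database on a miss, and store the result for next time. The CacheAside class, its load callback, and the TTL approach are choices made for this example; invalidation on writes, which is where much of the complexity lives, is left out.

```python
import time


class CacheAside:
    """Serve from the cache when possible; otherwise load and remember."""

    def __init__(self, load, ttl_seconds, clock=time.monotonic):
        self._load = load            # function that runs the real query
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}           # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]          # hit: the database is not touched
        value = self._load(key)      # miss or expired: go to the database
        self._entries[key] = (value, now + self._ttl)
        return value
```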

Caching

Multi-Tenant

If you have a multi-tenant SaaS application, data can be siloed into its own database per tenant. Compute can be pooled or siloed in the same way. Or you can do both and create lanes for tenants that have their own dedicated compute and databases. There are many different options to consider. Check out my post Multi-tenant Architecture for SaaS.
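A rough sketch of how a request might be placed in a lane: tenants with a dedicated lane get their own compute and database, and everyone else shares the pooled resources. The Lane and TenantRouter names and the resource identifiers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lane:
    compute: str
    database: str


class TenantRouter:
    """Resolve which compute and database serve a given tenant."""

    def __init__(self, dedicated_lanes, pooled_lane):
        self._dedicated = dict(dedicated_lanes)  # tenant_id -> Lane
        self._pooled = pooled_lane

    def lane_for(self, tenant_id):
        # Tenants without a dedicated lane fall back to the shared pool.
        return self._dedicated.get(tenant_id, self._pooled)


router = TenantRouter(
    {"acme": Lane(compute="app-acme", database="db-acme")},
    pooled_lane=Lane(compute="app-pool", database="db-shared"),
)
```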

Multi-Tenant

Mix & Match

You have a lot of options when it comes to scaling a monolith. It’s not just about scaling up; you can also scale out your compute differently from your underlying database. You can move work out of process using a queue, create materialized views, or add caching along with read replicas. Depending on your context and the size of your system, you may choose to employ some of these techniques or possibly all of them.

Greenfield Project? Start HERE!

If you’re starting a new greenfield project or rearchitecting an existing system, how much effort do you put into the overall architecture and design? What are the types of things you should be considering or thinking of? It’s also really applicable if you have an existing system that you might be trying to decompose or rewrite portions. I will discuss aspects critical to the foundational architecture and design, allowing you to evolve your system over time.

Large System

First, everything I’m referring to in this blog/video relates to large systems that could take years to develop. I’m not referring to a single application that could be rewritten in weeks or months but rather large systems. These large systems usually take years to develop and also evolve over the years.

Anyone that’s worked in a large system knows the pain when it becomes an unmanageable mess. It’s a mess of tight coupling that’s hard to change and easy to introduce bugs because of unknown side effects.

Nobody wants to develop a system that turns into a mess, so the question is, how much effort do you put into the initial overall architecture and design so you don’t develop a mess in the future?

Logical Boundaries

What does your system do? At the heart of it, what problems does it solve? Yes, it provides all kinds of functionality, but what are the core set of capabilities?

Not all functionality is created equal in terms of value. Your system’s core set of capabilities has a higher value than other features that are more for supporting purposes.

So your large system will have many different parts (logical boundaries). Some of those parts will be the focal point that contains a lot of the value your system provides, while other parts are there to support the core. As an example, in a food delivery system, the core might be the ordering and delivery process, while CRM and Accounting might be in a supporting role.

Defining logical boundaries is one of the most important things to do, yet one of the most challenging to get “right”. This is because over time, your understanding and model might change, and you might realize your boundaries are “incorrect”. This is ok!

Language is one of the reasons defining logical boundaries can be difficult, but it is also a way to define them. You’ll often hear about this in Domain-Driven Design as the ubiquitous language. Often you’ll hear the same term used by different people to mean different things. As an example, if you were talking about a distribution domain where you buy and sell products, the term “product price” means different things. To someone in sales, the product price is what we charge our customers. To someone in purchasing, the product price is what the vendor or manufacturer charges us. The concept of a product is different for the people in sales and the people in purchasing. They have different concerns.

Defining logical boundaries means grouping related functionality. We often have a free-for-all of coupling because we aren’t making the distinction that a concept can live in more than one place. If we only have a single place for managing a “product”, the concerns of sales would be mixed with those of purchasing.

Instead, we want to group functionality and split these concepts up and align with the business.

A significant advantage of defining logical boundaries, as mentioned earlier, is that they don’t all have the same value or the same requirements. This means that one logical boundary in a supporting role might be better suited to CRUD with a document database, while another logical boundary that’s closer to the heart of our system uses an Event Store and is more task-driven. We can define the implementation details per logical boundary based on its needs rather than for the entire system.

Coupling

Once logical boundaries are defined, you’ll often need to communicate between them to execute workflows or business processes. Another foundational component to define early on is a message and event-driven architecture.

Asynchronous messaging allows you to decouple your logical boundaries by producing messages and having other services consume those messages.

There are two forms that I often talk about which are commands and events. If you’re unfamiliar, check out my post Commands & Events: What’s the difference?

Commands are used to tell a specific boundary to perform an action. There can be many different senders of a command. Senders send a message to a queue/endpoint where a single consumer will consume and process that message. The senders know which queue/endpoint to send the message to but are unaware of when the message will be processed by the single consumer.
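To sketch the shape of this, the in-memory queue below accepts commands from any number of senders and hands them to exactly one handler. CommandQueue and the PlaceOrder dictionary are illustrative stand-ins, not a messaging library’s API, and a real queue would be durable and processed in the background.

```python
from collections import deque


class CommandQueue:
    """Many senders, exactly one consumer."""

    def __init__(self, handler):
        self._handler = handler      # the single consumer for this queue
        self._pending = deque()

    def send(self, command):
        # The sender knows the queue, not when the command will be handled.
        self._pending.append(command)

    def process_all(self):
        results = []
        while self._pending:
            results.append(self._handler(self._pending.popleft()))
        return results


def place_order(command):
    return f"order placed for {command['customer_id']}"


orders = CommandQueue(place_order)
orders.send({"type": "PlaceOrder", "customer_id": "c-1"})  # sender A
orders.send({"type": "PlaceOrder", "customer_id": "c-2"})  # sender B
```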

Commands and Queues

Events are used for the Publish/Subscribe pattern, where a publisher publishes an event to a topic, and there could be many consumers or none. The publisher is totally unaware of how many consumers there are or what they do.
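The sketch below shows that shape with an in-memory set of topics: publishing works the same whether a topic has zero, one, or many subscribers. The Topics class and the OrderPlaced topic name are invented for this example, and handlers are called synchronously here only to keep it short; a broker would deliver asynchronously.

```python
from collections import defaultdict


class Topics:
    """In-memory publish/subscribe: zero or more subscribers per topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # The publisher neither knows nor cares who is listening, if anyone.
        for handler in self._subscribers.get(topic, []):
            handler(event)


topics = Topics()
billing_seen, shipping_seen = [], []
topics.publish("OrderPlaced", {"order_id": "o-0"})  # no subscribers yet: fine
topics.subscribe("OrderPlaced", billing_seen.append)
topics.subscribe("OrderPlaced", shipping_seen.append)
topics.publish("OrderPlaced", {"order_id": "o-1"})
```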

Topics and Publish Subscribe

Why does this decoupling matter? Because it allows you to extend your system and embraces the asynchrony of the real world.

For example, let’s say your food delivery system has a new requirement to send a text message to the customer when the delivery driver is approaching their home. The delivery driver’s mobile phone would be sending GPS coordinates to the system as the driver travels. What are these coordinates? They’re events. A DrivePositionUpdated event would contain the latitude/longitude and date/time. We can have this event published to a topic and create a brand new consumer for it, which would process these events and, when applicable, send the text message to notify the customer. None of this new functionality is coupled to existing code; it’s entirely new and segregated.
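A sketch of what that new consumer could look like. The post only says the event carries latitude/longitude and date/time, so the delivery_id field, the 0.5 km threshold, the destinations lookup, and the send_text callback are all assumptions for illustration. Distance is computed with the haversine formula.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DrivePositionUpdated:
    delivery_id: str   # assumed: needed to find the customer's address
    latitude: float
    longitude: float
    occurred_at: str   # date/time as an ISO 8601 string


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


class ApproachNotifier:
    """New consumer: texts the customer once when the driver gets close."""

    def __init__(self, destinations, send_text, threshold_km=0.5):
        self._destinations = destinations  # delivery_id -> (lat, lon)
        self._send_text = send_text
        self._threshold_km = threshold_km
        self._notified = set()

    def handle(self, event):
        destination = self._destinations.get(event.delivery_id)
        if destination is None or event.delivery_id in self._notified:
            return
        if distance_km(event.latitude, event.longitude, *destination) <= self._threshold_km:
            self._send_text(event.delivery_id, "Your driver is almost there")
            self._notified.add(event.delivery_id)
```

The consumer only subscribes to the event; nothing in the code that publishes driver positions has to change.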

Logical isn’t Physical

Developers love talking about and working on scaling a system to handle more traffic or a higher workload. Absolutely, different requirements related to scaling and performance can drive your architecture. However, there needs to be a distinction between logical boundaries and physical boundaries. They aren’t the same thing. They can be, but they don’t have to be.

If you’re unfamiliar, check out my post on the 4+1 Architectural View Model.

There are different representations of your system. How you define a logical boundary doesn’t mean it needs to be deployed as a single unit.

As an example, here are four logical boundaries. CRM interacts with an external source, Salesforce. Finance interacts with an external accounting source. Ordering has its own relational database, and Delivery uses an Event Store.

Logical Boundaries

However, all 4 of these logical boundaries could be composed together and deployed as a single process.

Physical Boundaries

Because logical boundaries aren’t tightly coupled, if any one of them needed to be scaled differently, we could deploy it independently.
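As a toy sketch of that idea, the same four logical boundaries can be hosted by one process or split across several without changing the modules themselves. The Module class and run_process function are placeholders invented for this example; real composition would involve wiring up handlers, configuration, and messaging.

```python
class Module:
    """A logical boundary; it does not know how it is deployed."""

    def __init__(self, name):
        self.name = name

    def start(self):
        return f"{self.name} started"


def run_process(modules):
    """One physical process hosting any number of logical boundaries."""
    return [module.start() for module in modules]


crm, finance, ordering, delivery = (
    Module(name) for name in ("CRM", "Finance", "Ordering", "Delivery")
)

# All four boundaries composed into a single deployed process.
single_process = run_process([crm, finance, ordering, delivery])

# Same modules, different deployment: Delivery runs (and scales) on its own.
split_deployment = [run_process([crm, finance, ordering]), run_process([delivery])]
```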

Your logical boundaries don’t need to be the same physical boundaries.
