CodeOpinion: Software Architecture & Design

Sponsor: Do you build complex software systems? See how NServiceBus makes it easier to design, build, and manage software systems that use message queues to achieve loose coupling. Get started for free.

  • Sidecar Pattern for Abstracting Common Concerns

    What is the sidecar pattern? Applications and services often have generic concerns such as health checks, configuration, and metrics, as well as how they communicate with each other, either directly or through messaging. Services usually handle these concerns with libraries or SDKs. How can you share these concerns across all your services so you’re not implementing them in every single one? The sidecar pattern and ambassador pattern might be a good fit to solve this problem.

    YouTube

    Check out my YouTube channel where I post all kinds of content that accompanies my posts including this video showing everything in this post.

    https://www.youtube.com/watch?v=9zAjtcf9Wyo

    Shared Concerns

    Regardless of the platform, you’re going to leverage libraries/packages/SDKs to handle common concerns like health checks, configuration, metrics, and more. Each service will have these common concerns and need to use the libraries for their respective platform.

    For example, suppose you had two services, one written in .NET and the other written in Go. Each service would leverage libraries from its own ecosystem to provide this common functionality.

    Shared concerns for different services

    Even with libraries, you still need to define how you’re handling these common concerns, and you may be using the same underlying infrastructure for both services. As an example, each service might be publishing metrics to CloudWatch.

    Wouldn’t it be nice if there were a standardized way for each service to handle these shared concerns?

    Sidecar pattern

    The sidecar pattern allows you to extract the common concerns from your service and host them in a separate process, known as a sidecar.

    sidecar process runs locally to service

    In the containerized world, this is often thought of as a separate container from your service; however, it really is just a separate process that runs locally alongside your service.

    With a sidecar, your service can now interact with a separate process that will handle the common concerns. The sidecar can perform health checks on your service, your service can send metrics to the sidecar, etc.
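
    To make that interaction concrete, here’s a minimal sketch in Go of a service pushing a metric to its sidecar over localhost. The sidecar’s /metrics endpoint and the payload shape are made up for illustration; the point is that the service only ever talks to a local process.

    ```go
    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "net/http"
        "time"
    )

    // Metric is the shape our hypothetical sidecar accepts. The sidecar,
    // not the service, knows how to forward this to CloudWatch or elsewhere.
    type Metric struct {
        Name  string    `json:"name"`
        Value float64   `json:"value"`
        Time  time.Time `json:"time"`
    }

    // emitMetric sends a metric to the sidecar process listening on localhost.
    // The service never talks to the metrics backend directly.
    func emitMetric(name string, value float64) error {
        body, err := json.Marshal(Metric{Name: name, Value: value, Time: time.Now()})
        if err != nil {
            return err
        }
        resp, err := http.Post("http://localhost:9091/metrics", "application/json", bytes.NewReader(body))
        if err != nil {
            return fmt.Errorf("sidecar unreachable: %w", err)
        }
        defer resp.Body.Close()
        return nil
    }

    func main() {
        _ = emitMetric("orders.placed", 1)
    }
    ```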

    Ambassador Pattern

    Back to the example of sending metrics to CloudWatch. With the sidecar, we can also apply the ambassador pattern. This makes the sidecar a proxy that sends data to CloudWatch: our service interacts with the sidecar using a common API, and the sidecar is in charge of sending the data to CloudWatch. It’s not just metrics and CloudWatch; this applies to any external service.

    sidecar as a proxy to external services

    The benefit is that the sidecar handles any failures, retries, backoffs, etc. We don’t have to implement all kinds of retry logic in our service; rather, that’s a common concern handled by the sidecar when communicating with external services.

    If we want to make an HTTP request to an external service, we proxy it through the sidecar.

    sidecar handling retry logic

    If there is a transient failure, the sidecar is responsible for retrying, maintaining the connection to our service while it does so.

    Handling transient failures and retries
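
    As a rough sketch of what the sidecar itself might do, here’s the shape of that retry loop in Go, with exponential backoff between attempts. This isn’t any particular sidecar’s implementation, just the idea:

    ```go
    package main

    import (
        "fmt"
        "net/http"
        "time"
    )

    // forwardWithRetry sketches the retry logic the sidecar owns on behalf
    // of every service: transient failures are retried with exponential
    // backoff, so individual services never have to implement this.
    func forwardWithRetry(url string, maxAttempts int) (*http.Response, error) {
        backoff := 100 * time.Millisecond
        var lastErr error
        for attempt := 1; attempt <= maxAttempts; attempt++ {
            resp, err := http.Get(url)
            if err == nil && resp.StatusCode < 500 {
                return resp, nil // success, or a non-transient client error
            }
            if err == nil {
                resp.Body.Close() // discard the failed attempt before retrying
                lastErr = fmt.Errorf("upstream returned %d", resp.StatusCode)
            } else {
                lastErr = err
            }
            time.Sleep(backoff)
            backoff *= 2 // back off further on each attempt
        }
        return nil, fmt.Errorf("giving up after %d attempts: %w", maxAttempts, lastErr)
    }

    func main() {
        if resp, err := forwardWithRetry("https://example.com/health", 3); err == nil {
            resp.Body.Close()
        }
    }
    ```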

    Abstraction

    While each service has its own instance of a sidecar, you can use the same sidecar for many different services. This allows each service to have the exact same interface to all of the shared concerns the sidecar provides.

    This means you can also use it for things like a message broker.

    sidecar as abstraction to message broker

    If you read any of my other blog posts or watch my videos on YouTube, you know I’m an advocate for loose coupling between services using messages, not blocking synchronous request-response.

    A sidecar can provide a common abstraction over a message broker. This means that each service doesn’t have to interact with the broker directly, nor does it need to use the specific libraries or SDKs for that broker. The sidecar is providing a common API for sending and consuming messages and is abstracting the underlying message broker.

    different services to AMQP supported message broker

    This means you could have one service using .NET and another using Python, both exchanging messages through a broker supporting AMQP. Each service would be completely unaware of the underlying transport and message broker.
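
    As a sketch, the sidecar could expose a small HTTP publish endpoint that every service uses the same way, similar in spirit to what tools like Dapr provide. The endpoint and payload here are hypothetical:

    ```go
    package main

    import (
        "bytes"
        "fmt"
        "net/http"
    )

    // publish sends an event to a named topic via the sidecar's generic
    // pub/sub endpoint. Whether the sidecar forwards it over AMQP to
    // RabbitMQ, or to some other broker entirely, is invisible here.
    func publish(topic string, payload []byte) error {
        url := fmt.Sprintf("http://localhost:9091/publish/%s", topic) // hypothetical sidecar API
        resp, err := http.Post(url, "application/json", bytes.NewReader(payload))
        if err != nil {
            return err
        }
        defer resp.Body.Close()
        if resp.StatusCode != http.StatusAccepted {
            return fmt.Errorf("publish failed with status %d", resp.StatusCode)
        }
        return nil
    }

    func main() {
        _ = publish("inventory", []byte(`{"sku":"ABC123","adjustment":-5}`))
    }
    ```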

    Trade-offs

    So why would you want to use the sidecar and ambassador patterns? If you have services that use different languages/platforms and you want to standardize common concerns, then instead of each service implementing them with the native packages/libraries/SDKs of its respective platform, the sidecar pattern allows you to define a common API that each service uses. This lets you focus more on what your service actually provides rather than on common infrastructure or concerns.

    One trade-off to mention is latency. If you’re using a sidecar with the ambassador pattern to proxy requests to external services, a message broker, etc., you’re going to add latency. While this latency might not be much, it’s something to note.

    If you don’t have a system composed of many services, or they all use the same language/platform, then a sidecar could just be unneeded complexity. You could leverage an internal shared package/library that each service uses to define a common API for shared concerns; it doesn’t need to be a sidecar. Again, your context matters. Use the patterns when you have the problems these patterns solve.

    Join!

    Developer-level members of my YouTube channel or Patreon get access to a private Discord server to chat with other developers about Software Architecture and Design as well as access to source code for any working demo application that I post on my blog or YouTube. Check out the YouTube Membership or Patreon for more info.

    Follow @CodeOpinion on Twitter

    Software Architecture & Design

    Get all my latest YouTube Videos and Blog Posts on Software Architecture & Design

  • Event Sourcing Tips: Do’s and Don’ts

    When people are getting into Event Sourcing, there are a few common questions I often get or issues I see people run into: CRUD Sourcing, premature optimization using snapshots, and exposing your event streams for integration. Here are my top three Event Sourcing tips to help you down the right path.

    YouTube

    Check out my YouTube channel where I post all kinds of content that accompanies my posts including this video showing everything in this post.

    https://www.youtube.com/watch?v=SYsiIxJ-Nfw

    CRUD Sourcing

    My first Event Sourcing tip addresses probably the most common issue I see people run into when new to Event Sourcing: what’s often called “CRUD Sourcing”. If you’re used to developing applications/systems in a Create-Read-Update-Delete style, you’ll likely end up creating events that are derived from Create, Update, and Delete.

    There’s a shift when moving from an application that simply maintains the current state via CRUD to having your point of truth be a stream of events. Often events are artifacts of the state change as well as the business event that caused that state change.

    If you provide a UI that’s CRUD driven on Entities, you’ll end up with events that are derived from that. As an example, you’d start creating events such as ProductCreated, ProductUpdated, and ProductDeleted.

    Along the same lines, if you have updates for individual properties on entities, you’ll end up with events such as ProductQuantityUpdated or ProductPriceChanged.

    In both cases, the events are simply representing state changes but not why the state changed.

    What was the reason a ProductUpdated event occurred? It was updated, great, but why?

    How about the ProductQuantityUpdated event, why did that change? Was it because there was an inventory adjustment? Did we receive more quantity of the product from the supplier? Why was it updated?

    Being explicit about the events is important because we want to be driven by business concepts. To get out of CRUD we need to move to more of a Task Driven UI. This allows us to have the client/UI explicitly perform a Command/Task. For example, if the user performs an Inventory Adjustment as an explicit Command/Task, that’s a business concept. We will generate an InventoryAdjustment event.

    Being explicit is important because you do not need to derive or guess why an event occurred based on its data. You’ll have many more answers to questions when you look at an event stream when the events are explicit. As an example: when’s the last time we did an inventory adjustment? When we do an inventory adjustment, how many times are we decreasing the quantity on hand? You cannot answer these questions with a ProductUpdated or a ProductQuantityUpdated.
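
    Here’s a small Go sketch contrasting the two styles. The field names are illustrative, but the difference is the point: the CRUD-style event records that something changed, while the explicit event records why.

    ```go
    package main

    // CRUD-sourced: records that state changed, but not why.
    type ProductUpdated struct {
        SKU      string
        Quantity int // the new value; the reason is lost
    }

    // Explicit business event: captures the intent behind the change.
    // Field names here are illustrative, not prescriptive.
    type InventoryAdjustment struct {
        SKU        string
        Adjustment int    // e.g. -5 after a physical count found shrinkage
        Reason     string // "cycle count", "damaged goods", ...
    }

    func main() {}
    ```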

    Optimizations

    Once people understand Event Sourcing and how it works, the most common question is:

    That seems really inefficient to have to fetch all the events from a stream to build up the current state! What happens if I have 1000s of events!

    To understand how event sourcing works check out my post Event Sourcing Example & Explained in plain English.

    As a quick primer, you have a stream of events for a unique aggregate, such as Product with a SKU of ABC123.

    Event Stream

    Any time we want to perform a command that will append a new event to the stream, we’ll generally fetch all the events from the stream, build up the current state, then enforce any invariants for the command we want to perform.

    In the stream above if we were keeping track of “quantity” as the current state, it would be 59.
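
    As a minimal sketch of that rebuild, here’s a left fold over a stream in Go. The events and their quantities are made up to land on the 59 from the example:

    ```go
    package main

    import "fmt"

    // Event is anything that changes product state. Here events are reduced
    // to a quantity delta; real events would carry richer data.
    type Event interface{ QuantityDelta() int }

    type ProductReceived struct{ Quantity int }

    func (e ProductReceived) QuantityDelta() int { return e.Quantity }

    type InventoryAdjustment struct{ Adjustment int }

    func (e InventoryAdjustment) QuantityDelta() int { return e.Adjustment }

    // currentState rebuilds state by replaying every event in order; this
    // full replay is exactly the work a snapshot would let you skip.
    func currentState(stream []Event) int {
        quantity := 0
        for _, e := range stream {
            quantity += e.QuantityDelta()
        }
        return quantity
    }

    func main() {
        stream := []Event{ProductReceived{Quantity: 60}, InventoryAdjustment{Adjustment: -1}}
        fmt.Println(currentState(stream)) // 59
    }
    ```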

    So back to the common comment: “that’s really inefficient” is a pretty valid concern. The answer to this problem is snapshots, but they are an optimization that you don’t necessarily need to apply right from the start, or often at all.

    In my experience, event streams are generally finite and have a life cycle with a beginning and an end. A stream may be “active” for a long duration, but the number of events persisted is often limited.

    If you have a lot of events and it’s taking a long time to rebuild the state, then creating snapshots can help. They are a way of creating a point-in-time representation of the state.

    Event Stream Snapshot

    After a certain number of events are appended to a stream, you persist another event to a separate stream that holds the current state, also recording which version of the stream it represents. This way, when you want to rebuild the current state, you first get the last snapshot and then get the events from the stream since that snapshot was created.
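
    A minimal sketch of that read path in Go, with events reduced to quantity deltas and a hypothetical Snapshot type that records the version it was taken at:

    ```go
    package main

    import "fmt"

    // Snapshot is a point-in-time state plus the stream version it reflects.
    type Snapshot struct {
        Quantity int
        Version  int // number of events already folded into this snapshot
    }

    // rebuild starts from the latest snapshot, if one exists, and replays
    // only the events appended after it, instead of the whole stream.
    // Events are again reduced to quantity deltas to keep the sketch small.
    func rebuild(snap *Snapshot, deltas []int) int {
        quantity, from := 0, 0
        if snap != nil {
            quantity, from = snap.Quantity, snap.Version
        }
        for _, d := range deltas[from:] {
            quantity += d
        }
        return quantity
    }

    func main() {
        snap := &Snapshot{Quantity: 60, Version: 3}       // taken after the first 3 events
        fmt.Println(rebuild(snap, []int{20, 30, 10, -1})) // replays only the -1 → 59
    }
    ```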

    Now the question is, when do you create snapshots? If you think the event stream is going to contain a lot of events, how many is a lot for a given situation? Each type of event stream is going to have different events containing different data. There’s no magical number of events that serves as a threshold for creating a snapshot; it’s going to be use-case specific.

    Event Sourcing tip: don’t jump to snapshots immediately; look at how you’ve defined your streams and boundaries.

    Communication & State

    The last mess people get into with Event Sourcing is conflating events that represent state with events used as a way to communicate with other service boundaries.

    Your event store, the event streams, and the events within a stream represent the state.

    Event Store for State

    Often with Event Sourcing, you’ll create projections as a way to represent the current state for queries/UI/reports. This way you can query a separate data store that has pre-computed the current state, which means you don’t have to pull all the events from a stream to build it. It’s already pre-computed as events occur (usually asynchronously).

    Event Store and DocumentDB for Projections

    This means you’ll have two different databases. One for your event streams and one for projections. This is all within the same logical service boundary.

    Now if you’re using an Event Store that supports subscriptions, this doesn’t mean that other logical service boundaries should directly access the Event Store.

    A service can't access Event Store for integration

    People often do this as a way of achieving Pub/Sub for communication. But there is a difference between events used inside a logical boundary to represent state and events used to communicate with other services.

    You wouldn’t have one service connect to another service’s relational database, would you? Then why would you access its Event Store?

    Don't access other services DB directly

    Domain Events used within a service boundary to represent the state are not integration events.

    Domain Events: Inside Events

    Integration Events: Outside Events

    Domain Events aren't integration events

    You want to define which events you publish to the outside world for integration. Domain events and integration events will be versioned entirely differently. The moment you expose a domain event to the outside world, you potentially have consumers that are going to rely on it. If you don’t publish your internal domain events, you can refactor and change them much more freely.

    If another service was interacting with your relational database, and you made a change to a column name, you’d break them. If you change an event that’s not backward compatible, you’re going to break consumers. Don’t expose internal domain events as integration events.
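
    One way to keep that boundary explicit, sketched in Go: translate the internal domain event into a deliberately small, separately versioned integration event at the point of publishing. All of the names here are hypothetical.

    ```go
    package main

    // Internal domain event: free to change shape whenever we refactor.
    type inventoryAdjusted struct {
        SKU        string
        Adjustment int
        CountedBy  string // an internal detail we never want to leak
    }

    // Published integration event: a deliberate, separately versioned contract.
    type InventoryChangedV1 struct {
        SKU      string `json:"sku"`
        Quantity int    `json:"quantity"`
    }

    // toIntegration is the single, explicit translation point. Consumers
    // only ever see InventoryChangedV1, never the internal event.
    func toIntegration(e inventoryAdjusted, newQuantity int) InventoryChangedV1 {
        return InventoryChangedV1{SKU: e.SKU, Quantity: newQuantity}
    }

    func main() {
        _ = toIntegration(inventoryAdjusted{SKU: "ABC123", Adjustment: -5}, 59)
    }
    ```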

    Event Sourcing do’s and don’ts

    Hopefully, these three Event Sourcing Tips give some insights that can help you if you’re new to Event Sourcing or if you ever questioned how various aspects work.

  • Avoiding a QUEUE Backlog Disaster with Backpressure & Flow Control

    I advocate a lot for asynchronous messaging. It can add reliability, temporal decoupling, and much more. But what are some of the challenges? One of them is backpressure and flow control. This occurs when you’re producing more messages than you can consume: messages pile up in your queue and you can never catch up. The queue just keeps growing.

    YouTube

    Check out my YouTube channel where I post all kinds of content that accompanies my posts including this video showing everything in this post.

    https://www.youtube.com/watch?v=BIGiLJJlE08

    Producers and Consumers

    Producers send messages to a broker/queue and a Consumer processes those messages. For a simplistic view, we have a single producer and a single consumer.

    The producer creates a message and sends it to the broker/queue.

    Single Producer sending a message to a queue

    The message can sit on the broker until the consumer is ready to process it. This enables the producer and the consumer to be temporally decoupled.

    Temporal Decoupling provided as the queue holds the message

    The consumer then processes the message and it is removed from the broker/queue.

    Consumer processes the message from the queue

    As long as you can consume messages, on average, faster than messages are produced, you won’t end up with a queue backlog.

    But since there can be many producers, or because of load, you may start producing more messages at a faster rate than can be consumed.

    For example, if you’re producing a single message every second, yet it takes you 1.5 seconds to process the message, you’re going to start filling up the queue. You’ll never be able to catch up and have an empty queue.

    Queue backlog

    Most systems will have peaks and valleys in terms of how many messages are produced. The valleys are where the consumer can catch up. But again, if on average you’re producing more messages than can be processed, you’re going to build a queue backlog.

    Competing Consumers

    One solution is to add more consumers so that you can process more messages concurrently. Basically, you’re increasing your throughput. You need to match or exceed the rate of production with consumption.

    The competing consumers pattern uses multiple instances of the consumer that compete for messages on the queue.

    Competing Consumers of multiple consumer instances

    Since we have multiple consumers, we can now process two messages concurrently. If one consumer is busy processing a message and another message is sent to the queue, another consumer is available to pick it up.

    Each consumer available competes for the next message

    The consumer that is available will compete for the next message and process it.

    Competing consumers adds more processing which increases throughput
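
    A minimal Go sketch of competing consumers, with a channel standing in for the queue and goroutines standing in for consumer instances:

    ```go
    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    func main() {
        queue := make(chan int) // stands in for the broker's queue
        var wg sync.WaitGroup

        // Two consumer instances competing for messages from the same queue.
        for consumer := 1; consumer <= 2; consumer++ {
            wg.Add(1)
            go func(id int) {
                defer wg.Done()
                for msg := range queue { // each message goes to exactly one consumer
                    time.Sleep(100 * time.Millisecond) // simulated processing time
                    fmt.Printf("consumer %d processed message %d\n", id, msg)
                }
            }(consumer)
        }

        for msg := 1; msg <= 6; msg++ {
            queue <- msg
        }
        close(queue)
        wg.Wait() // note: completion order is not guaranteed to match send order
    }
    ```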

    There are a couple of issues with the competing consumers pattern.

    The first is if you’re expecting to process messages in order. Just because you’re using a FIFO (first-in, first-out) queue does not mean you’ll process messages in the order they were produced. Because you’re processing messages concurrently, you could finish processing them out of order.

    The second issue is you’ve moved the bottleneck. Any downstream services that are used when consuming a message will now experience additional load. For example, if you’re interacting with a database, you’re now going to add additional calls to that database because you’re now processing more messages at a given time.

    Competing consumers adding additional load on downstream services

    Incoming

    A queue is like a buffer. My analogy is to think of a queue as a pond of water. There is a stream of water as an inflow on one end, and a stream of water as an outflow on the other.

    If the stream of water coming in is larger than the stream of water going out, the water level of the pond will rise. To lower the water level, you need to widen the outgoing stream to allow more water to escape.

    But another way to maintain the water level is to limit the amount of water entering the pond.

    In other words, limit the producer to only be able to add so many messages to the queue.

    Setting a limit on the broker/queue itself means that when the producer tries to send a message to a queue that has reached its limit, the message won’t be accepted.

    Queue Backlog handled by limiting producer
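
    Here’s that safeguard as a small Go sketch, with a bounded channel standing in for the queue limit; a full queue rejects the send rather than letting the backlog grow:

    ```go
    package main

    import (
        "errors"
        "fmt"
    )

    var ErrQueueFull = errors.New("queue is at its limit")

    // trySend models a broker enforcing a queue limit: a bounded channel
    // stands in for the queue, and a full queue rejects the send instead
    // of letting the backlog grow without bound.
    func trySend(queue chan int, msg int) error {
        select {
        case queue <- msg:
            return nil
        default:
            return ErrQueueFull // producer must retry, back off, or shed load
        }
    }

    func main() {
        queue := make(chan int, 3) // limit of 3 messages
        for msg := 1; msg <= 5; msg++ {
            if err := trySend(queue, msg); err != nil {
                fmt.Printf("message %d rejected: %v\n", msg, err)
            }
        }
    }
    ```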

    Because the producer might be a client/UI, you might want built-in retries and other ways of handling the failure when you cannot enqueue a message. Generally, I think this way of handling backpressure is best used as a safeguard to avoid overwhelming the entire system.

    Queue Backlog

    Ultimately, when dealing with queues (and topics!), you need to understand some metrics: the rate at which you’re producing messages, the rate at which you can consume messages, how long messages are sitting in the queue, the lead time (from when a message was produced to when it was consumed to be processed), and the processing time (how long it takes to process a specific message).

    Knowing these metrics will allow you to understand how to handle backpressure and flow control. Look at both sides, producing and consuming.
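
    As a tiny sketch of capturing two of those metrics in a consumer, assuming each message carries its production timestamp:

    ```go
    package main

    import (
        "fmt"
        "time"
    )

    // Message carries its production timestamp so the consumer can
    // measure lead time without asking the broker.
    type Message struct {
        Body       string
        ProducedAt time.Time
    }

    // process records the two per-message metrics worth watching: lead time
    // (produced to picked up) and processing time (the handler itself).
    func process(m Message) {
        leadTime := time.Since(m.ProducedAt)

        start := time.Now()
        time.Sleep(50 * time.Millisecond) // stand-in for real work
        processingTime := time.Since(start)

        fmt.Printf("lead=%v processing=%v\n", leadTime, processingTime)
    }

    func main() {
        process(Message{Body: "hello", ProducedAt: time.Now().Add(-2 * time.Second)})
    }
    ```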

    Look at the competing consumers pattern to increase throughput. Also look at whether there are optimizations to be made in how a consumer processes a message. And be aware of downstream services that will also be affected by increasing throughput.

    Add a safeguard on the producing side to not get into a queue backlog situation you can’t recover from.
