Does a Data Access Layer make it easier to change your Database?

One primary reason given for a data access layer or abstraction is the ability to change the underlying database more easily. Have you ever replaced the underlying database of a large system or service? For example, moving from one relational database to another, say PostgreSQL to MySQL? Or perhaps going from a relational database to a document or event store? There seem to be two groups of people. Those who have will say that abstracting the underlying database is crucial.

In contrast, the other group has never moved a database and questions abstracting or creating a data access layer at all, because you likely won’t ever replace the database. Like many things in software architecture, it’s about coupling. If you limit coupling, this isn’t much of a hot topic.

YouTube

Check out my YouTube channel, where I post all kinds of content accompanying my posts, including this video showing everything in this post.

Free for All

Let’s assume you have a large system or service with a lot of functionality that interacts with a single large database schema.

Large System

A block represents a piece of functionality. They all have direct access to the entire database schema. It’s a free-for-all: any piece of functionality can access anything in the database. If the database were relational, this means all the tables. If you were using a document store, this could be all the collections. If you were using an event store, this is all the different event streams. Regardless of the database type, it’s free-for-all data access without any data abstraction.

Does this sound like a nightmare? It would be. If you did need to change your underlying database technology, you’d have to go to every place that does data access and potentially rewrite it completely or modify it to work with the new database. This is much more involved if you’re changing from a relational to a non-relational database.

What’s most people’s answer to this problem? Abstraction. Create some layer of abstraction over your database that provides all the database access logic.

This could be a data access layer, the repository pattern, or maybe just an ORM. Your application code and features then rely on this abstraction rather than coupling to the database directly.

Data Access Layer
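As a minimal sketch of what that abstraction can look like (TypeScript for illustration; the Order shape, SqlOrderRepository, and the inline db client are hypothetical):

```typescript
// Hypothetical domain type for illustration.
interface Order {
  id: string;
  customerId: string;
  total: number;
}

// The abstraction application code depends on.
interface OrderRepository {
  getById(id: string): Promise<Order | null>;
  save(order: Order): Promise<void>;
}

// One possible implementation backed by a relational database.
// Swapping databases means replacing this class, not its callers.
class SqlOrderRepository implements OrderRepository {
  constructor(
    private readonly db: { query(sql: string, params: unknown[]): Promise<any[]> }
  ) {}

  async getById(id: string): Promise<Order | null> {
    const rows = await this.db.query(
      "SELECT id, customer_id, total FROM orders WHERE id = $1",
      [id]
    );
    return rows.length
      ? { id: rows[0].id, customerId: rows[0].customer_id, total: rows[0].total }
      : null;
  }

  async save(order: Order): Promise<void> {
    await this.db.query(
      "INSERT INTO orders (id, customer_id, total) VALUES ($1, $2, $3) ON CONFLICT (id) DO UPDATE SET total = $3",
      [order.id, order.customerId, order.total]
    );
  }
}
```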

So now, if we need to change the database, we rewrite the data access abstraction we created, and all of our application code, in theory, stays the same.

Coupling & Cohesion

While adding the abstraction can be helpful, I don’t think that’s the right way to look at the problem. The root of the problem is coupling: a high degree of application code relies directly on the database.

Large System

While this is true, not all features within the application require access to the same data within the database.

If we’re talking about a relational database, not everything in our system needs access to all the tables. Certain pieces of functionality require access to a certain set of tables.

To reduce coupling, we need to look at cohesion. What functionality in our system requires what data? We should then group this functionality.

Organized by Features

Start segregating functionality by related features working on the same subset of data. Again, not all features need to access the same large schema/data. Define boundaries over data ownership.

Once you start defining boundaries and grouping functionality, you’ll soon realize that each grouping can decide how it handles data access. For some sets of features, you might use a specific ORM, while for others, you do direct data access without much abstraction.

Data Access Layer by Feature Sets
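As a rough sketch of that per-boundary choice (all names are hypothetical; TypeScript for illustration):

```typescript
// Ordering boundary: richer behavior, so it invests in a repository abstraction.
interface OrderRepository {
  getOpenOrders(customerId: string): Promise<string[]>;
}

// Reporting boundary: read-only queries, so it skips the abstraction entirely
// and queries directly, shaped to exactly what the report needs.
async function monthlyRevenue(
  db: { query(sql: string): Promise<{ month: string; revenue: number }[]> }
): Promise<{ month: string; revenue: number }[]> {
  return db.query(
    "SELECT to_char(created_at, 'YYYY-MM') AS month, SUM(total) AS revenue FROM orders GROUP BY 1"
  );
}
```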

You’ve reduced coupling by increasing cohesion. This allows you to decide how you want to perform data access per set of features.

There is no more free-for-all of data access. Each set of features only accesses the parts of the database it owns. One set of features should not access data owned by another set of features.

You might start to see now that each feature set can also decide how its data is persisted and in which type of database. Maybe some feature sets use a relational database, while others use an event store. You can make these localized decisions per feature set (boundary).

Data Access Layers by Feature Sets

Since the entire application/system is now split and grouped into boundaries defined by functionality and data ownership, you don’t have a free-for-all of data access. However, you often do need to query other boundaries to get reference data. In the free-for-all scenario, you’d simply query the database for that data. Now you must expose an API that other boundaries can consume, like a contract.

This can be an interface, delegate, function, etc., so the coupling between boundaries happens at the contract level. However, the data returned from this API isn’t mutable. It’s purely used as reference data. All state changes to any data must occur within the boundary that owns that data.
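Here’s a minimal sketch of such a contract (the Catalog boundary, ProductSummary, and CatalogApi names are made up for illustration):

```typescript
// Contract the Catalog boundary exposes to other boundaries.
// All fields are readonly: consumers treat this purely as reference data.
interface ProductSummary {
  readonly productId: string;
  readonly name: string;
  readonly unitPrice: number;
}

interface CatalogApi {
  getProduct(productId: string): Promise<ProductSummary | null>;
}

// The Ordering boundary consumes the contract, never the Catalog's tables.
async function priceLine(
  catalog: CatalogApi,
  productId: string,
  quantity: number
): Promise<number> {
  const product = await catalog.getProduct(productId);
  if (!product) throw new Error(`Unknown product ${productId}`);
  // Read-only use; any change to product data must go through the Catalog boundary.
  return product.unitPrice * quantity;
}
```

The readonly fields make the intent explicit: consumers can read the reference data, but any state change has to go through the owning boundary.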

Data Access Layer

This has nothing to do with creating a data access layer or abstracting data access. It has everything to do with coupling and cohesion. If you limit coupling and prevent integration at the database (the data access free-for-all), then changing the underlying database becomes a matter of changing a narrow set of features within a boundary.

Will you change your underlying database? It doesn’t matter. A high degree of coupling will make any change difficult in a large system. Defining boundaries by functional cohesion and limiting coupling will allow a system to evolve.


Shared Database between Services? Maybe!

Is a shared database a good or bad idea when working in a large system that’s decomposed into many different services? Or should every service have its own database? My answer is yes and no, but it’s all about data ownership.

YouTube

Check out my YouTube channel, where I post all kinds of content accompanying my posts, including this video showing everything in this post.

Monolith

When working within a monolith, generally with a single database, you can wrap all database calls within a transaction and not think too much about failures. If something fails, you roll back. You can get consistency by using the right isolation level within your transaction to prevent dirty reads (if needed).
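A minimal sketch of that comfort zone (the Tx client and table names are hypothetical):

```typescript
// One transaction around multiple writes, rolled back on any failure.
interface Tx {
  query(sql: string, params: unknown[]): Promise<void>;
  commit(): Promise<void>;
  rollback(): Promise<void>;
}

async function placeOrder(
  begin: () => Promise<Tx>,
  orderId: string,
  amount: number
): Promise<void> {
  const tx = await begin();
  try {
    await tx.query("INSERT INTO orders (id, amount) VALUES ($1, $2)", [orderId, amount]);
    await tx.query("INSERT INTO payments (order_id, amount) VALUES ($1, $2)", [orderId, amount]);
    await tx.commit(); // everything succeeds together...
  } catch (err) {
    await tx.rollback(); // ...or nothing happens at all
    throw err;
  }
}
```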

Monolith Shared Database

One challenge with a monolith and its single database is that data access is often a free-for-all. Reads and writes are performed from anywhere within the monolith. Most often, a monolith for a large system will have a pretty large overall schema of hundreds of tables/collections/streams.

Because this leads to so much coupling, people tend to go down the route of defining services that own certain parts of the functionality within the system. However, they don’t separate the underlying data; they keep a shared database.

Distributed Turd Pile

This often leads to what I call a Distributed Turd Pile, also known as a distributed monolith or a distributed big ball of mud.

The application has been split up into multiple services, but there is still a shared database where both services perform reads and writes. There is no schema or data ownership.

Shared Database

If you need to change a table/document and add a new column/property, which service does that? If different services are owned by different teams, which team is responsible for doing so? If you make a change, every other service now needs to be aware of that change so it doesn’t break them.

Ultimately, in a distributed turd pile, you’ve siloed the functionality but still have a massive schema that’s a free-for-all with no ownership.

Physical vs. Logical

When I originally asked whether you can share a database between services, you’d guess my answer, based on the above, would be no. That’s correct when talking about it from a logical perspective: services should logically own their schema and data.

However, physical boundaries aren’t logical boundaries. This means you can have ownership of schema and data but still keep them within the same physical database instance as another service.

Shared Database Instance

This means that a single shared database instance can hold the schema and data for different services. This isn’t a free-for-all of data access. Only the service that owns a schema can access it and perform reads and writes.

It’s about logical separation, not physical separation.

You don’t have to have a physically separate database instance for each service. This can be helpful in various scenarios, from local development to staging to even production if you have limited resources. Should you share the physical instance? Maybe not if one service could consume a lot of the instance’s resources/capacity (noisy neighbors). Context matters; your situation matters. The point is, don’t confuse physical and logical boundaries as being the same.
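As an illustration of that logical separation (hypothetical host, schema, and user names):

```typescript
// Same physical instance, logically separated: each service gets its own
// schema and its own credentials, so it can only reach the data it owns.
const orderingDb = {
  host: "db.internal", port: 5432, // shared physical instance
  database: "platform",
  schema: "ordering",       // Ordering owns this schema
  user: "ordering_svc",     // grants limited to the ordering schema
};

const paymentDb = {
  host: "db.internal", port: 5432, // same physical instance
  database: "platform",
  schema: "payment",        // Payment owns this schema
  user: "payment_svc",      // grants limited to the payment schema
};
```

Each service connects with credentials that can only reach its own schema, so ownership is enforced even on shared hardware.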

I touch on this more in a blog post Microservices gets it WRONG defining Service Boundaries.

Query & UI Composition

If services have schema and data that can’t be accessed directly by other services, how do you do any type of UI or Query Composition?

One option is to create an API Gateway or BFF (Backend for Frontend) responsible for making the relevant calls to the required services and doing the composition to return to the client.

BFF
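A minimal BFF sketch (service URLs and response shapes are hypothetical):

```typescript
// The BFF endpoint fans out to the services that own the data and
// composes one response for the client.
async function getOrderDetails(orderId: string): Promise<unknown> {
  const [orderRes, paymentRes] = await Promise.all([
    fetch(`http://ordering/api/orders/${orderId}`),
    fetch(`http://payment/api/payments?orderId=${orderId}`),
  ]);
  const order = await orderRes.json();
  const payment = await paymentRes.json();
  // Composition happens here, not in a shared database query.
  return { ...order, paymentStatus: payment.status };
}
```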

Another option is to use event carried state transfer to give services a local cached copy of reference data that can be used for UI composition.

When a service makes some type of state change, it will publish an event containing the state of the entity that changed.

Event Carried State Transfer

Other services can consume that event, update their local database, and use it as a cache.

Event Carried State Transfer

I do not recommend doing this for workflow or any business process. This isn’t for transactional data. It’s for reference data, which often plays a supporting role in your system. This data isn’t volatile, so it fits well being cached.
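Here’s a minimal sketch of the consuming side (the event shape and upsert helper are assumptions):

```typescript
// Event carried state transfer, consumer side: keep a local, read-only
// copy of another boundary's reference data.
interface ProductPriceChanged {
  productId: string;
  name: string;
  unitPrice: number;
}

async function onProductPriceChanged(
  event: ProductPriceChanged,
  localCache: { upsert(table: string, key: string, row: unknown): Promise<void> }
): Promise<void> {
  // Update the local copy; this data is only ever read here, never mutated.
  await localCache.upsert("product_cache", event.productId, {
    name: event.name,
    unitPrice: event.unitPrice,
  });
}
```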

Lastly, another option for UI composition is to do it on the client itself. Each service can own a piece of the UI.

Consistency

It’s important to touch on consistency. Using a local cache when executing a command means using stale data. There is no difference between making a service-to-service API call to get data for a command and getting it from a local cache. Both will be inconsistent when executing a command. Why? Because the moment you get back a result from a service-to-service call or a local cache, the data is stale.

If you need consistency, you need data owned by the boundary that requires consistency.

I often find that some ownership confusion is due to how workflows are thought of. I often see workflows assigned to a single boundary when they may involve many different boundaries.

As an example, let’s say you have an order checkout process. The first call from the client would be to the Ordering service. This would likely return a CheckoutID if it wasn’t already defined by the client.

Workflow

Next, the client would send the credit card information to the payment service with the same CheckoutID.

Workflow

Finally, the client reviews their order and sends a request to the Ordering service to place the order. The Ordering service would make the relevant state changes to its database and then publish an OrderPlaced event, or perhaps send a ProcessPayment command.

Workflow

The Payment service would consume this message and then use the credit card data it already has within its database to charge the customer through a payment gateway.

Workflow
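Sketched as message contracts, the flow might look like this (all names are hypothetical, following the steps above):

```typescript
// The CheckoutID correlates the client's calls across Ordering and Payment.
interface StartCheckout { checkoutId: string; customerId: string; }    // to Ordering
interface SubmitPaymentInfo { checkoutId: string; cardToken: string; } // to Payment
interface PlaceOrder { checkoutId: string; }                           // to Ordering

// Sent by Ordering once the order is placed; Payment consumes it and charges
// the card data it already stored under the same CheckoutID.
interface ProcessPayment { checkoutId: string; amount: number; }
```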

Shared Database

Can you share a database between services? Yes and no. You can share the physical database instance, but not the schema and data. A service should own its schema and data and only expose them via explicit contracts: an API or messages. Don’t have a data access free-for-all.


Workflow Orchestration for Resilient Systems

Building resilient software systems can be difficult, especially when they are distributed. Executing a long-running business process or workflow that involves multiple service boundaries requires a lot of resiliency. Everything needs to be available and functioning, because if something fails midway through, you can be left in an inconsistent state. What’s a solution? Workflow orchestration, removing direct service-to-service communication and temporal coupling.

YouTube

Check out my YouTube channel, where I post all kinds of content accompanying my posts, including this video showing everything in this post.

Workflow

When working with a monolith and a single database, all database interactions are wrapped in a single transaction. Things are pretty straightforward.

Monolith

If there is any failure while executing, you simply roll back the transaction. If everything works correctly, the transaction is committed.

The problem arises when you have a set of distributed services involved in the workflow. You won’t likely have a distributed transaction, so there is no “rollback.” We can’t use the same mindset.

Service to Service

If there are service-to-service RPC calls, what happens when service D, far down the RPC call stack, fails? If any other services up the call stack made some type of state change or had some side effect, how do they roll back? Each service needs to be aware of potential failures; otherwise, they will be left in an inconsistent state.

Using RPC for service-to-service communication can have many more issues, including latency. Check out my post REST APIs for Microservices? Beware!

Synchronous Orchestration

The first step is to remove the rat’s nest of service-to-service communication. This allows us to reason about a service on its own; if it has any failures, they aren’t because of other services.

The first attempt might be to create an orchestrator that handles all the interactions, procedurally calling one service after the other.

Sync Orchestration

Once the first call to the Order service is made, the client (orchestrator) makes the next call to the Payment service.

Sync Orchestration

After the payment service returns, the client/orchestrator then makes a call to the Warehouse service. But what happens if the warehouse service call fails?

Sync Orchestration Failure

We’re in a similar situation where we need to go back and call the Payment service to try and revert or undo the previous call. In the case of the Payment service, this may be refunding or voiding a transaction.
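A minimal sketch of that compensation dance (hypothetical service clients; each call is a blocking RPC):

```typescript
// Synchronous orchestrator with a compensating action on failure.
async function checkoutSync(
  ordering: { placeOrder(id: string): Promise<void> },
  payment: { charge(id: string): Promise<void>; refund(id: string): Promise<void> },
  warehouse: { createShippingLabel(id: string): Promise<void> },
  orderId: string
): Promise<void> {
  await ordering.placeOrder(orderId);
  await payment.charge(orderId);
  try {
    await warehouse.createShippingLabel(orderId);
  } catch (err) {
    // Warehouse failed: compensate by undoing the payment.
    // Every service still had to be up for the whole workflow.
    await payment.refund(orderId);
    throw err;
  }
}
```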

We haven’t accomplished much by putting a synchronous orchestrator over service-to-service communication because, ultimately, the problem remains: we are relying on every service to be available and operating without any failures for the entire workflow or business process.

We want to remove this temporal coupling by making each service independent, so they do not rely on each other and don’t need to be available simultaneously.

Asynchronous Workflow

You can accomplish asynchronous workflow orchestration by using messaging: commands and events allow the workflow to execute asynchronously. An orchestrator consumes events published by other services and reacts to them by sending commands to the appropriate boundary to continue the workflow. The orchestrator directs the workflow.

To illustrate, when the client makes its initial request to the Ordering service, it will make a state change to create an order.

Workflow Orchestration

Ordering will then publish an OrderPlaced event that will kick off the workflow. Generally, I keep the orchestrator owned by the boundary that kicks it off, but this isn’t always the case.

Workflow Orchestration

Once the workflow kicks off, it will create a MakePayment command and send it to the message broker.

Workflow Orchestration

At this point, the Payment service doesn’t have to be available. The workflow doesn’t break if it’s down for whatever reason. Once it’s available, it will process messages from the broker.

Workflow Orchestration

Once the Payment service processes the command, it can publish an event or a reply. You can use the request-response pattern with asynchronous messaging. How? Check out my post Asynchronous Request-Response Pattern for Non-Blocking Workflows.

The Payment service sends a PaymentCompleted event back to the broker.

The Ordering service, which contains the orchestrator, will consume the PaymentCompleted message.

Workflow Orchestration

All the orchestrator does is consume the PaymentCompleted event to know that it must now create a CreateShippingLabel command and send it to the broker. There are no database calls or lookups in the orchestrator; it simply consumes events and sends commands.

Now the Warehouse service consumes the CreateShippingLabel command and then sends a ShippingLabelCreated event back to the broker.

At this point, our workflow is complete: the orchestrator consumes the ShippingLabelCreated event and marks the workflow as done.

Workflow Orchestration
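Putting the orchestrator’s job into code, a minimal sketch might look like this (the Broker interface and handler wiring are assumptions; in practice a messaging library would route consumed events to these handlers):

```typescript
// The orchestrator holds no business data and makes no database calls;
// it only maps each consumed event to the next command.
interface Broker {
  send(command: { type: "MakePayment" | "CreateShippingLabel"; orderId: string }): Promise<void>;
}

const completed = new Set<string>();

async function onOrderPlaced(broker: Broker, orderId: string): Promise<void> {
  await broker.send({ type: "MakePayment", orderId }); // kick off payment
}

async function onPaymentCompleted(broker: Broker, orderId: string): Promise<void> {
  await broker.send({ type: "CreateShippingLabel", orderId }); // next step
}

async function onShippingLabelCreated(orderId: string): Promise<void> {
  completed.add(orderId); // workflow done for this order
}
```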

Workflow orchestration can be accomplished with asynchronous messaging, using commands and events, allowing you to execute long-running business processes and workflows while removing temporal coupling between services.

Each service can operate independently by consuming messages and sending events/replies once completed.

If a service isn’t available, it doesn’t break the entire workflow or business process. Once it comes back online, the workflow continues as the service starts consuming messages.
