Avoiding the Repository Pattern with an ORM

For many years now I’ve advocated not using the repository pattern on top of an ORM such as Entity Framework. There are many reasons why that I’ll try and cover throughout this post based on ways that I’ve seen it implemented. Meaning, this post is talking about poorly implemented approaches or pitfalls that I’ve seen.

To clarify, since this topic seems to really fire people up, I’m not saying that you shouldn’t use the repository pattern. I’m going to clarify why I don’t think under certain situations it’s very useful and other situations that I do find it useful.

This post was spurred on by a blog post and tweet:

IQueryable

The first thing I’ve seen with a repository is exposing IQueryable<T> (or DbSet<T>) from the underlying DbContext in your repository. This serves no purpose. It’s not abstracting anything at all.

What’s even worst is the consumers/callers don’t necessarily know at what point will they actually be retrieving data (doing I/O), unless you’re aware that the underlying IQueryable is coming from Entity Framework., Now when you call a method that materializes your query and actually hits the database (such as ToListAsync()).

Lazy Loading

Second, to this point is now if you have any type of navigation properties and are accessing IQueryable<T> from repository consumers, you must either eager load (via Include()) or have your consumers do the Include() or not realize all navigation properties are lazy loading.

Again, consumers are now aware of the underlying implementation that is Entity Framework.

IEnumerable

To overcome these issues, usually what comes next is avoiding the IQueryable<T> by returning an IEnumerable<T>.

The issue now is since you’re taking away control from the consumer, you must decide what data to Include() and Select() behind query methods.

What this often turns into is a pile of methods with various filtering parameters that could have been much easier expressed via a LINQ expression against the DbSet directly.

Query Objects

So if I don’t generally use the repository pattern, what do I use? Query Objects.

For querying, I’d rather have specialized objects that can return very specific data for the given use case. When implementing in vertical feature slices, as opposed to layers, each query is responsible for how it retrieves data.

The simplest solution is to use the DbContext and query directly.

The primary benefit is query objects only have dependencies that they actually require. Because each query object defines its own dependencies, you can change those dependencies without affecting other query objects.

A simple example of this is if you wanted to migrate from Entity Framework 6 to Entity Framework Core. You could migrate one query object at a time to EF Core instead of having to change over an entire repository that is highly coupled.

Testing

I can see the argument for using a repository because testing was difficult with EF6. However, with EF Core using the SQLite or the InMemory Provider, testing is incredibly easy.

I’ve written a post on how to use the SQLite provider with an in-memory database.

Testing a Query Object becomes incredibly easy without the need to mock.

Caching

Another argument for using the repository pattern is being able to swap out the implementation for a “cached repository”. I do use this pattern but in very select cases. Most times this is across bounded context were cached or stale data is acceptable.

If you decide to swap out the implementation of your repository, which was previously always hitting the database (point of truth) and now is using a cache implementation, how does that affect the callers? How quickly is the data invalidated?

Data can be stale the moment you retrieve it from the database, however adding caching to your repository without your callers knowing it can have a big impact on behavior.

Aggregate Root

One place I do often use a repository is when accessing an aggregate (in DDD Terms). My repositories often only contain two methods, Get(id) and Save(aggregateRoot).

The reason I do use a repository in this situation is that my repository usually returns an object that encapsulates my EF data model. I want it to fetch the entire object model and construct the aggregate root. The aggregate root does not expose data but only behavior (methods) to change state.

Repository Pattern Related

Enjoy this post? Subscribe!

Subscribe to our weekly Newsletter and stay tuned.

Focus on Service Capabilities, not Entities

Service Capabilities

One of the most common pitfalls I think I’ve fallen into is focusing too much on data entities rather than service capabilities. What tends to happen is building up a domain model of behaviors related to single entities.

As I’ve mentioned in my post about using language to find service boundaries, you can have the same entity that lives in a different context, but that owns specific behaviors and data.

This blog post is in a series. To catch up check out these other posts:

Entities

I’m not entirely sure where the focus on entities comes from. I suspect it the rise of ORMs has something to do with it as well as the general relational database table design.

In the world of monolithic applications and databases, it’s common to see a singular table that represents an entity. A massive Product table with 100 columns isn’t unusual.

What is unusual when living in a monolith is to think of that Product table being split up into multiple Product tables across multiple databases. This is a big mental leap.

Service Capabilities

But the reality is your application doesn’t often require all of the data related to an entity. Likely it needs very little of it. An exercise to see what it actually needs is by looking at the business logic related to a particular capability.

In our distribution example I’ve been using through this series, if I were to look at the Inventory Adjustment functionality in the Warehouse Service, do you think it requires the Product Selling Price?

An inventor adjustment is used to reconcile the deviation from what physically is in the warehouse for a product and what our system says. Why would there be a deviation from the system, well physical products sitting in a warehouse can be broke or be stolen. The real point of truth is the physical warehouse, not a number in a database.

As you might have guessed, an Inventory Adjustment doesn’t need the selling price.

What this illustrates is we don’t need to have a singular Product entity backed by a singular product table. We can separate these entities into multiple Product entities that live in various services.

What they will share is a common identifier, in our case a SKU to identify the product.

Once you start focusing on the behaviors and capabilities you can identify the data they encapsulate you can start splitting them into multiple entities across the services that own those behaviors.

Blog Series

More on all of these topics will be covered in greater length in other posts. If you have any questions or comments, please reach out to me in the comments section or on Twitter.

Enjoy this post? Subscribe!

Subscribe to our weekly Newsletter and stay tuned.

Autonomous Services

Autonomous Services

In my previous post, I explored how words and language used by users of our system in our domain space can have different meaning based on their context. This establishes which services own which behavior and data. In this post, I’m going to explore why services are autonomous and how we can communicate between them

This blog post is in a series. To catch up check out these other posts:

Autonomy

Autonomy is the capacity to make an informed, uncoerced decision. Autonomous services are independent or self-governing. 

What does autonomy mean for services? A Service is the authority of a set of business capabilities. It doesn’t rely on other services.

We are constantly in a push/pull battle between coupling and cohesion. High coupling ultimately leads to the big ball of mud.

What’s unfortunate is the move to (micro)services with non-autonomous services that rely on RPC (usually via HTTP) hasn’t reduced coupling at all. It’s actually made the problem worse by introducing an unreliable network turning the big ball of mud into a distributed big ball of mud.

Prefer Messaging over RPC

We want services to be autonomous and not rely on other services over RPC to reduce coupling. One way to do this is to communicate state changes between our services with events.

When Service A has a state change, we publish that event to our message broker. Any other service can subscriber to that event and perform whatever action it needs to internally. The producer of the event (Service A) doesn’t care about who may consume that event.

Services that don’t Serve

This may seem completely counter-intuitive since the definition of a service is an act of assistance. However, an autonomous service does not want to assist other services synchronously via behaviors, rather exposing to other services things that have happened to it via asynchronous messaging.

An example of this in our distribution domain is in the form of Sales services and the quantity on hand of a product.

Does Sales need the quantity on hand of a product?

Sort of. You could assume without knowing this domain that you do not want to oversell. However, in my experience in distribution, overselling isn’t really a sales problem as it is a purchasing problem.

Sales want to know the quantity on hand of a product, as well as what has purchasing ordered from the vendor but has not yet been received. This is called Available to Promise (ATP) and is used by sales to determine if they can fulfill an order for a customer.

Another interesting point is related to Quantity on Hand. The quantity on hand that is owned by the warehouse service is still not really the point of truth for the real quantity on hand. Whatever the quantity on hand is for a product in a database isn’t the truth. The real truth is what’s physically in the warehouse. Products get damaged or stolen and aren’t immediately reflected in the system. This is why physical stock counts exist which end up as inventory adjustments in our warehouse service.

If we’re using RPC, for the Sales Service to calculate ATP it would need to make synchronous RPC to:

  • Purchasing Service to get what purchase orders have not yet been received.
  • Warehouse Service to get the quantity on hand.
  • Invoicing Service to determine what other orders have been placed but not yet shipped.

However, if we want our Sales Service to be autonomous it needs to manage ATP itself. It can do so by subscribing to the events of the other services.

Sales can manage it’s own ATP for a product subscribing to the various events. When a purchase order is placed it will increase the ATP. When inventory is adjusted it will increase or decrease the ATP. And finally when an order is invoiced it will decrease it’s ATP.

Blog Series

More on all of these topics will be covered in greater length in other posts. If you have any questions or comments, please reach out to me in the comments section or on Twitter.

Enjoy this post? Subscribe!

Subscribe to our weekly Newsletter and stay tuned.