For many years now I’ve advocated not using the repository pattern on top of an ORM such as Entity Framework. There are many reasons why, which I’ll try to cover throughout this post based on ways that I’ve seen it implemented. In other words, this post is about poorly implemented approaches and pitfalls that I’ve seen.
To be clear, since this topic seems to really fire people up: I’m not saying that you shouldn’t use the repository pattern. I’m going to explain why I don’t find it very useful in certain situations, and in which situations I do find it useful.
This post was spurred on by a blog post and tweet:
The first pitfall I’ve seen is a repository that exposes IQueryable<T> (or DbSet<T>) from the underlying DbContext. This serves no purpose: it isn’t abstracting anything at all.
What’s even worse is that consumers don’t necessarily know at what point they will actually be retrieving data (doing I/O) unless they’re aware that the underlying IQueryable is coming from Entity Framework. Building up the query does nothing on its own; the database is only hit when you call a method that materializes the query, such as ToListAsync().
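To make the deferred-execution point concrete without Entity Framework, here is a rough Python analogy, not EF code: a generator plays the role of IQueryable<T>, and FakeDatabase, OrderRepository and queries_executed are hypothetical names. Unlike EF, the filter here runs in memory instead of being translated to SQL, but the timing problem is the same: the caller composes a query, and the I/O happens later, wherever it gets materialized.

```python
from typing import Dict, Iterator, List


class FakeDatabase:
    """Hypothetical stand-in for a database that counts how often it is queried."""

    def __init__(self, rows: List[Dict]):
        self._rows = rows
        self.queries_executed = 0

    def scan(self) -> Iterator[Dict]:
        # Generator body: nothing below runs until someone iterates.
        self.queries_executed += 1  # the simulated I/O
        yield from self._rows


class OrderRepository:
    def __init__(self, db: FakeDatabase):
        self._db = db

    def query(self) -> Iterator[Dict]:
        # Leaks a lazy query to callers, much like returning IQueryable<T>.
        return self._db.scan()


db = FakeDatabase([{"id": 1, "status": "open"}, {"id": 2, "status": "closed"}])
repo = OrderRepository(db)

open_orders = (o for o in repo.query() if o["status"] == "open")
print(db.queries_executed)  # 0: composing the query did no I/O
results = list(open_orders)  # materializing, similar in spirit to ToListAsync()
print(db.queries_executed)  # 1
```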
Second, if your entities have navigation properties and consumers are working with the IQueryable<T> from the repository, then either the repository must eager load them (via Include()), the consumers must call Include() themselves, or the consumers may not realize that the navigation properties are being lazy loaded.
Again, consumers now have to be aware that the underlying implementation is Entity Framework.
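A minimal Python sketch of how lazy loading can surprise a consumer, assuming a hypothetical LazyOrder entity whose lines property queries on first access (a stand-in for a lazy-loaded navigation property, not EF’s actual proxy mechanism). Summing the lines looks like in-memory work, but it costs one extra round trip per order: the classic N+1 pattern.

```python
from typing import Dict, List, Optional


class QueryCounter:
    """Counts simulated database round trips."""

    def __init__(self) -> None:
        self.count = 0


class LazyOrder:
    """Hypothetical entity whose `lines` navigation property loads on first access."""

    def __init__(self, order_id: int, counter: QueryCounter, lines_table: List[Dict]):
        self.id = order_id
        self._counter = counter
        self._lines_table = lines_table
        self._lines: Optional[List[Dict]] = None

    @property
    def lines(self) -> List[Dict]:
        if self._lines is None:
            self._counter.count += 1  # an extra round trip the caller never asked for
            self._lines = [line for line in self._lines_table if line["order_id"] == self.id]
        return self._lines


counter = QueryCounter()
lines_table = [
    {"order_id": 1, "sku": "A"},
    {"order_id": 2, "sku": "B"},
    {"order_id": 2, "sku": "C"},
]

counter.count += 1  # the one query that loads the orders themselves
orders = [LazyOrder(i, counter, lines_table) for i in (1, 2, 3)]

# Innocent-looking consumer code triggers one more query per order (N+1).
total_lines = sum(len(o.lines) for o in orders)
print(total_lines, counter.count)  # 3 4
```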
To overcome these issues, the usual next step is to avoid IQueryable<T> by returning IEnumerable<T> instead.
The issue now is that, since you’re taking control away from the consumer, the repository must decide what data to Include() and Select() behind each query method.
This often turns into a pile of methods with various filtering parameters that could have been expressed much more easily as a LINQ expression directly against the DbSet.
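A sketch of where this tends to end up, written in Python with a hypothetical OrderRepository over an in-memory list (no EF involved). Each new use case adds another method for another combination of filters, while the same intent could have been written once as an expression at the call site, which is the role a LINQ query against the DbSet would play.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Order:
    id: int
    customer_id: int
    status: str
    total: float


class OrderRepository:
    """Hypothetical repository returning materialized lists (the IEnumerable<T> approach)."""

    def __init__(self, orders: List[Order]):
        self._orders = orders  # stand-in for the database

    # Every new screen or report tends to add another parameter combination.
    def get_by_status(self, status: str) -> List[Order]:
        return [o for o in self._orders if o.status == status]

    def get_by_customer_and_status(self, customer_id: int, status: str) -> List[Order]:
        return [o for o in self._orders
                if o.customer_id == customer_id and o.status == status]

    def get_by_customer_and_status_over(self, customer_id: int, status: str,
                                        min_total: float) -> List[Order]:
        return [o for o in self._orders
                if o.customer_id == customer_id and o.status == status
                and o.total >= min_total]


orders = [Order(1, 7, "open", 150.0), Order(2, 7, "open", 20.0), Order(3, 8, "open", 500.0)]
repo = OrderRepository(orders)

# The same intent, written once as an expression at the call site.
direct = [o for o in orders if o.customer_id == 7 and o.status == "open" and o.total >= 100]
print([o.id for o in direct])
```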
So if I don’t generally use the repository pattern, what do I use? Query Objects.
For querying, I’d rather have specialized objects that can return very specific data for the given use case. When implementing in vertical feature slices, as opposed to layers, each query is responsible for how it retrieves data.
The simplest solution is to use the DbContext and query it directly.
The primary benefit is that query objects only have the dependencies they actually require. Because each query object defines its own dependencies, you can change those dependencies without affecting other query objects.
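A minimal query-object sketch, in Python for illustration, with the standard library’s sqlite3 connection standing in for the DbContext. GetOpenOrdersForCustomer, OpenOrderSummary and the schema are hypothetical. The query object depends only on the connection, owns its SQL, and returns a read model shaped for one use case rather than a general-purpose entity.

```python
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OpenOrderSummary:
    """Read model shaped for one specific use case."""
    order_id: int
    customer_name: str
    total: float


class GetOpenOrdersForCustomer:
    """Hypothetical query object: owns its query and depends only on the connection."""

    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn

    def execute(self, customer_id: int) -> List[OpenOrderSummary]:
        rows = self._conn.execute(
            """
            SELECT o.id, c.name, o.total
            FROM orders o JOIN customers c ON c.id = o.customer_id
            WHERE o.customer_id = ? AND o.status = 'open'
            ORDER BY o.id
            """,
            (customer_id,),
        ).fetchall()
        return [OpenOrderSummary(*row) for row in rows]


# Example wiring with a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL,
                         status TEXT NOT NULL, total REAL NOT NULL);
    INSERT INTO customers VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 'open', 25.0), (11, 1, 'closed', 40.0), (12, 1, 'open', 5.5);
""")
print(GetOpenOrdersForCustomer(conn).execute(1))
```

Swapping how one query fetches its data (a different library, hand-written SQL, a view) only touches that one class.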
A simple example of this is if you wanted to migrate from Entity Framework 6 to Entity Framework Core. You could migrate one query object at a time to EF Core instead of having to change over an entire repository that is highly coupled.
I can see the argument for using a repository because testing was difficult with EF6. However, with EF Core and the SQLite or InMemory provider, testing is incredibly easy.
I’ve written a post on how to use the SQLite provider with an in-memory database.
Testing a query object then becomes straightforward, with no need for mocks.
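Sketched in Python, with the standard library’s in-memory SQLite standing in for the EF Core SQLite provider, a test can run a query object against a real, throwaway database instead of a mock. The GetCustomerLifetimeValue query object, the schema and the pytest-style test functions are hypothetical.

```python
import sqlite3


class GetCustomerLifetimeValue:
    """Hypothetical query object: sums closed order totals for one customer."""

    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn

    def execute(self, customer_id: int) -> float:
        (value,) = self._conn.execute(
            "SELECT COALESCE(SUM(total), 0) FROM orders "
            "WHERE customer_id = ? AND status = 'closed'",
            (customer_id,),
        ).fetchone()
        return value


def make_test_db() -> sqlite3.Connection:
    # A real (in-memory) database seeded per test, instead of a mock.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                             status TEXT, total REAL);
        INSERT INTO orders VALUES (1, 1, 'closed', 30.0), (2, 1, 'open', 99.0),
                                  (3, 2, 'closed', 10.0);
    """)
    return conn


def test_sums_only_closed_orders_for_customer():
    conn = make_test_db()
    assert GetCustomerLifetimeValue(conn).execute(1) == 30.0


def test_returns_zero_when_customer_has_no_orders():
    conn = make_test_db()
    assert GetCustomerLifetimeValue(conn).execute(42) == 0
```

Running `pytest` on this file picks up both test functions; each one gets a fresh database, so tests don’t leak state into each other.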
Another argument for using the repository pattern is being able to swap out the implementation for a “cached repository”. I do use this pattern, but only in very select cases. Most of the time this is across bounded contexts where cached or stale data is acceptable.
If you decide to swap out the implementation of your repository, which previously always hit the database (the source of truth) and now uses a cache, how does that affect the callers? How quickly is the data invalidated?
Data can be stale the moment you retrieve it from the database; however, adding caching to your repository without your callers knowing can have a big impact on behavior.
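A Python sketch of the swap-in caching decorator, assuming hypothetical ProductRepository and CachedProductRepository classes with a simple time-to-live. Because both expose the same get method, nothing at the call site signals that reads can now be stale; an injectable clock makes the staleness window easy to see.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ProductRepository:
    """Hypothetical repository that always reads the source of truth."""

    def __init__(self, store: Dict[int, str]):
        self._store = store

    def get(self, product_id: int) -> Optional[str]:
        return self._store.get(product_id)


class CachedProductRepository:
    """Decorator with the same interface; callers can't tell reads may be stale."""

    def __init__(self, inner: ProductRepository, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._inner = inner
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[int, Tuple[float, Optional[str]]] = {}

    def get(self, product_id: int) -> Optional[str]:
        now = self._clock()
        hit = self._cache.get(product_id)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # served from cache, possibly stale
        value = self._inner.get(product_id)
        self._cache[product_id] = (now, value)
        return value


fake_now = [0.0]
store = {1: "Blue Widget"}
repo = CachedProductRepository(ProductRepository(store), ttl_seconds=60,
                               clock=lambda: fake_now[0])

print(repo.get(1))       # Blue Widget (loaded from the store)
store[1] = "Red Widget"  # the source of truth changes
print(repo.get(1))       # Blue Widget (stale, served from cache)
fake_now[0] = 61.0       # TTL expires
print(repo.get(1))       # Red Widget
```

The TTL is effectively the answer to “how quickly is the data invalidated?”, and it is a decision the callers never see.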
One place I do often use a repository is when accessing an aggregate (in DDD terms). My repositories often contain only two methods: Get(id) and Save(aggregateRoot).
The reason I use a repository in this situation is that it usually returns an object that encapsulates my EF data model. I want it to fetch the entire object model and construct the aggregate root. The aggregate root exposes no data, only behavior (methods) that changes its state.
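A Python sketch of that shape, assuming hypothetical OrderData (playing the role of the EF data model), Order (the aggregate root) and OrderRepository classes, with a dictionary standing in for the database. The repository loads the whole data model, wraps it in the aggregate, and persists it again on save; callers can only change state through the aggregate’s methods.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class OrderData:
    """Hypothetical persistence model (the role an EF entity would play)."""
    id: int
    status: str = "open"
    lines: List[dict] = field(default_factory=list)


class Order:
    """Aggregate root: wraps the data model and exposes behavior, not data."""

    def __init__(self, data: OrderData):
        self._data = data

    def add_line(self, sku: str, quantity: int) -> None:
        if self._data.status != "open":
            raise ValueError("Cannot add lines to an order that is not open")
        if quantity <= 0:
            raise ValueError("Quantity must be positive")
        self._data.lines.append({"sku": sku, "quantity": quantity})

    def close(self) -> None:
        if not self._data.lines:
            raise ValueError("Cannot close an empty order")
        self._data.status = "closed"


class OrderRepository:
    """Only two methods: load the whole aggregate, and persist it."""

    def __init__(self) -> None:
        self._rows: Dict[int, OrderData] = {}  # stand-in for the database

    def get(self, order_id: int) -> Order:
        data = self._rows.get(order_id)
        if data is None:
            raise KeyError(order_id)
        # Load the full object graph, then hand back the aggregate, not the data.
        return Order(copy.deepcopy(data))

    def save(self, order: Order) -> None:
        # The repository is the one collaborator meant to see the data model.
        self._rows[order._data.id] = copy.deepcopy(order._data)


repo = OrderRepository()
repo.save(Order(OrderData(id=1)))

order = repo.get(1)
order.add_line("SKU-1", 2)
order.close()
repo.save(order)

reloaded = repo.get(1)
try:
    reloaded.add_line("SKU-2", 1)
except ValueError as e:
    print(e)  # Cannot add lines to an order that is not open
```

The repository reads order._data, which Python treats as private by convention only; the point is that consumers work with Order’s methods while the repository alone deals with the data model.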