Multi-Tenant: Database Per Tenant or Shared?

When building a multi-tenant application, one of the first decisions revolves around data management: Should you use a shared database or a database per tenant?

YouTube

Check out my YouTube channel, where I post all kinds of content accompanying my posts, including this video showing everything in this post.

https://www.youtube.com/watch?v=YaSPB2uNLYg

Shared Database

Let’s start with the shared database model.

In this scenario, tenant A and tenant B hit the same API, which connects to the same database instance with the same schema. This means you need to record which tenant owns each piece of data you persist. For example, in a customers table, tenant ID 1 might own customer IDs 1 and 4, while tenant ID 2 owns customer IDs 2 and 3.

If you’re using a relational database, I recommend having tenant information on every single table. This simplifies querying because you don’t have to join with other tables; the tenant information is always there.
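As a rough illustration of what "tenant information on every table" looks like, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample rows are assumptions made up for this example; only the tenant and customer IDs mirror the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: every table carries its own tenant_id column.
conn.execute(
    "CREATE TABLE customers ("
    " customer_id INTEGER PRIMARY KEY,"
    " tenant_id INTEGER NOT NULL,"
    " name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " tenant_id INTEGER NOT NULL,"
    " customer_id INTEGER NOT NULL REFERENCES customers (customer_id),"
    " total REAL NOT NULL)"
)

conn.executemany(
    "INSERT INTO customers (customer_id, tenant_id, name) VALUES (?, ?, ?)",
    [(1, 1, "Alice"), (2, 2, "Bob"), (3, 2, "Carol"), (4, 1, "Dave")],
)
conn.executemany(
    "INSERT INTO orders (order_id, tenant_id, customer_id, total) VALUES (?, ?, ?, ?)",
    [(10, 1, 1, 25.0), (11, 2, 2, 40.0), (12, 1, 4, 15.0)],
)


def customer_ids_for(tenant_id):
    """Return the customer IDs owned by one tenant."""
    rows = conn.execute(
        "SELECT customer_id FROM customers WHERE tenant_id = ? ORDER BY customer_id",
        (tenant_id,),
    )
    return [row[0] for row in rows]


def order_ids_for(tenant_id):
    """Filter orders by tenant directly; no join through customers is needed."""
    rows = conn.execute(
        "SELECT order_id FROM orders WHERE tenant_id = ? ORDER BY order_id",
        (tenant_id,),
    )
    return [row[0] for row in rows]
```

Because orders has its own tenant_id, the tenant filter never depends on joining back to customers.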

There are pros and cons to this approach. The major pro is simplicity. You only have a single piece of infrastructure to manage. If you need to make a schema change, you make it once, and it ships to every tenant in the same release.

But the cons are significant. The first concern is the risk of leaking data across tenants. A single query that forgets its tenant condition can show tenant B's data to tenant A, which is a serious problem. One way to mitigate this is to stop relying on every query being written correctly and apply the tenant filter automatically. For example, with Entity Framework, you can set up a query filter that always includes the tenant information, so each query is filtered by tenant without the calling code having to remember it.
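To make the idea of an always-on tenant filter concrete, here is a small sketch in plain Python with sqlite3. It is not Entity Framework and does not use its API; the TenantScopedCustomers class and the sample data are hypothetical, and the only point is that the tenant predicate lives in one place instead of in every caller.

```python
import sqlite3


class TenantScopedCustomers:
    """Hypothetical repository: every query it runs includes the tenant predicate."""

    def __init__(self, conn, tenant_id):
        self._conn = conn
        self._tenant_id = tenant_id

    def all(self):
        # The tenant filter is applied here, not by the caller.
        return self._conn.execute(
            "SELECT customer_id, name FROM customers"
            " WHERE tenant_id = ? ORDER BY customer_id",
            (self._tenant_id,),
        ).fetchall()

    def get(self, customer_id):
        # Looking up another tenant's customer ID returns None rather than the row.
        return self._conn.execute(
            "SELECT customer_id, name FROM customers"
            " WHERE tenant_id = ? AND customer_id = ?",
            (self._tenant_id, customer_id),
        ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    " customer_id INTEGER PRIMARY KEY,"
    " tenant_id INTEGER NOT NULL,"
    " name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO customers (customer_id, tenant_id, name) VALUES (?, ?, ?)",
    [(1, 1, "Alice"), (2, 2, "Bob"), (3, 2, "Carol"), (4, 1, "Dave")],
)

# In a real application the tenant ID would come from the authenticated request.
tenant_a = TenantScopedCustomers(conn, tenant_id=1)
```

The design choice is that application code only ever receives a tenant-scoped object, so forgetting the filter is no longer a mistake an individual query can make.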

The second con is the noisy neighbor problem.

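Say tenant A is making API calls and everything is fine, but then tenant B suddenly ramps up usage. Because both share the same database, tenant B's load can degrade performance for tenant A. One mitigation is to create a database user per tenant and cap how many connections that user may open (PostgreSQL, for example, supports a connection limit per role), so a single tenant cannot consume every available connection.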

Additionally, implementing rate limiting at the application or load balancer level can help manage this issue, but it doesn’t completely solve the problem.
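Application-level rate limiting per tenant could look something like the token bucket below. This is a sketch under assumptions: the TenantRateLimiter class, its parameters, and the chosen limits are made up for illustration, and a real deployment running more than one API instance would need shared state rather than an in-process dictionary.

```python
import time


class TenantRateLimiter:
    """Hypothetical in-process token bucket, one bucket per tenant."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self._capacity = float(capacity)
        self._refill_per_second = float(refill_per_second)
        self._clock = clock  # injectable so the behaviour can be tested
        self._buckets = {}   # tenant_id -> [tokens, last_update]

    def allow(self, tenant_id):
        """Return True if this tenant may make a request right now."""
        now = self._clock()
        tokens, last_update = self._buckets.get(tenant_id, (self._capacity, now))

        # Refill according to elapsed time, never above capacity.
        elapsed = now - last_update
        tokens = min(self._capacity, tokens + elapsed * self._refill_per_second)

        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[tenant_id] = (tokens, now)
        return allowed


# Usage: a burst of 2 requests, then 1 request per second, per tenant.
fake_now = [0.0]
limiter = TenantRateLimiter(capacity=2, refill_per_second=1, clock=lambda: fake_now[0])
```

Each tenant draws from its own bucket, so tenant B exhausting its allowance does not block tenant A at the API. It does not stop an expensive query that is already running from slowing down the shared database, which is why this only partially addresses the problem.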

Database Per Tenant

Now, let’s consider the alternative: a database per tenant. In this model, tenant A hits their own specific database, and the same goes for tenant B.

The pros here are substantial. Because each tenant's data lives in isolation, you eliminate the risk of leaking data between tenants, and one tenant's load on its database no longer degrades another's.

The con is increased complexity. If you make a schema change, you have to apply it to every single tenant database, which can be a logistical nightmare as the number of tenants grows. One way to manage this complexity is to isolate your API to a specific tenant, where each API deployment connects directly to that tenant’s database. Taken too far, though, you no longer have a multi-tenant architecture at all; you have a separate single-tenant deployment for every customer.
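The "apply it to every single tenant database" step can be sketched as a loop over tenant connections. The example below uses in-memory SQLite databases and SQLite's user_version pragma as a stand-in for a schema version table; the migration statements, tenant names, and function names are all assumptions for illustration, not a recommendation of a specific migration tool.

```python
import sqlite3

# Hypothetical ordered list of schema changes; index + 1 is the schema version.
MIGRATIONS = [
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE customers ADD COLUMN email TEXT",
]


def migrate(conn):
    """Apply any migrations this database has not seen yet; return its version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, statement in enumerate(MIGRATIONS[current:], start=current + 1):
        conn.execute(statement)
        # PRAGMA values cannot be bound as parameters; version is always an int.
        conn.execute(f"PRAGMA user_version = {version}")
    conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]


def migrate_all(databases):
    """Migrate each tenant database, recording failures instead of stopping."""
    results = {}
    for tenant_id, conn in databases.items():
        try:
            results[tenant_id] = migrate(conn)
        except sqlite3.Error as exc:
            results[tenant_id] = exc
    return results


# Stand-ins for real per-tenant databases.
tenant_databases = {
    "tenant-a": sqlite3.connect(":memory:"),
    "tenant-b": sqlite3.connect(":memory:"),
}
```

Tracking a version per database matters because, unlike the shared model, tenants can end up on different schema versions if a rollout fails partway through, and the application has to tolerate that window.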

Hybrid Model

What about a hybrid model? This allows you to mix and match the two approaches. For instance, you could have some tenants using a shared database while others have their own dedicated databases. This flexibility can be beneficial, especially when managing different workloads and requirements. You can deploy schema changes to just the tenants using a shared database, while keeping others isolated.
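One way to picture the hybrid model is a small tenant catalog that records where each tenant lives. The sketch below is hypothetical throughout: the TenantPlacement type, the database names, and the catalog contents are invented for the example, and a real catalog would more likely live in its own database than in code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantPlacement:
    database: str  # logical database name (hypothetical)
    shared: bool   # True if other tenants live in the same database


SHARED_DATABASE = "shared-db"

# Hypothetical catalog: two tenants share a database, one has a dedicated one.
CATALOG = {
    "tenant-a": TenantPlacement(SHARED_DATABASE, shared=True),
    "tenant-b": TenantPlacement(SHARED_DATABASE, shared=True),
    "tenant-c": TenantPlacement("tenant-c-db", shared=False),
}


def resolve(tenant_id):
    """Look up which database a request for this tenant should use."""
    try:
        return CATALOG[tenant_id]
    except KeyError:
        raise LookupError(f"unknown tenant: {tenant_id}") from None


def databases_to_migrate(catalog, shared_only=False):
    """Distinct databases needing a schema change; the shared one appears once."""
    return sorted(
        {p.database for p in catalog.values() if p.shared or not shared_only}
    )
```

Tenants in the shared database still need a tenant filter on every query, so a common choice is to keep the tenant ID column in the dedicated databases too, leaving the schema identical everywhere.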

However, this method can create significant complexity, especially if you need to manage tenant migrations between databases. I recommend using globally unique identifiers for customer IDs to prevent collisions when migrating tenants between databases.
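To show why globally unique identifiers help here, the sketch below moves a tenant from a dedicated database into a shared one. With auto-incrementing integers, both databases would likely contain a customer ID 1 and the insert would collide; with UUIDs generated by Python's uuid module, the rows can be copied as they are. The schema, tenant names, and the move_tenant helper are assumptions for illustration, and a real migration would also need to handle related tables and cut-over.

```python
import sqlite3
import uuid


def new_customer_id():
    """Generate an ID that is unique across databases, not just within one."""
    return str(uuid.uuid4())


def create_schema(conn):
    conn.execute(
        "CREATE TABLE customers ("
        " customer_id TEXT PRIMARY KEY,"
        " tenant_id TEXT NOT NULL,"
        " name TEXT NOT NULL)"
    )


def add_customer(conn, tenant_id, name):
    conn.execute(
        "INSERT INTO customers (customer_id, tenant_id, name) VALUES (?, ?, ?)",
        (new_customer_id(), tenant_id, name),
    )
    conn.commit()


def move_tenant(source, target, tenant_id):
    """Copy one tenant's customers to the target, then delete them from the source."""
    rows = source.execute(
        "SELECT customer_id, tenant_id, name FROM customers WHERE tenant_id = ?",
        (tenant_id,),
    ).fetchall()
    target.executemany(
        "INSERT INTO customers (customer_id, tenant_id, name) VALUES (?, ?, ?)",
        rows,
    )
    target.commit()
    source.execute("DELETE FROM customers WHERE tenant_id = ?", (tenant_id,))
    source.commit()
    return len(rows)


shared = sqlite3.connect(":memory:")
dedicated = sqlite3.connect(":memory:")
create_schema(shared)
create_schema(dedicated)

add_customer(shared, "tenant-a", "Alice")
add_customer(dedicated, "tenant-b", "Bob")
```

Calling move_tenant(dedicated, shared, "tenant-b") copies Bob's row into the shared database without rewriting its key, so anything that already references that customer ID stays valid.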

Conclusion

When building a multi-tenant system, there are many factors to consider, such as the number of tenants, the volume of data, and the performance impact on your application. Each model has its own set of trade-offs, and your final choice will depend on your specific context and requirements. If you have experience with multi-tenant architectures, I’d love to hear about your decisions and the trade-offs you made. Let’s discuss in the comments!
