Lessons learned from Postgres schema sharding

Update in July 2023: Citus 12 allows you to shard your database by schema, or if you prefer, you can still use row-based sharding. Find out more in the Citus 12 blog post.

We talk with a number of Postgres users each week who are looking to scale out their database. First, we would never recommend scaling out until you truly have to. It’s always easier to scale your database up rather than out, and it’s often not until you have more than 100 GB of data that you need to think about sharding.

When you do want to scale out, though, you want it to be simple. For scaling a multi-tenant database, there are three common approaches:

Create a database per tenant
Create a schema per tenant
Have all tenants share the database tables
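To make the difference between the three approaches concrete, here is a minimal sketch of how the same "list a tenant's orders" query would be addressed under each one. The `orders` table, the `tenant_id` column, the `app` database name, and the `tenant_<name>` naming scheme are all assumptions made for illustration; they do not come from any particular application or library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryTarget:
    database: str  # database the application would connect to
    sql: str       # statement text, with %s placeholders for bound values
    params: tuple  # values bound to the placeholders


def quote_ident(name: str) -> str:
    """Quote a Postgres identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def database_per_tenant(tenant: str) -> QueryTarget:
    # Isolation comes from the connection itself: one database per tenant.
    return QueryTarget(f"tenant_{tenant}", "SELECT * FROM orders", ())


def schema_per_tenant(tenant: str) -> QueryTarget:
    # One shared database; isolation comes from the schema qualifier.
    schema = quote_ident(f"tenant_{tenant}")
    return QueryTarget("app", f"SELECT * FROM {schema}.orders", ())


def shared_tables(tenant: str) -> QueryTarget:
    # One database, one set of tables; isolation comes from a tenant_id filter.
    return QueryTarget("app", "SELECT * FROM orders WHERE tenant_id = %s", (tenant,))


for build in (database_per_tenant, schema_per_tenant, shared_tables):
    print(build.__name__, build("acme"))
```

In the first two approaches the tenant is encoded in where the query runs, while in the third it is just another value in the query, which is why the third tends to fit a single shared connection pool more naturally.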

We’ve heard from users who have tried creating a database or schema per tenant. In both cases, Postgres databases and schemas make it extremely easy to separate your tenant data.

As you start to scale your database to hundreds and thousands of tenants, however, you start running into challenges. First, you need to think about efficiently managing hardware and software resources across tenants. Second, most modern application frameworks have gone in another direction to help with scaling your database. They come with built-in connection pooling, and having the connection pool work across databases and schemas requires extra work. Last, as you add more tenants, adding new tables, columns, or indexes can go from taking seconds to minutes to hours, because each change has to be applied across thousands of different tenants.
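The schema-change cost is easy to see with a small sketch. The snippet below only generates the DDL statements a migration would need and does not connect to a database. The `orders` table, the `note` column, and the `tenant_<n>` schema names are hypothetical.

```python
def quote_ident(name: str) -> str:
    """Quote a Postgres identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def schema_migration_statements(schemas, ddl_template):
    """Expand one logical schema change into one statement per tenant schema."""
    return [ddl_template.format(schema=quote_ident(s)) for s in schemas]


# Assumed setup: 1,000 tenants, each with its own schema.
schemas = [f"tenant_{i}" for i in range(1, 1001)]

per_schema = schema_migration_statements(
    schemas, "ALTER TABLE {schema}.orders ADD COLUMN note text"
)

# With shared tables, the same logical change is a single statement.
shared = ["ALTER TABLE orders ADD COLUMN note text"]

print(len(per_schema), len(shared))  # prints "1000 1"
```

Every one of the generated statements has to be run, and retried or rolled back if it fails, so the work for a single logical change grows with the number of tenants.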

Premature optimization is often costly when you are building an early-stage product, but even worse is an optimization that you may not be able to scale with for long. One of the more popular libraries for Rails, Apartment, took this schema-based approach. They recently blogged about their results after a number of years using it in production. Their lessons are great guidance for anyone thinking about building multi-tenancy into their application from the start.

Written by Craig Kerstiens

Former Head of Cloud at Citus Data. Ran product at Heroku Postgres. Countless conference talks on Postgres & Citus. Loves bbq and football.
