Lessons learned from Postgres schema sharding

Craig Kerstiens Dec 18, 2016

We talk with a number of Postgres users each week who are looking to scale out their database. First, we would never recommend scaling out until you truly have to; it’s always easier to scale your database up rather than out. It’s often not until you have over 100 GB of data that you need to think about sharding.

When you do want to scale out, though, you want it to be simple. For scaling a multi-tenant database, there are three common approaches:

  • Create a database per tenant
  • Create a schema per tenant
  • Have all tenants share the database tables

We’ve heard from users that have tried creating a database or schema per tenant. In both cases, Postgres databases and schemas make it extremely easy to separate your tenant data.
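To make the three layouts concrete, here is a minimal sketch that only builds the DDL strings each approach would need when onboarding one tenant. The `orders` table, its columns, the `tenant_` naming prefix, and the `tenant_id` column are assumptions made for illustration; nothing here is executed against a database.

```python
from typing import List

# Hypothetical table definition used by the two "separate" layouts.
ORDERS_COLUMNS = "(id bigint PRIMARY KEY, total numeric)"


def database_per_tenant(tenant: str) -> List[str]:
    # One database per tenant; the table is created after connecting to it.
    return [
        f'CREATE DATABASE "tenant_{tenant}";',
        f"CREATE TABLE orders {ORDERS_COLUMNS};",
    ]


def schema_per_tenant(tenant: str) -> List[str]:
    # One schema per tenant inside a single database.
    return [
        f'CREATE SCHEMA "tenant_{tenant}";',
        f'CREATE TABLE "tenant_{tenant}".orders {ORDERS_COLUMNS};',
    ]


def shared_tables() -> List[str]:
    # One table for everyone; rows are told apart by a tenant_id column.
    # This is created once, not once per tenant.
    return [
        "CREATE TABLE orders (tenant_id bigint NOT NULL, id bigint NOT NULL, "
        "total numeric, PRIMARY KEY (tenant_id, id));",
    ]


for statement in schema_per_tenant("42"):
    print(statement)
```

In the first two layouts the separation comes from the database or schema name, which is why they feel so easy at first. In the shared layout the application has to carry a tenant identifier on every row and query instead.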

As you start to scale your database to hundreds or thousands of tenants, however, you start running into challenges. First, you need to think about efficiently managing hardware and software resources across tenants. Second, most modern application frameworks have gone in another direction to help with scaling your database. They come with built-in connection pooling, and having the connection pool work across databases and schemas requires extra work. Last, as you add more tenants, adding new tables, columns, or indexes can go from taking seconds to minutes to hours across thousands of different tenants.
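The schema-change cost in that last point can be sketched by counting statements. The example below is an assumption-laden illustration, not a migration tool: the `orders` table, the `shipped_at` column, and the `tenant_` schema prefix are hypothetical, and the code only builds SQL strings.

```python
from typing import List


def add_column_statements(tenants: List[str], per_schema: bool) -> List[str]:
    """Build the ALTER TABLE statements needed to add one column."""
    column = "ADD COLUMN shipped_at timestamptz"
    if per_schema:
        # Schema per tenant: the same change is repeated once per tenant schema.
        return [f'ALTER TABLE "tenant_{t}".orders {column};' for t in tenants]
    # Shared tables: a single statement covers every tenant.
    return [f"ALTER TABLE orders {column};"]


tenants = [str(n) for n in range(1, 1001)]
per_schema = add_column_statements(tenants, per_schema=True)
shared = add_column_statements(tenants, per_schema=False)
print(len(per_schema), len(shared))  # prints "1000 1"
```

With a schema per tenant, the number of statements grows linearly with the number of tenants, so a change that is quick on one schema has to be run, monitored, and retried a thousand times over.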

Premature optimization is often costly when you are building an early-stage product, but even worse is an optimization that you may not be able to scale with for long. One of the more popular libraries for Rails, Apartment, took this schema-based approach. The team behind it recently blogged about their results after a number of years using it in production. Their learnings are great guidance for anyone thinking about building a notion of multi-tenancy into their application from the start.
