Schema-based sharding comes to PostgreSQL with Citus

Citus, a database scaling extension for PostgreSQL, is known for its ability to shard data tables and efficiently distribute workloads across multiple nodes. With Citus 12.0, Citus introduces a very exciting feature called schema-based sharding. The new schema-based sharding feature gives you a choice of how to distribute your data across a cluster, and for some data models (think: multi-tenant apps, microservices, etc.) this schema-based sharding approach may be significantly easier!

In this blog post, we will take a deep dive into the new schema-based sharding feature, and you will learn:

What schema-based sharding looks like
How we implemented schema-based sharding
The challenge of using PgBouncer with schema-based sharding and how we solved it
Combining schema-based sharding with citus_stat_tenants
How to use schema-based sharding with Django!

Closer look into the new schema-based sharding in Postgres

Citus 12.0 introduced a citus.enable_schema_based_sharding setting that allows you to shard your database based on schemas. When enabled, each newly-created schema will become a logical shard of your database, ensuring that all tables for a given tenant are stored on the same node. When combined with a schema-based multi-tenancy model, each tenant of your application gets mapped into a logical shard of your database that can operate independently. It’s worth emphasizing that while sharding your database with schemas, you don’t need to call create_distributed_table() function to create distributed tables for the tables within those distributed schemas. Let’s get started with creating two distributed schemas and some data tables for them:

-- Enable schema-based sharding in the session, or add it to postgresql.conf to enable it globally
SET citus.enable_schema_based_sharding TO ON;

-- Use regular "CREATE SCHEMA" commands to create two distributed schemas.
CREATE SCHEMA tenant_1;
CREATE SCHEMA tenant_2;

-- Create data tables for those two tenants.
--
-- Note that it's not necessary to keep citus.enable_schema_based_sharding setting enabled while
-- creating data tables because tenant_1 & tenant_2 are already saved as distributed schemas into
-- the Citus metadata.
--
-- Let's use regular "CREATE TABLE" commands to create two tables for our distributed schemas.
CREATE TABLE tenant_1.users (
  id int PRIMARY KEY,
  name text,
  email text
);
CREATE TABLE tenant_1.events (
  id INT PRIMARY KEY,
  name text,
  date date,
  user_id int REFERENCES tenant_1.users(id)
);
CREATE TABLE tenant_2.users (
  id int PRIMARY KEY,
  name text,
  email text
);
CREATE TABLE tenant_2.events (
  id int  PRIMARY KEY,
  name text,
  date date,
  user_id int REFERENCES tenant_2.users(id)
);

We can use citus_schemas—introduced in Citus 12.0—and citus_shards views to see total size of distributed schemas and where the distributed schema tables are stored in the cluster:

SELECT * FROM citus_schemas;
┌─────────────┬───────────────┬─────────────┬──────────────┐
│ schema_name │ colocation_id │ schema_size │ schema_owner │
├─────────────┼───────────────┼─────────────┼──────────────┤
│ tenant_1    │             1 │ 32 kB       │ onurctirtir  │
│ tenant_2    │             2 │ 32 kB       │ onurctirtir  │
└─────────────┴───────────────┴─────────────┴──────────────┘
(2 rows)

SELECT * FROM citus_shards WHERE citus_table_type = 'schema';
┌─────────────────┬─────────┬────────────────────────┬──────────────────┬───────────────┬─────────────┬──────────┬────────────┐
│   table_name    │ shardid │       shard_name       │ citus_table_type │ colocation_id │ nodename    │ nodeport │ shard_size │
├─────────────────┼─────────┼────────────────────────┼──────────────────┼───────────────┼─────────────┼──────────┼────────────┤
│ tenant_1.events │  103000 │ tenant_1.events_103000 │ schema           │             1 │ worker2host │     5432 │      16384 │
│ tenant_1.users  │  103001 │ tenant_1.users_103001  │ schema           │             1 │ worker2host │     5432 │      16384 │
│ tenant_2.events │  103002 │ tenant_2.events_103002 │ schema           │             2 │ worker1host │     5432 │      16384 │
│ tenant_2.users  │  103003 │ tenant_2.users_103003  │ schema           │             2 │ worker1host │     5432 │      16384 │
└─────────────────┴─────────┴────────────────────────┴──────────────────┴───────────────┴─────────────┴──────────┴────────────┘
(4 rows)

And the citus_tables view allows us to see how that new “single shard” table concept is utilized for distributed schema tables. In the query below, we can see that distributed schema tables have a single shard and they don’t have a shard key. Also, tables that belong to same schema share the same colocation id. As a result, all the tables that belong to same schema are automatically co-located to each other as single-shard tables. That way, cross table queries and operations—i.e., joins and foreign keys—within the schema will be handled on the node that stores your tables with minimal network roundtrip.

SELECT table_name, colocation_id, distribution_column, shard_count FROM citus_tables WHERE citus_table_type = 'schema';

   table_name    | colocation_id | distribution_column | shard_count
-----------------+---------------+---------------------+-------------
 tenant_1.events |             1 | <none>              |           1
 tenant_1.users  |             1 | <none>              |           1
 tenant_2.events |             2 | <none>              |           1
 tenant_2.users  |             2 | <none>              |           1
(4 rows)

That said, if you are sharding your PostgreSQL database by schemas, then cross-table operations—such as joins and foreign keys—should only target a single schema, or the Citus reference tables that you might have created in a regular schema—such as “public”:

-- Create "countries" table as a Citus reference table and add a "country_id" column to each tenant's
-- "users" table with a foreign key to "countries" table.
CREATE TABLE public.countries (
  id int PRIMARY KEY,
  name text UNIQUE
);

SELECT create_reference_table('public.countries');

ALTER TABLE tenant_1.users ADD COLUMN country_id INT REFERENCES public.countries(id);
ALTER TABLE tenant_2.users ADD COLUMN country_id INT REFERENCES public.countries(id);

You can then query the tenant tables as if you're using vanilla PostgreSQL. But to make sure that you’re querying the correct tenant, you either need to set the search_path to relevant tenant or need to schema-qualify the tables referenced in the query:

-- List distinct event dates for tenant_1.
SET search_path TO tenant_1;
SELECT DISTINCT(e.date) FROM users u JOIN events e ON (u.id = e.user_id);

-- Compute the number of users per country and return the country name along with the
-- corresponding user count.
SELECT c.name country_name, COUNT(*) user_count
FROM tenant_2.users u JOIN public.countries c ON (u.country_id = c.id) GROUP BY c.name;

Just by changing the search_path or the schema name in the query, the query will be transparently routed to a different node.

Under the covers: How we implemented schema-based sharding

Citus development is an evolutionary process, and we often build new features on top of the infrastructure introduced by other features, based on what we observe from users.

Citus has always supported distributed tables. The rows of distributed table are subdivided into shards based on the value in the distribution column. The shards of distributed tables can be moved around (“rebalanced”) and distributed tables can be added to a “co-location group”, which guarantees that shards with the same hash range (“shard groups”) are always placed on the same node. For good performance, it is important that queries filter and/or join by the distribution column, such that queries can be delegated to individual shard groups.

Citus also supports reference tables and can manage local tables on the coordinator. These tables do not have a distribution column, and cannot be moved around, or added to co-location groups. Since a single node has all the data, these tables mostly have the same performance and SQL characteristics as PostgreSQL without needing additional filters/joins. Additionally, distributed tables and local tables can be joined with reference tables without additional network overhead, since reference tables are replicated to all nodes.

With these table types, Citus users can distribute databases with very large tables. However, we found that some users have very large databases, but no extremely large tables, because they already subdivided their data into many tables or schemas. For instance, multi-tenant applications often use a separate schema, and thus a separate set of tables, per tenant. Typically, many queries are scoped to a single schema/tenant. If we could make each schema a separate shard group, then queries scoped to the schema could be delegated to wherever the shard group is stored, and we can distribute the schemas across the cluster. We can also still support joins with reference tables.

Schema-based sharding was built in two stages: First we introduced a new type of Citus table, which only has a single shard and no distribution column (like reference and local tables), but like a distributed table it can be added to a co-location group and moved around by the rebalancer. Second, we use PostgreSQL’s DDL hooks to automatically create a new colocation group for every schema, and every table added to the schema becomes a single shard table with a co-location group. That way, we cleanly extended existing infrastructure, while getting all the usability benefits of schema-based sharding.

PgBouncer and the search_path issue

A convenient way to switch between different schemas is to use unqualified names and use the search_path setting, but you will need be careful that search_path is a session-level parameter. This means that if your topology includes a connection pooler between the client and the Citus coordinator, it is important to ensure that session parameters work as expected. A pooler may share the same database session between multiple clients in a non-transparent way. In that case, the clients might unintentionally override each other’s session parameters.

If you are using PgBouncer with transaction pooling mode, pgbouncer 1.20.0 or above supports the use of session parameters in transaction pooling mode. By using the track_extra_parameters setting, you can configure PgBouncer in a way that your clients can retain arbitrary session parameters—such as search_path—in transaction pooling mode:

track_extra_parameters = search_path

When configured as above, pgbouncer propagates the value of search_path specified by the client (via SET search_path TO ..) when switching to a new server connection. However, Pgbouncer never parses queries, so it does not actually know when a SET command happens… How can pgbouncer know what the value of search_path is?

As it turns out, the PostgreSQL protocol as “parameter status” messages, which are emitted by postgres after a query (or SET!) when certain settings are changed, and pgbouncer can process those messages to get the new value. Unfortunately, search_path is not yet one of those settings, but PostgreSQL extensions can do a lot. Citus 12.0+ changes the search_path setting to also be emitted such that it can be propagated correctly by PgBouncer, so it automatically works for schema-based sharding with Citus.

Combining schema-based sharding with citus_stat_tenants

When you use schema-based sharding, you can continue benefiting from the powerful features of Citus like tenant statistics collection. Before querying citus_stat_tenants view, you need to make sure that you’ve already enabled tenant-level statistic collection before querying your distributed schemas, by setting citus.stat_tenants_track to 'all':

SELECT tenant_attribute, query_count_in_this_period FROM citus_stat_tenants;
┌──────────────────┬────────────────────────────┐
│ tenant_attribute │ query_count_in_this_period │
├──────────────────┼────────────────────────────┤
│ tenant1          │                          3 │
│ tenant2          │                          5 │
└──────────────────┴────────────────────────────┘
(2 rows)

If you use row-based sharding (hash-distributed tables), then the tenant_attribute reflects the value in the distribution column. When you use schema-based sharding, the schema name will appear as the tenant attribute and query counts and CPU usage are summed for the schema.

Rebalancing your schema-sharded cluster by using Citus shard-rebalancer

By default, Citus would try to distribute your tenant schemas as fairly as possible by following a simple round-robin based approach –places one tenant into first worker node and then places the next tenant into second worker and so on. However, this wouldn't always result in a fair distribution because one of your tenants can get bigger as in the above example. In that case, shard-rebalancer would help you to rebalance your tenant schemas across your Citus cluster, as you would traditionally do with your distributed tables.

-- Assume that later we've introduced a bigger tenant (tenant3) into the cluster in a way that it would
-- make more sense to group tenant1 & tenant2 into one worker node and tenant3 into other.
SELECT table_name, shardid, nodename, nodeport, shard_size FROM citus_shards WHERE citus_table_type = 'schema';
┌────────────────┬─────────┬─────────────┬──────────┬────────────┐
│   table_name   │ shardid │ nodename    │ nodeport │ shard_size │
├────────────────┼─────────┼─────────────┼──────────┼────────────┤
│ tenant1.events │  103000 │ worker2host │     5432 │      16384 │
│ tenant1.users  │  103001 │ worker2host │     5432 │      16384 │
│ tenant2.events │  103002 │ worker1host │     5432 │      16384 │
│ tenant2.users  │  103003 │ worker1host │     5432 │      16384 │
│ tenant3.events │  103004 │ worker1host │     5432 │      16384 │
│ tenant3.users  │  103005 │ worker1host │     5432 │    7217152 │
└────────────────┴─────────┴─────────────┴──────────┴────────────┘

(6 rows)

-- Start the shard-rebalancer and wait for a while.
SELECT citus_rebalance_start();

-- Then we can observe that data tables of tenant2 are moved to "worker2host" too.
SELECT table_name, shardid, nodename, nodeport, shard_size FROM citus_shards WHERE citus_table_type = 'schema';
┌────────────────┬─────────┬─────────────┬──────────┬────────────┐
│   table_name   │ shardid │ nodename    │ nodeport │ shard_size │
├────────────────┼─────────┼─────────────┼──────────┼────────────┤
│ tenant1.events │  103000 │ worker2host │     5432 │      16384 │
│ tenant1.users  │  103001 │ worker2host │     5432 │      16384 │
│ tenant2.events │  103002 │ worker2host │     5432 │      16384 │
│ tenant2.users  │  103003 │ worker2host │     5432 │      16384 │
│ tenant3.events │  103004 │ worker1host │     5432 │      16384 │
│ tenant3.users  │  103005 │ worker1host │     5432 │    7217152 │
└────────────────┴─────────┴─────────────┴──────────┴────────────┘
(6 rows)

Besides using shard-rebalancer, you can also choose to place a tenant into an arbitrary node by using citus_move_shard_placement() function—but keep in mind that if you run the rebalancer later, it might decide placing your schema into a different node:

-- Calling citus_move_shard_placement() for one of the shard placements within tenant_3 would move
-- the whole tenant into desired worker node.
SELECT citus_move_shard_placement(103004, 'worker1host', 5432, 'worker2host', 5432);

Enabling schema-based sharding for existing schemas

In the above examples, we have created schemas and data tables for tenants and Citus automatically distributed them because we’ve enabled citus.enable_schema_based_sharding setting before creating the schemas. However, if you forgot enabling schema-based sharding before creating one of your schemas, then you can use citus_schema_distribute() function:

SELECT citus_schema_distribute('my_local_tenant_schema');

Like what happens when you create a new tenant schema and data tables within that schema, citus_schema_distribute() function will convert the existing tables in the schema into tables that are guaranteed to be co-located to each other and that can be moved around in your Citus cluster. Also note that you don’t need to have citus.enable_schema_based_sharding setting enabled to use the citus_schema_distribute() function because citus.enable_schema_based_sharding setting is only about distributing the schemas that are just being created. Another important thing to note here is that you shouldn’t have distributed tables in the schema if you want to distribute it by using citus_schema_distribute() function.

And if you want to undistribute a distributed schema, then you can use citus_schema_undistribute() function. It's worth emphasizing that; to use citus_schema_undistribute() function, it doesn’t matter whether you’ve distributed the schema by using citus_schema_distribute() function or by setting citus.enable_schema_based_sharding setting prior to creating the schema:

SELECT citus_schema_undistribute('tenant_schema');

When you undistribute a schema, Citus will convert the distributed tables in the schema into regular PostgreSQL tables that are stored on the coordinator node. This means that after undistributing a schema, your data tables will only be accessible from the coordinator node while you can query distributed schema tables from any node.

You can also alter the schema of a distributed schema table into something else, e.g., if you’ve incorrectly created it in a different tenant’s schema, or in a regular schema:

-- Moves the data stored in "users" table into the node that stores data for tenant4.
ALTER TABLE tenant1.users SET SCHEMA tenant4;

-- Undistributes "events" table and moves it into "public" schema.
ALTER TABLE tenant1.events SET SCHEMA public;

-- Moves the data stored in "local_tenant_data" into the node that stores data for tenant1.
ALTER TABLE public.local_tenant_data SET SCHEMA tenant1;

django-tenants & distributed Postgres

Schema-based sharding is well suited for multi-tenant applications especially if your application represents your tenants as separate schemas, which is already what django-tenants does. When combined with schema-based sharding feature in Citus, your tenants would automatically become logical shards of your Citus cluster with little amount of application changes.

After you perform the migrations for your django applications, as in:

python manage.py makemigrations my_shared_app
..
python manage.py makemigrations my_tenant_app
...
python manage.py migrate

You can connect to your Citus coordinator and run the following commands to enable schema-based sharding for newly created tenants:

ALTER SYSTEM SET citus.enable_schema_based_sharding TO ON;
SELECT pg_reload_conf();

-- And convert the fact tables that would be referenced by tenant schemas to reference tables, if any.
SELECT create_reference_table('public.my_shared_app_tbl');

That's it! From now on, when a new tenant comes into your application, the tenant will automatically become a logical shard of your Citus cluster. And if you're not building a new application and want to start sharding your backing PostgreSQL database, you can use citus_schema_distribute() function to convert the schemas backing your existing tenants to distributed schemas.

It's also worth mentioning that; if you have a multi-tenant application that is more suitable for row-based sharding and your application runs in Django, we encourage you to use django-multitenant with Citus. However, if you already have an application that makes use of schema-based multi-tenancy model via django-tenants and you want to avoid application changes, then it would be more suitable to follow the above steps to seamlessly scale-out your application with Citus & django-tenants. And if you’re not familiar with django-tenants, you might want to follow django-tenants tutorial while keeping those a few SQL commands in mind.

It’s all up to your application’s needs

Schema-based sharding feature in Citus comes with many benefits, including but not limited to:

Ease of Use: By adopting schema-based sharding, Citus significantly simplifies the management of your database if you are building a multi-tenant application. With schema-based sharding, you no longer need to configure distribution columns, and this provides you with an easier way of migrating to Citus.
Efficient Resource Allocation: Schema-based sharding allows you to distribute your database across a Citus cluster while ensuring that data for a given tenant is stored on the same node, minimizing network overhead, and optimizing data locality for cross-table operations. This approach facilitates faster query execution and improves overall performance when compared to storing all your tenants in a single PostgreSQL database.
Scalability and Flexibility: Schema-based sharding strikes a balance between simplicity and scalability. It offers an ideal solution for applications that prioritize ease of use without sacrificing the ability to scale horizontally.

And even beyond the flexibility and ease-of use that comes with schema-based sharding, schema-based sharding also enables certain use-cases that were not possible with row-based sharding, such as:

Multi-tenancy with non-homogeneous schemas: If your tenants need to store different set of attributes or individual indexes, then the schema-based sharding would be a better fit than sharing tables across tenants.
Microservices: In a world where each microservice possesses its own small and straightforward state as a database schema, schema-based sharding allows microservices to efficiently manage and access the data, enable data sharing and transaction execution across microservices, all while reducing costs by sharing resources.

And as it’s mostly the case in distributed systems, the ease of use that schema-based sharding comes with a trade-off: row-based sharding can scale better when you have a large number of tenants. With millions of schemas, adding new tables or columns can become a significant challenge. Moreover, having lots of tables in the database imposes an overhead, in particular it increases the size of catalog caches.

If your application relies on row-based sharding and you want to continue benefiting from its powerful capabilities, you do not need to change anything in your application. Row-based sharding continues to be a useful and valid way to distribute your Postgres tables with sharding. But now, you have a new option—schema-based sharding—that is a good fit for multi-tenant applications, microservices, and even applications that do vertical partitioning. You can even choose to use both row-based sharding and schema-based sharding in the same Citus database cluster by keeping your existing row-based sharded tables in regular schemas and creating new data tables in distributed schemas. Or you might simply want to go with either method—it's all up to the needs of your application.

To dive deeper into Citus and explore its other capabilities, you might want to:

Visit the official Citus GitHub repository to access the source code, contribute, and collaborate with the community.
Join the Citus Slack channel for real-time discussions, questions, and support from Citus experts.
Explore the comprehensive Citus documentation.
Get started quickly with Citus by following the steps outlined in the Citus Getting Started guide.
Download the latest version of Citus.
Learn more about using Citus with Azure Cosmos DB by referring to the Citus on Azure documentation.

Written by Onur Tirtir

Software engineer on the Postgres team at Microsoft. B.S. in Computer Engineering from Middle East Technical University. Likes to think about optimizing the things—Citus & random coding algorithms. Builds with LEGO. Loves cats.