Scaling out complex SQL transactions in multi-tenant apps on Postgres

Written by Marco Slot
June 2, 2017

Update in October 2022: The Citus managed database service is now available in Azure Cosmos DB for PostgreSQL. If you’re looking to distribute PostgreSQL in the cloud (for multi-tenant SaaS apps, or real-time analytics, or time series workloads), take a look at Azure Cosmos DB for PostgreSQL. And as always, the Citus database is also available as open source: you can find the Citus repo on GitHub or download Citus here.

Distributed databases often require you to give up SQL and ACID transactions as a trade-off for scale. Citus is a different kind of distributed database. As an extension to PostgreSQL, Citus can leverage PostgreSQL’s internal logic to distribute more sophisticated data models. If you’re building a multi-tenant application, Citus can transparently scale out the underlying database in a way that allows you to keep using advanced SQL queries and transaction blocks.

In multi-tenant applications, most data and queries are specific to a particular tenant. If all tables have a tenant ID column and are distributed by this column, and all queries filter by tenant ID, then Citus supports the full SQL functionality of PostgreSQL—including complex joins and transaction blocks—by transparently delegating each query to the node that stores the tenant’s data. This means that with Citus, you don’t lose any of the functionality or transactional guarantees that you are used to in PostgreSQL, even though your database has been transparently scaled out across many servers. In addition, you can manage your distributed database through parallel DDL, tenant isolation, high performance data loading, and cross-tenant queries.

Transparent query routing in Citus

When processing a query on a distributed table, Citus first invokes the regular PostgreSQL query planner. The planner builds internal lists of conditions on all the tables involved in the query, taking joins into account. PostgreSQL uses this logic to determine whether it can use an index to read from a table. Citus leverages this logic to determine which shards, and by extension which nodes, are involved in a query.

If the Citus query planner can tell from filters on the distribution key(s) that only a single node needs to be involved in a query, then Citus only has to rewrite the table names to shard names and route the query to the right node. In Citus, shards are just regular PostgreSQL tables, so the query on the shards can be handled by the regular PostgreSQL query planning logic.

The approach of transparently routing queries based on filters allows Citus to handle arbitrarily complex SQL queries and multi-statement ACID transactions on distributed tables, as long as each one can be delegated to a single node. Moreover, executing a single node query involves minimal network overhead since it can be sent over a previously established connection.
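As an illustration, here is roughly what router planning looks like, using a hypothetical events table distributed by tenant_id (the table and the shard ID below are made up for illustration; Citus names shards by appending a shard ID to the distributed table's name):

-- The application sends a query that filters on the distribution key:
SELECT count(*) FROM events WHERE tenant_id = 42;

-- Citus maps tenant_id = 42 to a single shard, rewrites the table name to the
-- shard name, and sends the query to the node that stores that shard, where
-- regular PostgreSQL planning and execution take over:
SELECT count(*) FROM events_102042 WHERE tenant_id = 42;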

A real-world example: a multi-tenant TODO app

Let’s take an example app and see how this might all apply.

Imagine we’re building a TODO app, in which users (tenants) can create their own TODO lists. In the database, we have a table of TODO lists and a table of TODO items, and we use the user_id column as the distribution key:

CREATE TABLE todo_lists (
  user_id bigint NOT NULL,
  list_id bigserial NOT NULL,
  list_name text NOT NULL,
  PRIMARY KEY (user_id, list_id)
);

CREATE TABLE todo_items (
  user_id bigint NOT NULL,
  list_id bigint NOT NULL,
  item_id bigserial NOT NULL,
  position int NOT NULL DEFAULT 0,
  description text NOT NULL DEFAULT '',
  created_at timestamptz NOT NULL DEFAULT now(),
  done bool NOT NULL DEFAULT false,
  type_id int,
  PRIMARY KEY (user_id, list_id, item_id)
);

-- Distribute tables by user_id using Citus
SELECT create_distributed_table('todo_lists', 'user_id');
SELECT create_distributed_table('todo_items', 'user_id');

INSERT INTO todo_lists (user_id, list_name) VALUES (1, 'work things');
INSERT INTO todo_items (user_id, list_id, description) VALUES (1, 1, 'write TODO blog post');
INSERT INTO todo_items (user_id, list_id, description) VALUES (1, 1, '???');
INSERT INTO todo_items (user_id, list_id, description) VALUES (1, 1, 'profit');

INSERT INTO todo_lists (user_id, list_name) VALUES (1, 'personal things');
INSERT INTO todo_items (user_id, list_id, description) VALUES (1, 2, 'go to work');
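If you are curious where the rows end up, the coordinator keeps shard metadata in Citus catalog tables. A quick way to peek at the shards of a distributed table (the shard IDs and hash ranges you see will differ per cluster):

-- Each distributed table is split into shards, each covering a range of hash
-- values of the distribution key (user_id in this case).
SELECT logicalrelid, shardid, shardminvalue, shardmaxvalue
FROM pg_dist_shard
WHERE logicalrelid = 'todo_items'::regclass;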

Now say we want to display the user’s TODO lists and order by the number of open items.

A natural way of getting this information from the database would be to do a subquery on todo_items to get the number of items per list and then join with todo_lists and order by the number of items.

SELECT list_id, list_name, num_items
FROM
  todo_lists lists
  JOIN
  (SELECT list_id, count(*) AS num_items FROM todo_items WHERE NOT done GROUP BY list_id) counts
  USING (list_id)
WHERE  user_id = 1
ORDER BY num_items DESC;

Beware that there is something subtly wrong with the query above. The subquery on todo_items does not filter by user_id = 1, nor does it join on user_id, which means it may need to inspect the TODO items of all users. If two users happen to have the same list ID, the query could actually return results from other users (!). In addition, the resulting query plan will be inefficient even on regular PostgreSQL tables, since the subquery cannot use the index on the primary key.

When running the query on a distributed table, the Citus query planner concludes that the query cannot be distributed efficiently and throws the error below, but fortunately this is very easy to fix.

ERROR:  cannot pushdown the subquery since not all relations are joined using distribution keys
As a note: distributed operations that require a large number of network round-trips are prohibitively expensive for interactive applications, so Citus typically errors out for such operations rather than trying to distribute them at a very high cost.

Enabling SQL query delegation via filters & joins on distribution keys

To make the TODO lists query work on Citus, we need to ensure that the query planner knows that each subquery only needs to query a single user, so the whole query can be delegated to a single node, meaning it can use all of PostgreSQL’s SQL features. The simplest way to achieve this is to add a filter on the distribution key user_id to the subqueries:

SELECT list_id, list_name, num_items
FROM
  todo_lists lists
  JOIN
  ( SELECT list_id, count(*) AS num_items FROM todo_items WHERE NOT done AND user_id = 1 GROUP BY list_id) counts
  USING (list_id)
WHERE  user_id = 1
ORDER BY num_items DESC;

 list_id |    list_name    | num_items
---------+-----------------+-----------
       1 | work things     |         3
       2 | personal things |         1
(2 rows)

Time: 2.024 ms

Another way to run the query on Citus is to always join on the distribution key, such that the filter can be derived from the join, which gives an equivalent query plan:

SELECT list_id, list_name, num_items
FROM
  todo_lists lists
  JOIN
  ( SELECT user_id, list_id, count(*) AS num_items FROM todo_items WHERE NOT done GROUP BY user_id, list_id) counts
  USING (user_id, list_id)
WHERE  user_id = 1
ORDER BY num_items DESC;

 list_id |    list_name    | num_items
---------+-----------------+-----------
       1 | work things     |         3
       2 | personal things |         1
(2 rows)

Time: 2.014 ms

By adding the right filters and/or joins, all SQL features can be used with Citus. Even without Citus, employing filters and joins is often a good idea since it makes your queries more efficient and more secure.

Running multi-statement transactions in Citus

By ensuring that queries always filter by tenant, Citus can also support transaction blocks with the same ACID guarantees as PostgreSQL. In our TODO example, we could use transaction blocks to reorder the list in a transactional way:

BEGIN;
UPDATE todo_items SET position = 2 WHERE user_id = 1 AND item_id = 1;
UPDATE todo_items SET position = 1 WHERE user_id = 1 AND item_id = 2;
COMMIT;

The main requirement for enabling transaction blocks is that all queries specify the same user_id value.
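For example, a transaction can also mix inserts and updates, as long as every statement filters on the same tenant. A minimal sketch using the sample data from earlier:

BEGIN;
-- Every statement specifies user_id = 1, so Citus can route the whole
-- transaction to the node that stores that user's shard.
INSERT INTO todo_items (user_id, list_id, description) VALUES (1, 2, 'buy groceries');
UPDATE todo_items SET done = true WHERE user_id = 1 AND list_id = 2 AND description = 'go to work';
COMMIT;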

Avoid querying multiple nodes when querying a single node will do

Citus supports parallel analytical queries across all shards, which is powerful and has many applications. However, for simple lookups, it’s better to avoid the overhead of querying all shards by adding the right filters.

For example, an application might perform lookup queries such as:

SELECT item_id, description FROM todo_items WHERE list_id = 1 ORDER BY position;
 item_id |     description
---------+----------------------
       1 | write TODO blog post
       2 | ???
       3 | profit
(3 rows)

Time: 54.188 ms

Citus does not know which shard the list_id corresponds to, so it needs to query all the shards. To do so, Citus opens multiple connections to every node and queries all the shards in parallel. In this case, querying all the shards in parallel adds ~50ms of overhead, which is fine when you need to query a significant amount of data and want the parallelism to get it done fast (the overhead is typically dwarfed by the overall runtime of the query). For small, frequent lookups, we recommend you always add a filter on the distribution key, so that Citus can route each query to a single node.

When migrating an existing application from PostgreSQL to Citus, you may have queries that don’t include a filter on the distribution key. Such queries may silently work, but incur more overhead than they should. To find them, you can log multi-shard queries by setting citus.multi_task_query_log_level (new in Citus 6.2). For example:

SET citus.multi_task_query_log_level TO 'WARNING';

During testing, it is often a good idea to make multi-shard queries throw an error by setting the log level to ERROR:

SET citus.multi_task_query_log_level TO 'ERROR';
SELECT item_id, description FROM todo_items WHERE list_id = 1 ORDER BY position;
ERROR:  multi-task query about to be executed

After adding a user_id filter, the query can be delegated to a single node and executes in under 2 ms:

SELECT item_id, description FROM todo_items WHERE list_id = 1 AND user_id = 1 ORDER BY position;
 item_id |     description
---------+----------------------
       1 | write TODO blog post
       2 | ???
       3 | profit
(3 rows)

Time: 1.326 ms
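If you want to verify how a query is planned, EXPLAIN also works on distributed tables. The exact plan text depends on your Citus version, but the task count tells you how many shards the query touches:

EXPLAIN SELECT item_id, description FROM todo_items WHERE list_id = 1 AND user_id = 1 ORDER BY position;
-- The plan contains a Citus custom scan node with a line such as:
--   Task Count: 1
-- meaning the query was routed to a single shard. Without the user_id filter,
-- the task count equals the number of shards, one task per shard.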

Tip for when you do need to run single-tenant & multi-tenant queries

In some cases, you explicitly do want to run multi-shard queries, for example for cross-tenant analytics. It can be useful to use different configuration settings for different roles, for instance:

ALTER ROLE app_test SET citus.multi_task_query_log_level TO 'ERROR';
ALTER ROLE app_prod SET citus.multi_task_query_log_level TO 'WARNING';
ALTER ROLE analytics SET citus.multi_task_query_log_level TO off;

Roles allow you to differentiate the behaviour of the database for different applications. For a user-facing application, you can make sure that you don’t perform queries across multiple tenants by setting citus.multi_task_query_log_level for the database role. If you also have an internal analytics dashboard that does need to query all the data at once, then you can remove the restriction by using a different role.

In Citus Cloud, you can easily set up different roles in your Citus cluster through the dashboard.

Use reference tables for data shared across tenants

Some tables cannot be distributed by tenant because they contain data that are relevant across tenants. In that case, you can also create a reference table that is automatically replicated to all nodes. A reference table can be used in SQL queries without any extra filters.

For example, we can create a reference table for TODO types and get the number of TODO items by type for a given user:

CREATE TABLE todo_types (
  type_id bigserial NOT NULL,
  type_name text NOT NULL,
  PRIMARY KEY (type_id)
);
SELECT create_reference_table('todo_types');
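Since the sample items created earlier have no type yet, we might first add a few types and tag an item (the type names are made up for illustration; writes to a reference table are replicated to every node):

INSERT INTO todo_types (type_name) VALUES ('writing'), ('chore');

-- Tagging an item is a regular single-tenant update, routed by user_id.
UPDATE todo_items SET type_id = 1 WHERE user_id = 1 AND description = 'write TODO blog post';

The aggregation by type then looks like this: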

SELECT type_name, count(*)
FROM
  todo_types
  LEFT JOIN
  todo_items
  USING (type_id)
WHERE user_id = 1
GROUP BY type_name
ORDER BY 2 DESC LIMIT 10;

Reference tables provide a convenient way to scale out more complex data models. They do have lower write performance because all writes need to be replicated to all nodes. If your table can be distributed by tenant ID then doing so is always preferred.

A multi-tenant database that does not require you to trade off SQL & ACID for scale

When using Citus, your application still talks to a Postgres server, and Citus handles queries by delegating work to a cluster of Postgres servers. Because Citus is an extension to Postgres (and not a fork), Citus supports the same functions, data types (e.g. JSONB), and extensions that are available in regular Postgres. It’s easy to migrate a multi-tenant application built on PostgreSQL to Citus, with only minor modifications to your schema and queries. Once you are using Citus, you can scale out horizontally to add as much memory, CPU and storage as your application requires.

Test driving Citus

If you want to test drive Citus, the quickest way to get started is to sign up for Citus Cloud, our fully-managed database as a service that is hosted on AWS.

In Citus Cloud, you can create Citus clusters with a configurable number of PostgreSQL 9.6 servers and high availability, backed of course by our team with their many years of experience managing millions of PostgreSQL databases.

If you have any questions about migrating your application to Citus, don’t hesitate to contact us, and be sure to check out our Citus documentation.

Written by Marco Slot

Former lead engineer for the Citus database engine at Microsoft. Speaker at Postgres Conf EU, PostgresOpen, pgDay Paris, Hello World, SIGMOD, & lots of meetups. Talk selection team member for Citus Con: An Event for Postgres. PhD in distributed systems. Loves mountain hiking.

@marcoslot marcocitus