Citus Tips: JOINs between local & distributed Postgres tables

Written by Sait Talha Nisanci
July 2, 2021

If you have a large PostgreSQL database that runs on a single node, eventually that single node’s resources—such as memory, CPU, and disk—may become a bottleneck, and query responses may get too slow. That is when you may want to use the Citus extension to Postgres to distribute your tables across a cluster of Postgres nodes.

In your large database, Citus will shine for large tables, since the distributed Citus tables will benefit from the memory across all of the nodes in the cluster. But what if your Postgres database also contains some small tables which easily fit into a single node’s memory? You might be wondering: do you need to distribute these smaller tables, even though there wouldn’t be much performance gain from distributing them?

Fortunately, as of the Citus 10 release, you do not have to choose: you can distribute your large tables across a Citus cluster and continue using your smaller tables as local Postgres tables on the Citus coordinator.

One of the new features in Citus 10 that enables you to use a hybrid “local+distributed” Postgres database is that you can now JOIN local tables and distributed tables. (The other new Citus 10 feature has to do with foreign keys between local and reference tables.)

In this blog post, you will learn all about how local and distributed table JOINs work in Citus and why these JOINs might be useful.

You don’t have to distribute all of your Postgres tables with Citus

With this added flexibility to JOIN distributed Citus tables with regular Postgres tables as of Citus 10, it’s now easier to migrate to Citus since you can:

  • distribute your large tables across the cluster of nodes
  • keep your smaller tables as local Postgres tables on the Citus coordinator

So you can scale out CPU, memory, and I/O for the tables that need it—and minimize any changes to smaller tables (to add a distribution column, for example) if you don’t need to distribute them across the cluster.

How to use JOINs between local and distributed tables?

By default, as of Citus 10, local and distributed table JOINs will work, so you don’t need to do anything extra to take advantage of this feature.

To distribute your Postgres tables, you can follow the suggestions in the Citus docs to choose the distribution column. The main improvement now is that you can use Citus to distribute some of your tables—while keeping your smaller tables as local tables when distributing them doesn’t seem beneficial. And you don’t have to decide which tables to distribute from day 1: you can distribute tables over time as you need to.
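For example, here is a minimal sketch of such a hybrid setup, with hypothetical table names: the large table is distributed across the cluster, while the small lookup table stays local to the coordinator.

-- hypothetical example: distribute only the large table
CREATE TABLE events (event_id bigint, device_id int, payload jsonb);
SELECT create_distributed_table('events', 'device_id');

-- the small lookup table stays a regular local Postgres table
CREATE TABLE device_types (device_type_id int PRIMARY KEY, name text);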

Oh, and if you’re using Citus on Azure, then this new JOINs-between-local-and-distributed-tables feature is also available there. Update in August 2021: You can learn more about how to access the Citus 10 features in Hyperscale (Citus) (now known as Azure Cosmos DB for PostgreSQL) across the different regions in Nik’s GA post.

Under the hood: how Citus makes JOINs work between local and distributed tables

Local tables are on the Citus coordinator while distributed tables are on the Citus worker nodes.

If you’re wondering how we make JOINs work between local and distributed tables, we have 2 ways of doing this¹:

  1. Move distributed table data from the workers to the coordinator
  2. Move local table data from the coordinator to the workers

Let’s see this with a few examples. For the 1st scenario, let’s create 2 Postgres tables, a local one and a distributed one. We’ll name the local table “local” and we’ll name the distributed table “distributed”.

CREATE TABLE local(id int, col int);
CREATE TABLE distributed(id int PRIMARY KEY, col int);
SELECT create_distributed_table('distributed','id');

Insert some data into the tables:

INSERT INTO local VALUES (1,1), (2,2), (3,3);
INSERT INTO distributed VALUES (1,1), (2,2), (3,3);

Now write a simple JOIN between them:

SELECT id FROM distributed JOIN local USING(id) WHERE local.id = 1;
DEBUG:  Wrapping relation "distributed" to a subquery
DEBUG:  generating subplan 5_1 for subquery SELECT id FROM public.distributed WHERE (id OPERATOR(pg_catalog.=) 1)

id
----
  1
(1 row)

“Wrapping relation to a subquery” in the code block above means that Citus constructs a subquery on the distributed table so the data can be filtered on the workers, and evaluates that subquery separately. The result of the subquery is stored as an “intermediate result”, which is then JOINed with the local table on the coordinator.
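Based on the DEBUG output above, the rewritten query conceptually looks like the sketch below (not the exact internal plan Citus generates):

-- conceptual sketch: the distributed part runs on the workers, and its
-- result is materialized as an intermediate result on the coordinator
SELECT id
FROM (SELECT id FROM public.distributed WHERE id = 1) AS distributed
JOIN local USING(id)
WHERE local.id = 1;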

Citus JOIN on coordinator node
Figure 1: The filters and projections are pushed down to the distributed table on the Citus worker nodes in Step 1, and then the filtered data is moved to the Citus coordinator in Step 2. In Step 3, the JOIN is performed on the coordinator with the local table. Note that only the necessary data is fetched in the 2nd step, minimizing network traffic.

So for the Postgres query in Figure 1 above, the execution flow is:

  • The filters and projections are pushed down to the distributed table on workers
  • The filtered data is moved from the workers to the Citus coordinator
  • Then the JOIN is performed with the local table on the coordinator

In terms of performance, there are 2 things to note:

  • In the code block above, even though there wasn’t an explicit filter on the id column of the distributed table, the "distributed.id = 1" filter was derived from local.id = 1 by transitivity. Why? Because we are JOINing on "distributed.id = local.id". As a result of applying this "distributed.id = 1" filter, we move less data from the workers to the coordinator. (We sometimes call these types of filters on the worker nodes “pushdown filters”, similar to pushdown queries.)
  • Only the column needed for the JOIN, the id column, is selected from the distributed table, which also minimizes network traffic since the filtered data size is smaller. (We refer to these as “pushdown projections”.)

What if we had the filter on the col column rather than the id column? For example, as in the code block and Figure 2 below:

SELECT id FROM distributed JOIN local USING(id) WHERE local.col = 1;
DEBUG:  Wrapping relation "local" to a subquery
DEBUG:  generating subplan 10_1 for subquery SELECT id, col FROM public.local WHERE (col OPERATOR(pg_catalog.=) 1)

id
----
  1
(1 row)

Citus JOIN on worker nodes
Figure 2: The filters and projections are pushed down to the local table on the coordinator in Step 1, and the filtered data is then broadcast to the relevant workers in Step 2. In Step 3, the JOIN is performed on the workers with the distributed table. Note that only the necessary data is sent to the workers, minimizing network traffic.

In Figure 2, the flow is:

  • The filters and projections are pushed down to the local table on the coordinator
  • The filtered data is sent from the local table on the coordinator to the relevant workers
  • Then the JOIN is performed on the Citus worker nodes
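Again based on the DEBUG output, the rewritten query conceptually looks like this sketch (not the exact internal plan):

-- conceptual sketch: the local table becomes a subquery whose filtered
-- result is sent to the workers and JOINed there with the distributed table
SELECT id
FROM distributed
JOIN (SELECT id, col FROM public.local WHERE col = 1) AS local USING(id)
WHERE local.col = 1;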

So how does Citus decide whether it should move the distributed table data from the workers to the coordinator, as in Figure 1 above, or push down the local table data from the coordinator to the workers to perform the JOIN on the workers, as in Figure 2?

How Citus decides which table to move

Moving data between the coordinator and the workers uses the network, and moving data over the network is significantly slower than fetching it from disk. Distributed Postgres tables can be quite large, so Citus moves the filtered data of a distributed table to the coordinator only if that filtered data is guaranteed to be small. Why? To minimize data movement over the network.

So Citus will fetch the distributed table to the coordinator only if:

  • The distributed table has a unique column, such as a primary key.
  • That unique column has a constant equality filter on it, either directly or by transitivity.

In the first example, the distributed table had a primary key on the id column, and that column had a constant equality filter by transitivity, hence the distributed table was filtered and fetched to the coordinator to perform the JOIN.

How you can control the Citus handling of local and distributed JOINs

You can use a new Citus setting, citus.local_table_join_policy, to control the Citus behavior for local and distributed table JOINs.

Customizing the citus.local_table_join_policy setting can help reduce the amount of data sent between nodes when doing a JOIN between local and distributed tables.

Most of the time, you won’t need to change this setting and you will be fine with the default “auto” mode.
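You can inspect the current value of the setting with a standard Postgres SHOW command:

SHOW citus.local_table_join_policy;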

There are 4 modes:

  • auto (default): The distributed table will be moved to the coordinator if it has a constant equality filter on a unique column, which ensures less data movement from the workers to the coordinator. Otherwise, the local table will be moved from the coordinator to the workers.
  • prefer-local: The filtered data of the local table will be moved from the coordinator to the workers, and the JOIN will then be executed on the workers.
  • prefer-distributed: The filtered data of the distributed table will be moved from the workers to the coordinator, and the JOIN will then be executed on the coordinator.
  • never: Citus will give an error for local and distributed table JOINs, just as before Citus 10.

For example, if the filters on your distributed table are very selective, the distributed table might return just a few rows, while your local table returns lots of rows even after filtering. In that case, it can be more efficient to move the filtered data of the distributed table over the network than to move the filtered data of the local table, so you can use the ‘prefer-distributed’ mode.

You can also change the citus.local_table_join_policy setting for specific queries if you want to:

-- use "prefer-local" mode for just the following query
SET citus.local_table_join_policy TO 'prefer-local';
SELECT id FROM distributed JOIN local USING(id) WHERE local.id = 1;
DEBUG:  Wrapping relation "local" to a subquery

id
----
  1
(1 row)

-- reset the setting
RESET citus.local_table_join_policy;
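Similarly, if the filters on your distributed table are highly selective, as in the scenario described above, you could use "prefer-distributed" around a query. Here’s a sketch, with an illustrative filter on the col column:

-- sketch: force the filtered distributed data to move to the coordinator
SET citus.local_table_join_policy TO 'prefer-distributed';
SELECT id FROM distributed JOIN local USING(id) WHERE distributed.col = 1;
RESET citus.local_table_join_policy;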

Easier to migrate to Citus now, with local & distributed table JOINs

As of Citus 10 it is easier to migrate to Citus because you can convert some of your tables to distributed and reference tables, while keeping the other tables as regular Postgres tables, local to the Citus coordinator.

With this new ability to do JOINs between local and distributed tables—combined with the ability to use foreign keys between local and reference tables—you can now mix PostgreSQL tables and Citus tables to get the best of both worlds, without having to separate them in your data model.
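As a quick sketch of that companion feature, with hypothetical table names: a local Postgres table can have a foreign key to a reference table.

-- hypothetical example: a small table replicated as a reference table
CREATE TABLE countries (code text PRIMARY KEY, name text);
SELECT create_reference_table('countries');

-- a local table on the coordinator can now reference it
CREATE TABLE offices (id int PRIMARY KEY, country_code text REFERENCES countries (code));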

The Citus 10 release notes highlight all the new capabilities in Citus, including columnar compression and the ability to shard Postgres on a single node.

If you want to give Citus a try, it’s fairly easy to get started: just go ahead and download Citus to try some local and distributed table JOINs, as well as some of the other cool Citus 10 features. You can also try Citus on a single node with Docker. And if you have questions, there is lots of community Q&A on the Citus Slack—you can find me there.


Footnotes

  1. The same outer JOIN limitations apply to the local tables as well.

Written by Sait Talha Nisanci

Former software engineer at Microsoft with a passion for solving difficult problems—like distributed systems. 2nd in Turkey in Google Hash Code. Enjoys competitive programming and can ride a bike, control a drone, toss & catch a frisbee—at the same time. Coordinated.