Citus 12.1 is out! Now with PG16 Support. Read all about it in Naisila’s 12.1 blog post. 💥
One of the main reasons people use Citus to transform Postgres into a distributed database is that with Citus, you can scale out horizontally while still enjoying PostgreSQL’s great RDBMS features. Whether you’re already a Postgres expert or are new to Postgres, you probably know one of the benefits of using a relational database is to have relations between your tables. And one of the ways you can relate your tables is of course to use foreign keys.
A foreign key ensures referential integrity, which can help you to avoid bugs in applications. For example, a foreign key can be used to ensure that a table of “orders” can only reference customer IDs that exist in the “customers” table.
If you have already heard about Citus 10, you know that Citus 10 gives you more support for hybrid data models, which means that you can easily combine regular Postgres tables with distributed Citus tables to get the best of the single node and distributed Postgres worlds.
This post will walk you through one of the new features in Citus 10: support for foreign keys between local Postgres tables and Citus reference tables.
If you’re not yet familiar with how Citus works, Citus adds 2 new table types to Postgres to scale out your workload: distributed tables and reference tables.
However, distributing your tables does add some latency to your SQL queries, and might be unnecessary for some of your small tables. Fortunately, Citus allows you to mix regular (local) tables and distributed tables in the same database.
Before Citus 10, you could already use foreign keys with Citus in the following ways:
However, prior to Citus 10, defining foreign keys between your local Postgres tables and Citus reference tables wasn’t supported. Instead, if you wanted to use foreign keys between local Postgres tables and reference tables, you had to convert your local Postgres table to a reference table too. While reference tables are extremely useful, writes to reference tables are slower than writes to regular (local) tables since they are replicated to all nodes over the network.
As of Citus 10, you can now:
So now you won’t need to incur any of the costs of network overhead that reference tables have, just because you want a local table to have a foreign key with a reference table. Of course, it still makes a lot of sense to convert some of your small tables to reference tables, if:
In those 2 cases, the network cost that we mentioned is already amortized by the benefits.
In this post, you will learn how Citus 10 extends its foreign key support and why extending the foreign key support for local tables is so useful for hybrid local+distributed data models.
Note that if you are using the Citus on Azure managed service, the good news is that Citus 10 is already available there. Update in August 2021: You can learn more about how to access the Citus 10 features in Hyperscale (Citus) across the different regions in Nik’s GA post. Update in October 2022: Citus has a new home on Azure. You can now find the Citus managed database service in Azure Cosmos DB for PostgreSQL.
Now let’s start with a real-world example—depicted in 4 separate steps—to show what you can do with Citus 10 and what a hybrid data model can look like in Citus 10. We will keep the table definitions to a minimum to keep the focus on foreign keys:
-- (Step-1) create (local) "companies" table CREATE TABLE companies ( id bigserial PRIMARY KEY, name text NOT NULL ); -- (Step-2) create (local) "campaigns" table that has a foreign key to (local) "companies" table CREATE TABLE campaigns ( id bigserial, company_id bigint REFERENCES companies (id), name text NOT NULL, PRIMARY KEY (company_id, id) ); -- (Step-3) create (reference) "ads" table that references to (local) "campaigns" table! CREATE TABLE ads ( id bigserial, company_id bigint, campaign_id bigint, name text NOT NULL, PRIMARY KEY (company_id, id), FOREIGN KEY (company_id, campaign_id) REFERENCES campaigns (company_id, id) ); SELECT create_reference_table('ads'); -- (Step-4) create (distributed) "clicks" table that references to (reference) "ads" table CREATE TABLE clicks ( id bigserial, ad_id bigint, company_id bigint, PRIMARY KEY (company_id, id), FOREIGN KEY (company_id, ad_id) REFERENCES ads (company_id, id) ); SELECT create_distributed_table('clicks', 'company_id');
In the example above, we distributed the
clicks table and replicated the
ads (reference) table to worker nodes. In addition, we also kept the other two Postgres tables as local to the coordinator, while still having the foreign key with the
Now let us go over above example to see which foreign key types were already supported before Citus 10, vs. introduced in Citus 10:
|Step||Foreign Key support in Citus||Supported in Citus as of when|
|2||Between local Postgres tables||Already supported prior to Citus 10|
|3||Between local Postgres tables & reference tables||Supported as of Citus 10!|
|4||From a distributed table to a reference table||Already supported prior to Citus 10|
Besides using CREATE TABLE commands to define foreign keys between PostgreSQL tables and reference tables, you can also use PostgreSQL’s powerful ALTER TABLE command to define such a foreign key, as you could do in PostgreSQL. Let us give another short example by using an
ALTER TABLE command:
-- create "ads" as a reference table CREATE TABLE ads ( id bigserial, company_id bigint, campaign_id bigint, name text NOT NULL, PRIMARY KEY (company_id, id) ); SELECT create_reference_table('ads'); -- at some point, you wanted to have a foreign key from -- (reference) "ads" table to (local) "campaigns" table ALTER TABLE ads ADD CONSTRAINT ads_campaigns_id FOREIGN KEY (company_id, campaign_id) REFERENCES campaigns (company_id, id);
To enable foreign keys between reference tables and local Postgres tables, you first need to have your reference tables replicated to the Citus coordinator node. To replicate reference tables to the coordinator, you need to add Citus coordinator to Citus metadata by making the following function call:
SELECT citus_add_node('<coordinator hostname or your ip address>', 5500, groupId => 0);
Once your reference tables have been replicated to the Citus coordinator, when you define foreign keys between reference tables and Postgres tables, Citus will ensure the referential integrity (read as foreign key constraints) via the coordinator replica of your reference tables.
Note that if you are using Citus on Azure managed service, then you don't need to make the citus_add_node function call, since Citus on Azure will already do that for you, assuming you’re using Citus 10. That means with Citus on Azure, all you need to do is define the foreign keys in whatever way you want!
Citus is able to ensure the foreign keys from distributed tables to reference tables via the replicas of the reference tables on each node (what we call ‘placements’ internally). That means, Citus defines additional foreign keys from your distributed table shards to reference table placements. Similarly, Citus will use the coordinator placement of your reference table to verify the foreign keys with local Postgres tables. That is why you should first have your reference table replicated to Citus coordinator node before defining foreign keys between Postgres tables and reference tables.
Other side of the coin is, how Citus will handle two tables of different kinds? That means, one side of the foreign key is a Citus table—a reference table—and so is a first-class Citus object—an object that is known & tracked by Citus metadata. However, other side of the foreign key is not a first-class object for Citus. At that point, maintaining the foreign key relationship between a Citus table and a local PostgreSQL table happens to be the challenge itself for Citus executor.
To overcome this difficulty, Citus now automatically adds your Postgres table to Citus metadata when you define a foreign key with your reference table. That way, your local Postgres table will be perceived as a first class Citus object too.
The way Citus adds a local Postgres table to the Citus metadata is by making your Postgres table a simple Citus table that has a single shard, which is local to coordinator. In addition, Citus defines a hidden foreign key between the coordinator placement of your reference table and the local (and only) placement of your local table.
Citus adds your local Postgres table to the Citus metadata in order to enable foreign keys from your local table to reference tables. However, this introduces some limitations to your local table. Specifically, as of Citus 10, the items below are not (yet!) supported for local tables that have foreign key relationships with reference tables:
Why these limitations? Because the Citus adaptive executor mostly follows the reference table infrastructure for your local table too, except that the 2PC—two phase commit—protocol will not be used for the modifications to your local table. Plus, your local Postgres table will not be replicated to worker nodes as expected since it’s still a local table. On the other hand, triggers on those local tables will still be supported after having the foreign key.
The good news is, if you ever want to get rid of those limitations, all you need to do is to remove the foreign key(s) with any reference tables from your local table. In addition to that, we will be working on lifting those limitation in future Citus releases.
In conclusion, one of the new superpowers in Citus 10 is the ability to mix local and distributed tables—and this new support for foreign keys between reference tables and Postgres tables is part of that, along with another new feature: the ability to do JOINs between local and distributed tables. As a result, you can scale out CPU, memory, and I/O where you need it, and minimize application changes and other trade-offs where you don't. Now you can:
And it’s easier than ever to get started with Citus. Here’s what gshulegaard recently said on Hacker News about using Citus:
“Most recently, I used Citus to implement a single (sharded) storage/warehouse for my current project. I have been very pleasantly surprised by how easy it was to create a hybrid data model which handles everything from OLTP single node data to auto-partitioned time series tables.”
If you are interested in diving deeper into Citus and our support for hybrid local & distributed tables, here are some useful links:
And you can also connect with our team and other Citus users on the Citus Slack.