A new kind of database: Combining Postgres & Citus tables, feat. foreign keys

One of the main reasons people use Citus to transform Postgres into a distributed database is that with Citus, you can scale out horizontally while still enjoying PostgreSQL’s great RDBMS features. Whether you’re already a Postgres expert or are new to Postgres, you probably know one of the benefits of using a relational database is to have relations between your tables. And one of the ways you can relate your tables is of course to use foreign keys.

A foreign key ensures referential integrity, which can help you to avoid bugs in applications. For example, a foreign key can be used to ensure that a table of “orders” can only reference customer IDs that exist in the “customers” table.

If you have already heard about Citus 10, you know that Citus 10 gives you more support for hybrid data models, which means that you can easily combine regular Postgres tables with distributed Citus tables to get the best of the single node and distributed Postgres worlds.

This post will walk you through one of the new features in Citus 10: support for foreign keys between local Postgres tables and Citus reference tables.

Wait, what’s a reference table?

If you’re not yet familiar with how Citus works, Citus adds 2 new table types to Postgres to scale out your workload: distributed tables and reference tables.

Distributed tables are sharded across many PostgreSQL nodes to combine their compute capacity, and
Reference tables are replicated to enable fast joins with distributed tables.

However, distributing your tables does add some latency to your SQL queries, and might be unnecessary for some of your small tables. Fortunately, Citus allows you to mix regular (local) tables and distributed tables in the same database.

Foreign key support in Citus: before & after Citus 10

Before Citus 10, you could already use foreign keys with Citus in the following ways:

foreign keys from distributed tables to reference tables
foreign keys between reference tables
foreign keys between two colocated distributed tables when the key includes the distribution column

However, prior to Citus 10, defining foreign keys between your local Postgres tables and Citus reference tables wasn’t supported. Instead, if you wanted to use foreign keys between local Postgres tables and reference tables, you had to convert your local Postgres table to a reference table too. While reference tables are extremely useful, writes to reference tables are slower than writes to regular (local) tables since they are replicated to all nodes over the network.

As of Citus 10, you can now:

keep your regular Postgres tables local to coordinator, while still enjoying foreign keys between your regular PostgreSQL tables and reference tables.

So now you won’t need to incur any of the costs of network overhead that reference tables have, just because you want a local table to have a foreign key with a reference table. Of course, it still makes a lot of sense to convert some of your small tables to reference tables, if:

you want to make efficient joins with distributed tables, or
you want to have foreign keys from distributed tables into those small tables.

In those 2 cases, the network cost that we mentioned is already amortized by the benefits.

In this post, you will learn how Citus 10 extends its foreign key support and why extending the foreign key support for local tables is so useful for hybrid local+distributed data models.

Note that if you are using the Citus on Azure managed service, the good news is that Citus 10 is already available there. Update in August 2021: You can learn more about how to access the Citus 10 features in Hyperscale (Citus) across the different regions in Nik’s GA post. Update in October 2022: Citus has a new home on Azure. You can now find the Citus managed database service in Azure Cosmos DB for PostgreSQL.

Real-world example of using foreign keys between local & reference tables

Now let’s start with a real-world example—depicted in 4 separate steps—to show what you can do with Citus 10 and what a hybrid data model can look like in Citus 10. We will keep the table definitions to a minimum to keep the focus on foreign keys:

--  (Step-1) create (local) "companies" table
CREATE TABLE companies (
  id bigserial PRIMARY KEY,
  name text NOT NULL
);

--  (Step-2) create (local) "campaigns" table that has a foreign key to (local) "companies" table
CREATE TABLE campaigns (
  id bigserial,
  company_id bigint REFERENCES companies (id),
  name text NOT NULL,
  PRIMARY KEY (company_id, id)
);
-- (Step-3) create (reference) "ads" table that references to (local) "campaigns" table!
CREATE TABLE ads (
  id bigserial,
  company_id bigint,
  campaign_id bigint,
  name text NOT NULL,
  PRIMARY KEY (company_id, id),
  FOREIGN KEY (company_id, campaign_id)
    REFERENCES campaigns (company_id, id)
);
SELECT create_reference_table('ads');

-- (Step-4) create (distributed) "clicks" table that references to (reference) "ads" table
CREATE TABLE clicks (
  id bigserial,
  ad_id bigint,
  company_id bigint,
  PRIMARY KEY (company_id, id),
  FOREIGN KEY (company_id, ad_id)
    REFERENCES ads (company_id, id)
);
SELECT create_distributed_table('clicks', 'company_id');

In the example above, we distributed the clicks table and replicated the ads (reference) table to worker nodes. In addition, we also kept the other two Postgres tables as local to the coordinator, while still having the foreign key with the ads table.

Now let us go over above example to see which foreign key types were already supported before Citus 10, vs. introduced in Citus 10:

Step	Foreign Key support in Citus	Supported in Citus as of when
2	Between local Postgres tables	Already supported prior to Citus 10
3	Between local Postgres tables & reference tables	Supported as of Citus 10!
4	From a distributed table to a reference table	Already supported prior to Citus 10

Besides using CREATE TABLE commands to define foreign keys between PostgreSQL tables and reference tables, you can also use PostgreSQL’s powerful ALTER TABLE command to define such a foreign key, as you could do in PostgreSQL. Let us give another short example by using an ALTER TABLE command:

-- create "ads" as a reference table
CREATE TABLE ads (
  id bigserial,
  company_id bigint,
  campaign_id bigint,
  name text NOT NULL,
  PRIMARY KEY (company_id, id)
);
SELECT create_reference_table('ads');

-- at some point, you wanted to have a foreign key from
-- (reference) "ads" table to (local) "campaigns" table
ALTER TABLE ads ADD CONSTRAINT ads_campaigns_id
FOREIGN KEY (company_id, campaign_id) REFERENCES campaigns (company_id, id);

How to use foreign keys between reference tables & Postgres tables

To enable foreign keys between reference tables and local Postgres tables, you first need to have your reference tables replicated to the Citus coordinator node. To replicate reference tables to the coordinator, you need to add Citus coordinator to Citus metadata by making the following function call:

SELECT citus_add_node('<coordinator hostname or your ip address>', 5500, groupId => 0);

Once your reference tables have been replicated to the Citus coordinator, when you define foreign keys between reference tables and Postgres tables, Citus will ensure the referential integrity (read as foreign key constraints) via the coordinator replica of your reference tables.

Note that if you are using Citus on Azure managed service, then you don't need to make the citus_add_node function call, since Citus on Azure will already do that for you, assuming you’re using Citus 10. That means with Citus on Azure, all you need to do is define the foreign keys in whatever way you want!

Under the hood: some of the challenges we had to solve to create the new “foreign keys between local tables & reference tables” feature

Citus is able to ensure the foreign keys from distributed tables to reference tables via the replicas of the reference tables on each node (what we call ‘placements’ internally). That means, Citus defines additional foreign keys from your distributed table shards to reference table placements. Similarly, Citus will use the coordinator placement of your reference table to verify the foreign keys with local Postgres tables. That is why you should first have your reference table replicated to Citus coordinator node before defining foreign keys between Postgres tables and reference tables.

Other side of the coin is, how Citus will handle two tables of different kinds? That means, one side of the foreign key is a Citus table—a reference table—and so is a first-class Citus object—an object that is known & tracked by Citus metadata. However, other side of the foreign key is not a first-class object for Citus. At that point, maintaining the foreign key relationship between a Citus table and a local PostgreSQL table happens to be the challenge itself for Citus executor.

To overcome this difficulty, Citus now automatically adds your Postgres table to Citus metadata when you define a foreign key with your reference table. That way, your local Postgres table will be perceived as a first class Citus object too.

The way Citus adds a local Postgres table to the Citus metadata is by making your Postgres table a simple Citus table that has a single shard, which is local to coordinator. In addition, Citus defines a hidden foreign key between the coordinator placement of your reference table and the local (and only) placement of your local table.

foreign keys diagram — **Figure 1:** The data model depicted in this diagram shows the relationship between our local Postgres tables (campaigns & companies) and the reference table (ads). You can also see the placements I discuss above—Citus has seamlessly created one hidden shard for each of the local Postgres tables, since they have a foreign key relationship with the ads (reference) table.

Side effects of defining a foreign key between a PostgreSQL table and a reference table

Citus adds your local Postgres table to the Citus metadata in order to enable foreign keys from your local table to reference tables. However, this introduces some limitations to your local table. Specifically, as of Citus 10, the items below are not (yet!) supported for local tables that have foreign key relationships with reference tables:

ALTER TABLE commands with multiple subcommands
Identity columns
DMLs (Insert/Update/Delete etc.) with volatile functions
Partitioning and inheritance

Why these limitations? Because the Citus adaptive executor mostly follows the reference table infrastructure for your local table too, except that the 2PC—two phase commit—protocol will not be used for the modifications to your local table. Plus, your local Postgres table will not be replicated to worker nodes as expected since it’s still a local table. On the other hand, triggers on those local tables will still be supported after having the foreign key.

The good news is, if you ever want to get rid of those limitations, all you need to do is to remove the foreign key(s) with any reference tables from your local table. In addition to that, we will be working on lifting those limitation in future Citus releases.

Empowering hybrid local + distributed data models with Citus 10 and foreign keys

In conclusion, one of the new superpowers in Citus 10 is the ability to mix local and distributed tables—and this new support for foreign keys between reference tables and Postgres tables is part of that, along with another new feature: the ability to do JOINs between local and distributed tables. As a result, you can scale out CPU, memory, and I/O where you need it, and minimize application changes and other trade-offs where you don't. Now you can:

distribute some of your Postgres tables with Citus, while leaving some of your tables as local to the Citus coordinator
more easily migrate to Citus—since not every table needs to be sharded or replicated at day 1

And it’s easier than ever to get started with Citus. Here’s what gshulegaard recently said on Hacker News about using Citus:

“Most recently, I used Citus to implement a single (sharded) storage/warehouse for my current project. I have been very pleasantly surprised by how easy it was to create a hybrid data model which handles everything from OLTP single node data to auto-partitioned time series tables.”

If you are interested in diving deeper into Citus and our support for hybrid local & distributed tables, here are some useful links:

Getting Started, an awesome page suitable for every level of knowledge about Citus & learning preferences
Download page for you to start using the Citus extension today, on your preferred OS
Citus open source repo on GitHub to start exploring and contributing (♥) to Citus, and a place where you can make feature requests & report issues to help us make Citus better
Citus Docs for more info about using constraints & foreign keys in Citus
Postgres Docs as a complementary resource where you can learn more about foreign keys in Postgres

And you can also connect with our team and other Citus users on the Citus Slack.

Written by Onur Tirtir

Software engineer on the Postgres team at Microsoft. B.S. in Computer Engineering from Middle East Technical University. Likes to think about optimizing the things—Citus & random coding algorithms. Builds with LEGO. Loves cats.