v11.0 beta (Mar 2022)

What’s new in Citus 11.0-beta?

Welcome to release 11.0-beta of the Citus extension to PostgreSQL! Explore these headline features of Citus 11.0-beta:

Automatic metadata syncing

The easiest way to use Citus is to connect to the coordinator and use it for both schema changes and distributed queries—as most of you have been doing prior to 11.0 beta. For very demanding applications, you now have the option to load balance distributed queries across the worker nodes in (parts of) your application by using a different connection string, provided you factor in a few limitations.

Load balancing distributed queries across nodes is enabled by automatic metadata syncing to the nodes. All the code changes that enabled automatic metadata syncing are tracked in #5304. Throughout these release notes, we’ll dive into different parts of the changes.

Citus 11.0 beta switches to a new mode of operation by default: all Citus database clusters always use metadata syncing. That means that with Citus 11.0, you can always run distributed Postgres queries from any node.

-- create a table, distribute it on the coordinator and
-- ingest 3 rows for demo purposes
CREATE TABLE distributed_table(key int, value text);
SELECT create_distributed_table('distributed_table' ,'key');

INSERT INTO distributed_table VALUES (1, '1'), (2, '2'), (3, '3');
INSERT 0 3

Now, connect to any node, and run queries:

SELECT count(*) FROM distributed_table;
┌───────┐
│ count │
├───────┤
│     3 │
└───────┘
(1 row)

INSERT INTO distributed_table VALUES (4, '4'), (5, '5'), (6, '6');
INSERT 0 3

All commands (SELECT, INSERT, UPDATE/DELETE or TRUNCATE) can be executed from any node on any table (distributed tables, reference tables or citus managed local tables). All DDL commands, like ALTER TABLE, CREATE INDEX or create_distributed_table, should still happen via the coordinator.
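
For instance, a schema change like the following still goes through the coordinator (the index name below is just illustrative):

-- on the coordinator: DDL on a distributed table is propagated to all shards
CREATE INDEX distributed_table_value_idx ON distributed_table (value);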

To enable running queries via any node, Citus propagates all of its metadata (the pg_dist_* tables) to the nodes in the database cluster. Some of the metadata tables were already propagated by earlier versions of Citus; Citus 11.0-beta adds propagation of pg_dist_object and pg_dist_colocation.
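
One quick way to see this is to connect to a worker node and query the synced catalogs directly (a simple sanity check; any of the pg_dist_* tables works):

-- on a worker node: the Citus metadata catalogs are populated after syncing
SELECT count(*) FROM pg_dist_partition;
SELECT count(*) FROM pg_dist_colocation;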

Upgrading an existing Citus database cluster to Citus 11.0 beta

If you are upgrading an existing cluster to Citus 11.0 beta, then after installing the new packages, you will need to call a function to finalize the upgrade. This has been implemented in #5556.

Note that we do not suggest upgrading production clusters to Citus 11.0 beta. We take a similar perspective on the beta release as the Postgres community: Citus beta releases are not meant for use in production systems.

-- on all nodes
ALTER EXTENSION citus UPDATE;

And, then on the coordinator:

-- only on the coordinator
select citus_finalize_upgrade_to_citus11();

The upgrade function will make sure that all worker nodes have the right schema and metadata. It also addresses several naming issues that affected shards of partitioned tables.

If there is anything preventing metadata from being synchronized (e.g. missing permissions on worker nodes, or conflicting objects) the upgrade function will throw an error. You can still use your existing Citus database cluster via the coordinator until you address the issue and finalize the upgrade, but some new 11.0-beta functionality will not be available.
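
One way to check the sync state per node is to look at the pg_dist_node catalog on the coordinator (a sketch; the hasmetadata and metadatasynced columns show whether a node has received the metadata):

-- on the coordinator
SELECT nodename, nodeport, hasmetadata, metadatasynced FROM pg_dist_node;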

Connection management for high query throughput

If you need to perform many queries per second on a distributed PostgreSQL database, it may be necessary to use all nodes in the cluster for running distributed queries. To do that, you need a relatively large number of connections.

When your application opens a connection to one of the Citus nodes, that connection spawns a Postgres process, which will need to make internal connections to other nodes to perform distributed queries. If you connect directly to the worker nodes for distributed queries, your client connections are in a sense competing with internal connections. To make sure that every Postgres process resulting from a client connection can also make internal connections to all other nodes, we added the citus.max_client_connections setting via #5698. This setting restricts the number of (external) client connections, while continuing to allow internal connections between Citus nodes.

In addition to the usual installation instructions, we recommend adding the following settings to postgresql.conf on each Citus node (coordinator and all workers) to accommodate a large number of client connections:

# The maximum number of client + internal connections a node can handle
# The total number of client connections across all nodes should never exceed this number
max_connections = 6000

# The number of client connections an individual node can handle
# Should be no greater than: max_connections / node count including the coordinator
citus.max_client_connections = 500

With these settings, each node will accept up to 500 connections from your application, so if you have 10 worker nodes and 1 coordinator your application can make 5500 connections in total. You can increase this number further by using a connection pooler like pgbouncer on each node.

We also highly recommend adding the Citus coordinator to the metadata, such that worker nodes can connect to the coordinator as well. Some Citus features will only be available if the coordinator is in the metadata. We will likely make adding the coordinator required in the future.
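
For example, assuming the coordinator is reachable at 10.0.0.1 on port 5432, you can add it to the metadata with:

-- on the coordinator: register the coordinator itself in the metadata
SELECT citus_set_coordinator_host('10.0.0.1', 5432);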

Cluster insights in metadata-synced clusters

When there is a single node in the cluster for running distributed queries, cluster insights can be extracted in much the same way as on single-node Postgres. You can use pg_stat_activity or a pid (process ID) to gain insight into the current activity. Similarly, you can use pg_cancel_backend / pg_terminate_backend to gain further control over that activity.
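
As a reminder, on a single node this looks like plain Postgres (the pid below is illustrative):

-- inspect current activity on a single node
SELECT pid, state, query FROM pg_stat_activity WHERE state <> 'idle';

-- cancel a specific backend by pid
SELECT pg_cancel_backend(12345);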

When you use multiple nodes for querying the Citus cluster, gaining insights can become more complicated. We introduced several tools to make this easier.

  • citus_check_cluster_node_health() UDF: Any node should be able to connect to any other node in the cluster. You can use citus_check_cluster_node_health() UDF to check if there are any connectivity issues.
SELECT * FROM citus_check_cluster_node_health();
 from_nodename | from_nodeport | to_nodename | to_nodeport | result
---------------+---------------+-------------+-------------+--------
 localhost     |          9700 | localhost   |        9700 | t
 localhost     |          9700 | localhost   |        9701 | t
 localhost     |          9701 | localhost   |        9700 | t
 localhost     |          9701 | localhost   |        9701 | t
(4 rows)
  • citus_backend_gpid() UDF: Any process that initiates a distributed query could create some internal connections. With Citus 11.0-beta, we introduce global_pid (or gpid), which uniquely identifies a client session and all internal connections associated with the session via #5601 and #5755. The gpid starts with the node ID of the node that initiated the query (= the node that the client connected to).
select citus_backend_gpid();
┌────────────────────┐
│ citus_backend_gpid │
├────────────────────┤
│        10000059518 │
└────────────────────┘
(1 row)

Note that the internal connections also get the same unique global pid as the session that initiated the distributed query.

select key, citus_backend_gpid() FROM distributed_table;
┌─────┬────────────────────┐
│ key │ citus_backend_gpid │
├─────┼────────────────────┤
│   1 │        10000059518 │
│   5 │        10000059518 │
│   4 │        10000059518 │
│   3 │        10000059518 │
│   6 │        10000059518 │
│   2 │        10000059518 │
└─────┴────────────────────┘
(6 rows)
  • citus_stat_activity View: Shows the information from pg_stat_activity for all client sessions across all nodes, along with a global_pid (or gpid). It includes all the internal backends as well. This change is introduced via #5731.
SELECT global_pid, query, state from citus_stat_activity WHERE query ilike 'INSER%' and global_pid != citus_backend_gpid();
┌─────────────┬─────────────────────────────────────────────────────────────────────────────────┬─────────────────────┐
│ global_pid  │                                      query                                      │        state        │
├─────────────┼─────────────────────────────────────────────────────────────────────────────────┼─────────────────────┤
│ 80000062731 │ INSERT INTO distributed_table VALUES (1, '1');                                  │ idle in transaction │
│ 80000062731 │ INSERT INTO public.distributed_table_102009 (key, value) VALUES (1, '1'::text) │ idle in transaction │
└─────────────┴─────────────────────────────────────────────────────────────────────────────────┴─────────────────────┘
(2 rows)
  • citus_dist_stat_activity View: Similar to citus_stat_activity, but excludes the internal backends. Most of the time, this is what you need. This change is introduced via #5760.
SELECT global_pid, query, state from citus_dist_stat_activity WHERE query ilike 'INSER%' and global_pid != citus_backend_gpid();
┌─────────────┬────────────────────────────────────────────────┬─────────────────────┐
│ global_pid  │                     query                      │        state        │
├─────────────┼────────────────────────────────────────────────┼─────────────────────┤
│ 80000062731 │ INSERT INTO distributed_table VALUES (1, '1'); │ idle in transaction │
└─────────────┴────────────────────────────────────────────────┴─────────────────────┘
(1 row)
  • citus_lock_waits View: In single-node Postgres, there is a useful query which shows blocking/blocked activities. In Citus, we had already provided citus_lock_waits, which gives the same information but considers all the activity across the cluster. With Citus 11.0-beta, we improved this view to show any blocking activity, with gpids attached. This change is introduced via #5766.
SELECT * FROM citus_lock_waits;
┌─[ RECORD 1 ]──────────────────────────┬────────────────────────────────────────────────┐
│ waiting_gpid                          │ 10000062689                                    │
│ blocking_gpid                         │ 80000062731                                    │
│ blocked_statement                     │ TRUNCATE distributed_table;                    │
│ current_statement_in_blocking_process │ INSERT INTO distributed_table VALUES (1, '1'); │
│ waiting_nodeid                        │ 1                                              │
│ blocking_nodeid                       │ 8                                              │
└───────────────────────────────────────┴────────────────────────────────────────────────┘

  • Overridden pg_cancel_backend() and pg_terminate_backend(): Postgres allows cancelling and/or terminating processes with a pid. On a distributed Citus cluster, we want to preserve the same ability with as little effort as possible. Given that we have global pids, we overrode these functions to accept gpids. For simplicity, below we show using the current backend's gpid with these functions. In general, you can pass gpids from any node.
-- cancel own backend via gpid
SELECT pg_cancel_backend(citus_backend_gpid());
ERROR:  canceling statement due to user request

-- terminate own backend via global_pid
SELECT pg_terminate_backend(citus_backend_gpid());
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
  This probably means the server terminated abnormally
  before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

These changes are introduced in Pull Request #5699.

Controlling shard visibility

With Citus 11.0-beta, users can connect to any node to query their distributed tables. We are getting reports from users who are understandably confused by seeing Citus-generated shards in their schemas next to their distributed tables when inspecting the database with tools like pgAdmin.

To avoid this confusion, Citus 11.0-beta by default hides all the shards from all external applications. This is controlled by the citus.hide_shards_from_app_name_prefixes GUC.

This change is introduced via Pull request #5567.

 -- * means hide shards from all applications
 show citus.hide_shards_from_app_name_prefixes;
┌──────────────────────────────────────────┐
│ citus.hide_shards_from_app_name_prefixes │
├──────────────────────────────────────────┤
│ *                                        │
└──────────────────────────────────────────┘
(1 row)

 \d
                 List of relations
┌────────┬───────────────────┬───────┬─────────────┐
│ Schema │       Name        │ Type  │    Owner    │
├────────┼───────────────────┼───────┼─────────────┤
│ public │ citus_tables      │ view  │ onderkalaci │
│ public │ distributed_table │ table │ onderkalaci │
└────────┴───────────────────┴───────┴─────────────┘
(2 rows)

-- now configure the GUC so that it only hides shards from a specific application
-- and then, in psql -- where we run this example -- we can see the shards

SET citus.hide_shards_from_app_name_prefixes TO 'non_existing_app';
\d
                     List of relations
┌────────┬──────────────────────────┬───────┬─────────────┐
│ Schema │           Name           │ Type  │    Owner    │
├────────┼──────────────────────────┼───────┼─────────────┤
│ public │ citus_tables             │ view  │ onderkalaci │
│ public │ distributed_table        │ table │ onderkalaci │
│ public │ distributed_table_102008 │ table │ onderkalaci │
│ public │ distributed_table_102015 │ table │ onderkalaci │
│ public │ distributed_table_102022 │ table │ onderkalaci │
│ public │ distributed_table_102029 │ table │ onderkalaci │
│ public │ distributed_table_102036 │ table │ onderkalaci │
└────────┴──────────────────────────┴───────┴─────────────┘
(7 rows)

Note that if you want to make all the shards visible, you can do that via citus.override_table_visibility:

-- do not hide any tables (a.k.a., shards)
SET citus.override_table_visibility TO off;
\d
                     List of relations
┌────────┬──────────────────────────┬───────┬─────────────┐
│ Schema │           Name           │ Type  │    Owner    │
├────────┼──────────────────────────┼───────┼─────────────┤
│ public │ citus_tables             │ view  │ onderkalaci │
│ public │ distributed_table        │ table │ onderkalaci │
│ public │ distributed_table_102008 │ table │ onderkalaci │
│ public │ distributed_table_102015 │ table │ onderkalaci │
│ public │ distributed_table_102022 │ table │ onderkalaci │
│ public │ distributed_table_102029 │ table │ onderkalaci │
│ public │ distributed_table_102036 │ table │ onderkalaci │
└────────┴──────────────────────────┴───────┴─────────────┘
(7 rows)

Object propagation

Starting from Citus 9.0, Citus automatically manages some of the database objects. In other words, users only need to take care of creating the objects on the coordinator, and the rest is handled by Citus. The objects are automatically created on the worker nodes.
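
For example, creating a type on the coordinator is enough; Citus takes care of creating it on the worker nodes (the type below is just illustrative):

-- run on the coordinator; Citus propagates the type to all worker nodes
CREATE TYPE order_status AS ENUM ('pending', 'shipped', 'delivered');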

In the Citus 11.0-beta release, we expanded the set of objects supported by Citus. Before getting into the details of the new object support, let us discuss the citus.create_object_propagation GUC, which controls DDL creation behaviour in transactions.

Configurable modes of object propagation

In a distributed database we often need to make trade-offs between consistency, fault tolerance, parallelism, and other distributed systems aspects. Citus needs to support PostgreSQL’s interactive, multi-statement transaction blocks, which is particularly challenging in a distributed setting.

For example, Citus typically parallelizes expensive operations across shards (such as analytical queries and create_distributed_table()) over multiple connections per worker. When creating a database object, Citus propagates it to worker nodes over a single connection per worker. Combining those two operations in a single multi-statement transaction may cause issues, because the parallel connections will not be able to see the object that was created over a single connection but not yet committed.

Consider a transaction block that creates a type, a table, loads data, and distributes the table:

BEGIN;
-- type creation over a single connection:
CREATE TYPE coordinates AS (x int, y int);
CREATE TABLE positions (object_id text primary key, position coordinates);
-- data loading thus goes over a single connection:
SELECT create_distributed_table('positions', 'object_id');
\COPY positions FROM 'positions.csv'
COMMIT;

Previously, Citus would defer creating the type on the worker nodes and commit it separately when doing the create_distributed_table. This enabled the data copying in create_distributed_table to happen in parallel. However, it also meant that the type was not always present on the Citus worker nodes, or, if the transaction rolled back, it would remain present only on the worker nodes. We were able to hide these inconsistencies, but eventually they could cause issues.

With Citus 11.0 beta the default behaviour changes to prioritize schema consistency between coordinator and worker nodes. That does have a downside: if object propagation happens after a parallel command in the same transaction, it can no longer be completed, as highlighted by the ERROR in the code block below:

BEGIN;
 CREATE TABLE items (key text, value text);

 -- parallel data loading:
 SELECT create_distributed_table('items', 'key');
 \COPY items FROM 'items.csv'

 CREATE TYPE coordinates AS (x int, y int);
 ERROR: cannot run type command because there was a parallel operation on a distributed table in the transaction
 DETAIL:  When running command on/for a distributed type, Citus needs to perform all operations over a single connection per node to ensure consistency.
HINT:  Try re-running the transaction with "SET LOCAL citus.multi_shard_modify_mode TO 'sequential';"

If you run into this issue, there are two simple workarounds:

  • Use SET citus.create_object_propagation TO deferred; to return to the old object propagation behavior.
  • Use SET citus.multi_shard_modify_mode TO sequential; to disable per-node parallelism, as sketched below. Data loading in the same transaction might be slower.
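
For instance, here is a sketch of the second workaround applied to the failing transaction above (using the same items.csv):

BEGIN;
-- run all shard commands in this transaction over a single connection per node
SET LOCAL citus.multi_shard_modify_mode TO 'sequential';
CREATE TABLE items (key text, value text);

-- data loading now happens sequentially (slower, but consistent):
SELECT create_distributed_table('items', 'key');
\COPY items FROM 'items.csv'

-- the type can now be created in the same transaction
CREATE TYPE coordinates AS (x int, y int);
COMMIT;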

This change is introduced via Pull request #5724 and Pull request #5774.

New object types managed by Citus

Several database objects, such as those created with CREATE TYPE or CREATE EXTENSION, have already been managed by Citus. In this release, we add several more objects that are automatically managed by Citus, so that you don’t have to take any manual steps to manage them across the cluster.

Tightened object propagation

Prior to Citus 11.0-beta, Citus allowed creating objects in ways that could cause problems once metadata syncing is enabled (as it is in Citus 11.0-beta). When metadata syncing is enabled, we cannot afford to have any dependencies that Citus cannot sync to the nodes. With Citus 11.0-beta, the following two cases are prevented:

Although not very commonly used, Postgres allows creating circular dependencies between objects. Syncing such objects to remote nodes is currently unsupported.

-- Show that creating a circular dependency via functions and default values is not allowed
CREATE TABLE table_1_for_circ_dep(id int);
SELECT create_distributed_table('table_1_for_circ_dep','id');

CREATE OR REPLACE FUNCTION func_1_for_circ_dep(col_1 table_1_for_circ_dep)
RETURNS int LANGUAGE plpgsql AS $$ BEGIN return 1; END; $$;

CREATE TABLE table_2_for_circ_dep(id int, col_2 int default func_1_for_circ_dep(NULL::table_1_for_circ_dep));
SELECT create_distributed_table('table_2_for_circ_dep','id');

CREATE OR REPLACE FUNCTION func_2_for_circ_dep(col_3 table_2_for_circ_dep)
RETURNS int LANGUAGE plpgsql AS $$ BEGIN return 1; END; $$;

-- It should error out due to circular dependency
ALTER TABLE table_1_for_circ_dep ADD COLUMN col_2 int default func_2_for_circ_dep(NULL::table_2_for_circ_dep);
ERROR:  Citus can not handle circular dependencies between distributed objects
DETAIL:  "table table_1_for_circ_dep" circularly depends itself, resolve circular dependency first

Even if an object cannot be distributed by Citus, it can still reside on the coordinator as a local object. However, as soon as any such object is used as a dependency of a distributed object, Citus throws an error.

-- see https://github.com/citusdata/citus/blob/76f249a05a3b2d5a77fd79e23fd62fa85bb49b2c/src/test/regress/expected/pg13.out#L143-L167
-- to reproduce this example. This is only a simplified version. The main point to keep in mind is that
-- my_table cannot be distributed

-- myvarchar is a shell type that Citus does not support
CREATE TYPE myvarchar;
....

-- create a table that depends on unsupported object (e.g., myvarchar)
CREATE TABLE my_table (a int, b myvarchar);

SELECT create_distributed_table('my_table', 'a');
ERROR:  Relation "my_table" has dependency on unsupported object "type myvarchar"

Citus managed local tables

With the release of Citus 10.0, we enabled foreign keys between reference tables and local Postgres tables. We also enabled joins between local and distributed tables.

Under the hood, Citus can convert local Postgres tables on the coordinator to “citus managed local tables”. This is nothing but converting the Postgres table to a single-shard Citus table sitting on the coordinator and recording the table in the Citus metadata. With Citus 11.0-beta, as the metadata is synced, “citus managed local tables” are also synced. This means those tables can be accessed and used in queries via the worker nodes as well, even though the data sits on the coordinator.

Prior to Citus 11.0-beta, this conversion was handled automatically by Citus when a foreign key was created between a reference table and a local Postgres table. With Citus 11.0-beta, as this concept is also useful for accessing these tables on the worker nodes, we added some explicit ways of enabling the feature.

The prerequisite is to add the coordinator to the metadata.

-- add the coordinator to the metadata
-- make sure to give the correct hostname / port of your environment
SELECT citus_set_coordinator_host('10.0.0.1', 5432);

Citus 11.0-beta then offers two separate ways to convert local tables to citus managed tables.

First, with Pull request #5440, all new local tables are automatically added to the metadata when citus.use_citus_managed_tables is enabled:

-- change use_citus_managed_tables permanently
ALTER SYSTEM SET citus.use_citus_managed_tables TO true;
SELECT pg_reload_conf();

-- create a local table, and see that it is on the Citus metadata
CREATE TABLE local(a int);
SELECT * FROM pg_dist_partition WHERE logicalrelid = 'local'::regclass;
┌──────────────┬────────────┬─────────┬──────────────┬──────────┬───────────────┐
│ logicalrelid │ partmethod │ partkey │ colocationid │ repmodel │ autoconverted │
├──────────────┼────────────┼─────────┼──────────────┼──────────┼───────────────┤
│ local        │ n          │         │            0 │ s        │ f             │
└──────────────┴────────────┴─────────┴──────────────┴──────────┴───────────────┘
(1 row)

Second, you can selectively decide which local tables become Citus managed tables via citus_add_local_table_to_metadata:

-- create a local table, then add it to the Citus metadata explicitly
CREATE TABLE local(a int);

select citus_add_local_table_to_metadata('local');
┌───────────────────────────────────┐
│ citus_add_local_table_to_metadata │
├───────────────────────────────────┤
│                                   │
└───────────────────────────────────┘
(1 row)

SELECT * FROM pg_dist_partition WHERE logicalrelid = 'local'::regclass;
┌──────────────┬────────────┬─────────┬──────────────┬──────────┬───────────────┐
│ logicalrelid │ partmethod │ partkey │ colocationid │ repmodel │ autoconverted │
├──────────────┼────────────┼─────────┼──────────────┼──────────┼───────────────┤
│ local        │ n          │         │            0 │ s        │ f             │
└──────────────┴────────────┴─────────┴──────────────┴──────────┴───────────────┘
(1 row)

In any case, if you connect to the worker nodes, you can still access the local tables.

# 9700 is a worker node port
 psql postgres -p 9700 -c "SELECT count(*) FROM local"

┌───────┐
│ count │
├───────┤
│     0 │
└───────┘
(1 row)

In general, our design policy for “citus managed local tables” is that they become indistinguishable from Postgres tables. In this release, we added support for partitioning. We track all the remaining features on this issue.
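
A minimal sketch of what that enables, assuming citus_add_local_table_to_metadata is called on the partitioned parent (table and partition names are hypothetical):

-- a partitioned Postgres table on the coordinator...
CREATE TABLE events (event_id bigint, created_at timestamptz) PARTITION BY RANGE (created_at);
CREATE TABLE events_2022_03 PARTITION OF events
    FOR VALUES FROM ('2022-03-01') TO ('2022-04-01');

-- ...can now be added to the Citus metadata, partitions included
SELECT citus_add_local_table_to_metadata('events');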

Goodbye to seldom-used features

Like PostgreSQL, Citus maintains long-term backwards compatibility. We go to great lengths to make sure your applications continue working when upgrading Citus. However, sometimes a feature is no longer in line with how Citus is used and gets in the way of development. There are a few Citus features which we decided to remove in 11.0 beta:

  • Drops support for PostgreSQL 12: Our soft policy for Postgres version compatibility is to support Citus’ latest release with the 2 latest Postgres releases. In other words, Citus 11.0-beta supports Postgres 13 and Postgres 14.

  • Invalid shard placements: Shards are no longer marked invalid when a write fails, since this behavior had some flaws and led to limitations when metadata is synced. This work involved several changes.

  • Append-distributed table functions: The original method of distribution in Citus was “append” distribution, which was optimized for append-only data. Hash-distributed tables are a lot easier to use, have many more capabilities, and can also handle append-only data very well through partitioning. Citus 11.0 beta removes the functions for creating shards and loading new data into append-distributed tables. We are not aware of any append-distributed table users, but just in case: you can still upgrade to 11.0 beta, but the tables will become read-only. We recommend creating new distributed tables that use the default hash-distribution.

  • Distributed cstore_fdw tables (switch to the columnar access method): Since version 10.0, Citus comes with built-in columnar storage. Prior to Citus 10.0, it was possible to use Citus with columnar storage using the now-deprecated cstore_fdw extension.

However, cstore_fdw did not support important PostgreSQL features like streaming replication and backups, hence we rarely saw Citus customers use columnar storage before Citus 10. Many companies are now successfully using Citus’ built-in columnar storage for time series data, and we are therefore dropping support for creating or using distributed cstore_fdw tables. If you have distributed cstore_fdw tables, we recommend converting them to the columnar access method before upgrading to 11.0 beta.

This change is introduced via #5465.
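
If you need to do such a conversion, one possible approach is a plain copy into a new table that uses the columnar access method (a sketch; table names are illustrative, and you should verify the data before dropping the old table):

-- copy the data from the old cstore_fdw table into a new columnar table
CREATE TABLE events_columnar USING columnar AS
    SELECT * FROM events_cstore;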

  • Drop support for distributed and reference foreign tables: This is related to dropping cstore_fdw support. cstore_fdw was designed in such a way that each shard is a foreign table. However, for many (if not all) other foreign table types, such as postgres_fdw, this is a very hard relationship to maintain. For example, how should the cluster be adjusted after a rebalance? This needs the user’s involvement.

In order to simplify the user experience, we dropped support for foreign distributed or reference tables via Pull request #5605. Instead, we enabled “citus managed local tables” to become foreign tables that can be accessed from any node via Pull request #5461. Previously, the TRUNCATE command was also not supported on foreign tables. As of Citus 11.0-beta, Citus supports TRUNCATE on foreign tables via Pull request #5710.

Note that you can still upgrade the cluster with existing distributed or reference foreign tables, but you cannot create new such tables.

  • Deprecations of less commonly used features
    • Pull request #5851: Drops citus.enable_cte_inlining GUC, which is only useful for Postgres 11
    • Pull request #5525: Removes support for dropping distributed and local indexes in the same statement, which doesn’t work well on metadata synced clusters
    • Pull request #5678: Replaces citus.enable_object_propagation GUC with citus.enable_metadata_sync, mostly a cosmetic change to have a broader name
    • Pull request #5609: Requires superuser for the citus_add_node() and citus_activate_node() UDFs. Previously, you could grant certain privileges to non-superusers to add/activate nodes; for simplicity, these functions now always require superuser.
    • Pull request #5753: Disallows unsupported lateral subqueries on distributed tables. We realized that when reference tables are involved in certain complex lateral joins, Citus might return wrong results. Until we properly support such queries, we prevent them to avoid returning wrong results.

Notable fixes

  • Pull request #5400: A major improvement for repartition joins, fixing several issues
    • Enable re-partition joins on followers
    • Enable re-partition joins in transaction blocks
    • Fixes a bug that could cause failed re-partition joins to leak result tables
    • Fixes a bug that could cause re-partition joins involving local shards to fail
    • Automatically use binary encoding in re-partition joins when possible
  • Pull request #5654: Fixes a bug that causes columnar storage pages to have zero LSN