v11.0 (Jun 2022)


What’s new in Citus 11.0?

Welcome to the 11.0 release of the Citus extension to PostgreSQL. This page dives deep into many of the changes in Citus 11.0, including the headline features described in the sections below.

Read the release blog post: If you want to read more about the Citus 11 release, Marco’s blog post gives a broader, big-picture perspective on some of the details covered here.

Citus 11 release party: Watch the recording of the Citus 11 Release Party, where the Citus engineering team talks about the highlights of Citus 11 (and shares some demos, too). This was our first-ever “release party” for a new Citus open source release, featuring demos of 2 of the newly open-sourced Citus Enterprise features, as well as the new “query from any node” capability, plus new cluster activity views. Feedback welcome: we’d love to host more Citus Release Parties in the future and want to know what bits you found useful!

Citus Con talk & demo: If you prefer watching videos over reading, this talk from Citus Con, titled “Citus 11: A look at the Elicorn’s Horn”, is the one for you, and it includes a demo!

Citus is now fully open source!

Our team could not be more excited to open source the small handful of remaining Citus Enterprise features. The Citus extension to Postgres is now fully open source, including the features described below.

That means any user can now rebalance shards without blocking writes, manage roles across the cluster, isolate tenants to their own shards, and much more. In the following sections, we discuss each item in detail.

These changes are introduced in Pull Request #6008.

Non-blocking shard rebalancer

Citus had already open-sourced the shard rebalancer. With this release, we are also open-sourcing the non-blocking version.

This means that on Citus 11, Citus moves shards around by using logical replication to copy shards, as well as all the writes to those shards that happen during the data copy. That way, your application will only experience a brief blip in write latencies when scaling out the cluster by moving existing data to new nodes. A prerequisite is that all your Postgres tables have primary keys.

The architecture and usage are further explained in the Citus documentation.
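As a sketch (the hostname is a placeholder), adding a node and rebalancing existing shards onto it looks like this; for tables with primary keys, the default transfer mode uses logical replication, so writes are not blocked during the move:

```sql
-- add a new worker node to the cluster
SELECT citus_add_node('new-worker-hostname', 5432);

-- spread existing shards onto the new node;
-- non-blocking by default when tables have primary keys
SELECT rebalance_table_shards();
```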

Multi-user support

As of Citus 11.0, several user management operations are propagated to the nodes in a Citus cluster via the coordinator. In this section, we’ll describe the features in a little more detail.

Propagation of CREATE/DROP/ALTER ROLE statements

Managing roles (and users) is automatically handled via the coordinator. In other words, as soon as a ROLE/USER is created on the coordinator, it is available on the workers. Connect to the coordinator and create a role:

CREATE ROLE create_role_test CREATEDB CREATEROLE LOGIN REPLICATION
                             CONNECTION LIMIT 105 PASSWORD 'type_your_passwd'
                             VALID UNTIL '2045-05-05 00:00:00.00+00';

Connect to any of the worker nodes, and see that the role is propagated. Note that new nodes added via citus_add_node() also get the roles/users created on the coordinator.

 \du create_role_test
                                List of roles
┌──────────────────┬─────────────────────────────────────────────┬───────────┐
│    Role name     │                 Attributes                  │ Member of │
├──────────────────┼─────────────────────────────────────────────┼───────────┤
│ create_role_test │ Create role, Create DB, Replication        ↵│ {}        │
│                  │ 105 connections                            ↵│           │
│                  │ Password valid until 2045-05-05 02:00:00+02 │           │
└──────────────────┴─────────────────────────────────────────────┴───────────┘

Similarly, ALTER ROLE and DROP ROLE commands are propagated to the worker nodes.
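For instance (reusing the role created above), both of the following commands, run on the coordinator, are reflected on every worker:

```sql
-- propagated to all worker nodes automatically
ALTER ROLE create_role_test CONNECTION LIMIT 10;
DROP ROLE create_role_test;
```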

Propagation of GRANT/REVOKE statements

Managing GRANT/REVOKE is automatically handled via the coordinator. In other words, GRANT/REVOKE statements are reflected on the workers. For example, if you GRANT/REVOKE access on a distributed table, all the shards are also affected by the statement.

Here, we create a role and a distributed table. The role has only SELECT access on the table.

CREATE ROLE read_access LOGIN;

CREATE TABLE test(key int);
SELECT create_distributed_table('test', 'key');

-- only allow SELECT on test for read_access
REVOKE ALL ON TABLE test FROM read_access;
GRANT SELECT ON TABLE test TO read_access;

SET ROLE read_access;

-- this user does not have INSERT grant, so fails
INSERT INTO test VALUES (1);
ERROR:  permission denied for table test

-- this user has SELECT grant, so works fine
SELECT count(*) FROM test;
┌───────┐
│ count │
├───────┤
│     0 │
└───────┘
(1 row)

Row-level security on distributed tables

Postgres provides row security policies that restrict, on a per-user basis, which rows can be returned by normal queries or inserted, updated, or deleted by data modification commands. This feature is also known as Row-Level Security.

With Citus 11.0, this feature is available for distributed tables as well.

CREATE TABLE events (
    tenant_id int,
    id int,
    type text
);

SELECT create_distributed_table('events','tenant_id');

INSERT INTO events VALUES (1,1,'push');
INSERT INTO events VALUES (2,2,'push');

CREATE USER rls_tenant_1;
GRANT SELECT ON TABLE events TO rls_tenant_1;

-- Enable row level security
ALTER TABLE events ENABLE ROW LEVEL SECURITY;

-- Create policy for tenants to read their own rows
CREATE POLICY user_mod ON events
  FOR SELECT TO rls_tenant_1
  USING (current_user = 'rls_tenant_' || tenant_id::text);

-- all rows should be visible because we are querying with
-- the table owner user now
SELECT * FROM events ORDER BY 1;
┌───────────┬────┬──────┐
│ tenant_id │ id │ type │
├───────────┼────┼──────┤
│         1 │  1 │ push │
│         2 │  2 │ push │
└───────────┴────┴──────┘
(2 rows)

-- Switch user that has been granted rights,
-- should be able to see rows that the policy allows
SET ROLE rls_tenant_1;
SELECT * FROM events ORDER BY 1;
┌───────────┬────┬──────┐
│ tenant_id │ id │ type │
├───────────┼────┼──────┤
│         1 │  1 │ push │
└───────────┴────┴──────┘
(1 row)

Propagation of ALTER DATABASE … OWNER TO

Citus 11.0 adds support for propagating changes to the database owner. This can be turned off via the citus.enable_alter_database_owner setting.
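As a minimal sketch (database and role names are hypothetical):

```sql
-- propagated to all nodes when citus.enable_alter_database_owner is enabled
ALTER DATABASE my_database OWNER TO new_owner;

-- turn the propagation off if you prefer to manage owners per node
SET citus.enable_alter_database_owner TO off;
```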

Support for `sslkey` and `sslcert` in citus.node_conninfo

As a distributed database, Citus makes many connections between the nodes in a cluster. citus.node_conninfo can be used to populate authentication-related fields for these inter-node connections.

The documentation of citus.node_conninfo has all the details on how to configure the authentication between the nodes.

With Citus 11.0, citus.node_conninfo allows users to authenticate the communication between nodes by adding sslkey and sslcert. A simple approach could be to use a global certificate in citus.node_conninfo, with trust clientcert=verify-ca in pg_hba.conf for internal connections, as this avoids having to change configs after adding a user, or storing passwords.
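A sketch of such a configuration, assuming certificates at the (placeholder) paths shown:

```sql
ALTER SYSTEM SET citus.node_conninfo TO
  'sslmode=verify-ca sslrootcert=/path/to/ca.crt sslcert=/path/to/node.crt sslkey=/path/to/node.key';

-- citus.node_conninfo can be reloaded without a restart
SELECT pg_reload_conf();
```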

pg_dist_authinfo

As a distributed database, Citus makes many connections between the nodes in a cluster. pg_dist_authinfo can be used to populate authentication-related fields for these inter-node connections at per-user granularity.

The documentation of pg_dist_authinfo has all the details on how to configure the authentication between the nodes.

In general, it is simpler to set up authentication between nodes via citus.node_conninfo. If you need to configure authentication at per-user granularity, then pg_dist_authinfo is the better alternative.
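For example (the node ID and credentials are hypothetical; nodeid values come from pg_dist_node):

```sql
-- use a password when connecting to the node with nodeid 123 as role jdoe
INSERT INTO pg_dist_authinfo (nodeid, rolename, authinfo)
VALUES (123, 'jdoe', 'password=abc123');
```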

pg_dist_poolinfo

Connection poolers between Citus nodes can be useful when the application demands very high throughput. Although as of Postgres 14 far fewer workloads require this, it can still be useful to set up poolers in a Citus cluster.

The documentation of pg_dist_poolinfo has all the details on how to configure the pooling between the nodes.
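As a sketch, assuming a pgbouncer listening on port 6432 in front of the worker with nodeid 123 (both values hypothetical):

```sql
-- internal connections to node 123 will go through the pooler instead
INSERT INTO pg_dist_poolinfo (nodeid, poolinfo)
VALUES (123, 'host=127.0.0.1 port=6432');
```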

Tenant isolation

For many SaaS products, a common database problem is having one customer that has so much data, it adversely impacts other customers on the shared machine.

Tenant isolation is a good way to solve this issue. Effectively, it allows you to pick a particular tenant or customer and isolate them on a completely new node. By separating a tenant, you get dedicated resources with more memory and CPU processing power.

You can read about how to use the tenant isolation feature in this blog post and in the docs.
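As a sketch (table name and tenant value are hypothetical), isolating a single tenant to its own shard looks like this:

```sql
-- give tenant 42 its own shard; CASCADE also isolates colocated tables
SELECT isolate_tenant_to_new_shard('events', 42, 'CASCADE');
```

After the tenant has its own shard, that shard can be moved to a dedicated node with citus_move_shard_placement().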

Per tenant statistics: citus_stat_statements

citus_stat_statements is an extended version of pg_stat_statements, giving additional execution statistics per-tenant (a.k.a., distribution key filter). For example, you can find the busiest tenants with the following query.

SELECT partition_key as tenant_id,
       count(*) as tenant_unique_queries,
       sum(calls) as tenant_total_queries,
       sum(total_time) as total_query_time
FROM citus_stat_statements
WHERE partition_key is not null
GROUP BY tenant_id
ORDER BY tenant_total_queries DESC
LIMIT 5;

You can read the details of citus_stat_statements from this blog and from the related docs.

Note that citus_stat_statements requires the pg_stat_statements extension to be installed. It is often useful to join citus_stat_statements and pg_stat_statements to get useful information for multi-tenant SaaS apps. Also, citus_stat_statements is disabled by default. You can enable it via the following setting: citus.stat_statements_track = 'all'

Optimization for COPY when loading JSON/JSONB to avoid double parsing

Parsing and validating large JSON objects consumes a lot of CPU time. However, it is not strictly necessary for the coordinator to parse the JSON/JSONB; parsing is only useful for validation and for returning the number of the line that contains the malformed JSON/JSONB.

With this optimization, Citus treats JSON/JSONB objects as text on the coordinator, and skips validation, for COPY commands. This especially helps improve COPY performance for large objects.

This optimization can be disabled via citus.skip_jsonb_validation_in_copy. The user experience is almost indistinguishable either way: when the optimization is enabled, a malformed-JSONB error is thrown from the worker; when it is disabled, the error is thrown from the coordinator.

CREATE TABLE test(key int, data jsonb);
SELECT create_distributed_table('test', 'key');

-- error thrown from the worker
\COPY test (key, data) FROM STDIN WITH CSV
1,{"r":255}
\.
ERROR:  invalid input syntax for type json
DETAIL:  Token "r" is invalid.


SET citus.skip_jsonb_validation_in_copy TO off;

-- error thrown from the coordinator, indistinguishable from the above
COPY test FROM STDIN WITH CSV;
1,{"r":255}
\.
ERROR:  invalid input syntax for type json
DETAIL:  Token "r" is invalid.

CLUSTER command

Citus 11.0 supports the CLUSTER command on distributed and reference tables.
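For example (table and index names are assumptions for illustration), clustering a distributed table propagates to all of its shards:

```sql
CREATE INDEX events_tenant_idx ON events (tenant_id);

-- physically reorders the rows of every shard according to the index
CLUSTER events USING events_tenant_idx;
```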

Querying from any node

The easiest way to use Citus is to connect to the coordinator and use it for both schema changes and distributed queries, as most of you have been doing prior to 11.0. For very demanding applications, you now have the option to load balance distributed queries across the worker nodes in (parts of) your application by using a different connection string, subject to a few limitations.

Load balancing distributed queries across nodes is enabled by automatic metadata syncing to the nodes. All the code changes that enabled automatic metadata syncing are tracked in #5304. Throughout these release notes, we’ll dive into different parts of the changes.

Citus 11.0 changes to a new mode of operation by default: All Citus database clusters always use metadata syncing. That means that with Citus 11.0, you can always run distributed Postgres queries, from any node.

-- create a table, distribute it on the coordinator and
-- ingest 3 rows for demo purposes
CREATE TABLE distributed_table(key int, value text);
SELECT create_distributed_table('distributed_table', 'key');

INSERT INTO distributed_table VALUES (1, '1'), (2, '2'), (3, '3');
INSERT 0 3

Now, connect to any node, and run queries:

SELECT count(*) FROM distributed_table;
┌───────┐
│ count │
├───────┤
│     3 │
└───────┘
(1 row)

INSERT INTO distributed_table VALUES (4, '4'), (5, '5'), (6, '6');
INSERT 0 3

All commands (SELECT, INSERT, UPDATE/DELETE or TRUNCATE) can be executed from any node on any tables (distributed tables, reference tables or citus managed local tables). All DDL commands, like ALTER TABLE, CREATE INDEX or create_distributed_table, should still happen via the coordinator.

To enable running queries via any node, Citus propagates all its metadata (pg_dist_* tables) to the nodes in the database cluster. Some of the metadata tables were already propagated by earlier versions of Citus. Citus 11.0 enabled propagation of pg_dist_object and pg_dist_colocation.

Connection management for high query throughput

If you need to perform many queries per second on a distributed PostgreSQL database, it may be necessary to use all nodes in the cluster for running distributed queries. To do that, you need a relatively large number of connections.

When your application opens a connection to one of the Citus nodes, that connection spawns a Postgres process, which will need to make internal connections to other nodes to perform distributed queries. If you connect directly to the worker nodes for distributed queries, your client connections are in a sense competing with internal connections. To make sure that every Postgres process resulting from a client connection can also make internal connections to all other nodes, we added the citus.max_client_connections setting via #5698. This setting restricts the number of (external) client connections, while continuing to allow internal connections between Citus nodes. In addition to the usual installation instructions, we recommend adding the following settings to postgresql.conf on each Citus node (coordinator and all workers) to accommodate a large number of client connections:

# The maximum number of client + internal connections a node can handle
# The total number of client connections across all nodes should never exceed this number
max_connections = 6000

# The number of client connections an individual node can handle
# Should be no greater than: max_connections / node count including the coordinator
citus.max_client_connections = 500

With these settings, each node will accept up to 500 connections from your application, so if you have 10 worker nodes and 1 coordinator your application can make 5500 connections in total. You can increase this number further by using a connection pooler like pgbouncer on each node.

We also highly recommend adding the Citus coordinator to the metadata, such that worker nodes can connect to the coordinator as well. Some Citus features will only be available if the coordinator is in the metadata. We will likely make adding the coordinator required in the future.
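Adding the coordinator to the metadata can be sketched as follows, run on the coordinator itself (the hostname is a placeholder):

```sql
-- register the coordinator in pg_dist_node so workers can reach it
SELECT citus_set_coordinator_host('coordinator-hostname', 5432);
```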

Cluster insights in the metadata synced clusters

When there is only a single node in the cluster for running distributed queries, cluster insights can be extracted pretty much like in single-node Postgres. You can use pg_stat_activity or the pid (process ID) to gain insight into the current activity. Similarly, you can use pg_cancel_backend / pg_terminate_backend for further control over the activity.

When you use multiple nodes for querying the Citus cluster, gaining insights can become more complicated. We introduced several tools to alleviate potential difficulties users may have.

  • citus_check_cluster_node_health() UDF: Any node should be able to connect to any other node in the cluster. You can use citus_check_cluster_node_health() UDF to check if there are any connectivity issues.
SELECT * FROM citus_check_cluster_node_health();
 from_nodename | from_nodeport | to_nodename | to_nodeport | result
---------------+---------------+-------------+-------------+--------
 localhost     |          9700 | localhost   |        9700 | t
 localhost     |          9700 | localhost   |        9701 | t
 localhost     |          9701 | localhost   |        9700 | t
 localhost     |          9701 | localhost   |        9701 | t
(4 rows)
  • citus_backend_gpid() UDF: Any process that initiates a distributed query could create some internal connections. With Citus 11.0, we introduce global_pid (or gpid), which uniquely identifies a client session and all internal connections associated with the session via #5601 and #5755. The gpid starts with the node ID of the node that initiated the query (= the node that the client connected to).
select citus_backend_gpid();
┌────────────────────┐
│ citus_backend_gpid │
├────────────────────┤
│        10000059518 │
└────────────────────┘
(1 row)

Note that the internal connections also get the same unique global pid as the session that initiated the distributed query.

select key, citus_backend_gpid() FROM distributed_table;
┌─────┬────────────────────┐
│ key │ citus_backend_gpid │
├─────┼────────────────────┤
│   1 │        10000059518 │
│   5 │        10000059518 │
│   4 │        10000059518 │
│   3 │        10000059518 │
│   6 │        10000059518 │
│   2 │        10000059518 │
└─────┴────────────────────┘
(6 rows)
  • citus_stat_activity View: Shows the information from pg_stat_activity from all client sessions across all nodes, along with a global_pid (or gpid). It includes all the internal backends as well. This change is introduced via #5731.
SELECT global_pid, query, state from citus_stat_activity WHERE query ilike 'INSER%' and global_pid != citus_backend_gpid();
┌─────────────┬────────────────────────────────────────────────────────────────────────────────┬─────────────────────┐
│ global_pid  │                                     query                                      │        state        │
├─────────────┼────────────────────────────────────────────────────────────────────────────────┼─────────────────────┤
│ 80000062731 │ INSERT INTO distributed_table VALUES (1, '1');                                 │ idle in transaction │
│ 80000062731 │ INSERT INTO public.distributed_table_102009 (key, value) VALUES (1, '1'::text) │ idle in transaction │
└─────────────┴────────────────────────────────────────────────────────────────────────────────┴─────────────────────┘
(2 rows)
  • citus_dist_stat_activity View: Similar to the citus_stat_activity but excludes the internal backends. Most of the time, this is what you would probably need. This change is introduced via #5760.
SELECT global_pid, query, state from citus_dist_stat_activity WHERE query ilike 'INSER%' and global_pid != citus_backend_gpid();
┌─────────────┬────────────────────────────────────────────────┬─────────────────────┐
│ global_pid  │                     query                      │        state        │
├─────────────┼────────────────────────────────────────────────┼─────────────────────┤
│ 80000062731 │ INSERT INTO distributed_table VALUES (1, '1'); │ idle in transaction │
└─────────────┴────────────────────────────────────────────────┴─────────────────────┘
(1 row)
  • citus_lock_waits View: In single-node Postgres, there is a useful query which shows blocking/blocked activities. Citus had already provided citus_lock_waits, which gives the same information but considers all the activity across the cluster. With Citus 11.0, we improved this view to show any blocking activity, with gpids attached. This change is introduced via #5766.
SELECT * FROM citus_lock_waits;
┌─[ RECORD 1 ]──────────────────────────┬────────────────────────────────────────────────┐
│ waiting_gpid                          │ 10000062689                                    │
│ blocking_gpid                         │ 80000062731                                    │
│ blocked_statement                     │ TRUNCATE distributed_table;                    │
│ current_statement_in_blocking_process │ INSERT INTO distributed_table VALUES (1, '1'); │
│ waiting_nodeid                        │ 1                                              │
│ blocking_nodeid                       │ 8                                              │
└───────────────────────────────────────┴────────────────────────────────────────────────┘

Time: 76.163 ms
  • Overridden pg_cancel_backend() and pg_terminate_backend(): Postgres allows cancelling and/or terminating processes with a pid. On a distributed Citus cluster, we want to preserve the same ability with as little effort as possible. Given that we have global pids, we overrode these UDFs to accept gpids. For simplicity, below we show using the current backend’s gpid with the UDFs. In general, you can pass gpids from any node.
-- cancel own backend via gpid
SELECT pg_cancel_backend(citus_backend_gpid());
ERROR:  canceling statement due to user request

-- terminate own backend via global_pid
SELECT pg_terminate_backend(citus_backend_gpid());
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
  This probably means the server terminated abnormally
  before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

These changes are introduced in Pull Request #5699.

Controlling shard visibility

With Citus 11.0, users can connect to any node to query their distributed tables. We were getting reports from users who were understandably confused by seeing Citus-generated shards in their schemas next to their distributed tables when inspecting the database with tools like pgAdmin.

To avoid this confusion, Citus 11.0 by default hides all the shards from all external applications. This is controlled by the citus.show_shards_for_app_name_prefixes GUC.

This change is introduced via Pull request #5567 and Pull request #5920.

 -- empty string (the default) means shards are hidden from all external applications
 show citus.show_shards_for_app_name_prefixes;
┌──────────────────────────────────────────┐
│ citus.show_shards_for_app_name_prefixes  │
├──────────────────────────────────────────┤
│                                          │
└──────────────────────────────────────────┘
(1 row)

 \d
                 List of relations
┌────────┬───────────────────┬───────┬─────────────┐
│ Schema │       Name        │ Type  │    Owner    │
├────────┼───────────────────┼───────┼─────────────┤
│ public │ citus_tables      │ view  │ onderkalaci │
│ public │ distributed_table │ table │ onderkalaci │
└────────┴───────────────────┴───────┴─────────────┘
(2 rows)

-- now configure the GUC so that it only hides shards from a specific application
-- and then, on psql -- where we run the test -- we can see the shards

SET citus.show_shards_for_app_name_prefixes TO 'psql';
\d
                     List of relations
┌────────┬──────────────────────────┬───────┬─────────────┐
│ Schema │           Name           │ Type  │    Owner    │
├────────┼──────────────────────────┼───────┼─────────────┤
│ public │ citus_tables             │ view  │ onderkalaci │
│ public │ distributed_table        │ table │ onderkalaci │
│ public │ distributed_table_102008 │ table │ onderkalaci │
│ public │ distributed_table_102015 │ table │ onderkalaci │
│ public │ distributed_table_102022 │ table │ onderkalaci │
│ public │ distributed_table_102029 │ table │ onderkalaci │
│ public │ distributed_table_102036 │ table │ onderkalaci │
└────────┴──────────────────────────┴───────┴─────────────┘
(7 rows)

Note that, if you want to make all the shards visible, you can do that via citus.override_table_visibility:

-- do not hide any tables (a.k.a., shards)
SET citus.override_table_visibility TO off;
\d
                     List of relations
┌────────┬──────────────────────────┬───────┬─────────────┐
│ Schema │           Name           │ Type  │    Owner    │
├────────┼──────────────────────────┼───────┼─────────────┤
│ public │ citus_tables             │ view  │ onderkalaci │
│ public │ distributed_table        │ table │ onderkalaci │
│ public │ distributed_table_102008 │ table │ onderkalaci │
│ public │ distributed_table_102015 │ table │ onderkalaci │
│ public │ distributed_table_102022 │ table │ onderkalaci │
│ public │ distributed_table_102029 │ table │ onderkalaci │
│ public │ distributed_table_102036 │ table │ onderkalaci │
└────────┴──────────────────────────┴───────┴─────────────┘
(7 rows)

New helper functions

If you are familiar with Citus’ run_command_on_workers(), you know how handy it is. With the query-from-any-node functionality, we added a few more helper functions that can be useful in a similar manner.

In general, if you want these functions to work as shown here, make sure that the coordinator is already added to the metadata, see citus_set_coordinator_host().

The new helper functions added in Citus 11.0 are run_command_on_coordinator() via Pull request #5941, run_command_on_all_nodes() via Pull request #5731, and citus_is_coordinator() via Pull request #5888.

Connect to one of the nodes in the cluster, and experiment with the helper functions. In this example, we’ll be connecting to one of the worker nodes:

-- make sure that the coordinator is added to the metadata
SELECT nodename, nodeport FROM pg_dist_node WHERE groupid = 0;
┌───────────┬──────────┐
│ nodename  │ nodeport │
├───────────┼──────────┤
│ localhost │     5432 │
└───────────┴──────────┘
(1 row)

-- show that we are on a worker node
SELECT citus_is_coordinator();
┌──────────────────────┐
│ citus_is_coordinator │
├──────────────────────┤
│ f                    │
└──────────────────────┘
(1 row)

-- for example, see work_mem setting on all nodes
SELECT
  nodename, nodeport, result
FROM run_command_on_all_nodes($$SHOW work_mem$$) JOIN pg_dist_node USING(nodeid);
┌───────────┬──────────┬────────┐
│ nodename  │ nodeport │ result │
├───────────┼──────────┼────────┤
│ localhost │     5432 │ 4MB    │
│ localhost │     9700 │ 4MB    │
│ localhost │     9701 │ 4MB    │
│ localhost │     9702 │ 4MB    │
│ localhost │     9703 │ 4MB    │
│ localhost │     9704 │ 4MB    │
│ localhost │     9705 │ 4MB    │
│ localhost │     9706 │ 4MB    │
└───────────┴──────────┴────────┘
(8 rows)


-- or, you can check whether there are any prepared transactions stored
-- on the coordinator
SELECT
  result
FROM run_command_on_coordinator($$SELECT count(*) FROM pg_prepared_xacts$$);
┌────────┐
│ result │
├────────┤
│ 0      │
└────────┘
(1 row)

Upgrading an existing Citus database cluster to Citus 11.0

If you are upgrading an existing cluster to Citus 11.0, then after installing the new packages, you will need to call a function to finalize the upgrade. This was implemented in #5556 and #5995.

-- execute on all nodes
ALTER EXTENSION citus UPDATE;

And, then only on the coordinator:

-- only on the coordinator
CALL citus_finish_citus_upgrade();

The upgrade function will make sure that all worker nodes have the right schema and metadata. It also addresses several naming issues that affected shards of partitioned tables.

If there is anything preventing metadata from being synchronized (e.g. missing permissions on worker nodes, or conflicting objects) the upgrade function will throw an error. You can still use your existing Citus database cluster via the coordinator until you address the issue and finalize the upgrade, but some new 11.0 functionality will not be available.

Object propagation

Starting from Citus 9.0, Citus automatically manages some of the database objects. In other words, users only need to take care of creating the objects on the coordinator, and the rest is handled by Citus. The objects are automatically created on the worker nodes.

In the Citus 11.0 release, we expanded the set of objects supported by Citus. Before getting into the details of the new object support, let us discuss the citus.create_object_propagation GUC, which controls DDL creation behaviour in transactions.

Configurable modes of object propagation

In a distributed database we often need to make trade-offs between consistency, fault tolerance, parallelism, and other distributed systems aspects. Citus needs to support PostgreSQL’s interactive, multi-statement transaction blocks, which is particularly challenging in a distributed setting.

For example, Citus typically parallelizes expensive operations across shards, such as analytical queries and create_distributed_table(), over multiple connections per worker. When creating a database object, Citus propagates it to worker nodes over a single connection per worker. Combining those two operations in a single multi-statement transaction may cause issues, because the parallel connections will not be able to see the object that was created over a single connection but not yet committed.

Consider a transaction block that creates a type, a table, loads data, and distributes the table:

BEGIN;
-- type creation over a single connection:
CREATE TYPE coordinates AS (x int, y int);
CREATE TABLE positions (object_id text primary key, position coordinates);
-- data loading thus goes over a single connection:
SELECT create_distributed_table('positions', 'object_id');
\COPY positions FROM 'positions.csv'
COMMIT;

Previously, Citus would defer creating the type on the worker nodes and commit it separately when doing the create_distributed_table. This enabled the data copying in create_distributed_table to happen in parallel. However, it also meant that the type was not always present on the Citus worker nodes, or if the transaction rolled back, it would be present only on the worker nodes. We were able to hide these inconsistencies, but eventually they could cause issues.

With Citus 11.0 the default behaviour changes to prioritize schema consistency between coordinator and worker nodes. That does have a downside: If object propagation happens after a parallel command, it can no longer be completed, as highlighted by the ERROR in the code block below:

BEGIN;
CREATE TABLE items (key text, value text);

-- parallel data loading:
SELECT create_distributed_table('items', 'key');
\COPY items FROM 'items.csv'

CREATE TYPE coordinates AS (x int, y int);
ERROR:  cannot run type command because there was a parallel operation on a distributed table in the transaction
DETAIL:  When running command on/for a distributed type, Citus needs to perform all operations over a single connection per node to ensure consistency.
HINT:  Try re-running the transaction with "SET LOCAL citus.multi_shard_modify_mode TO 'sequential';"

If you run into this issue, there are 2 simple workarounds:

  • Use SET citus.create_object_propagation TO deferred; to return to the old object propagation behavior.
  • Use SET citus.multi_shard_modify_mode TO sequential; to disable per-node parallelism. Data loading in the same transaction might be slower.

This change is introduced via Pull request #5724 and Pull request #5774.

New objects types managed by Citus

Several database objects, such as those created by CREATE TYPE or CREATE EXTENSION, have already been managed by Citus. In this release, we add several more objects that are automatically managed by Citus, so that you don’t have to take any manual steps to manage those objects across the cluster.

Tighten the object propagation

Prior to Citus 11.0, Citus allowed creating distributed tables with dependencies that could cause problems once metadata syncing is enabled (as it is in Citus 11.0). In other words, when metadata syncing is enabled, we cannot afford to have any dependencies that Citus cannot sync to the nodes. With Citus 11.0, the following two cases are prevented:

Although not very commonly used, Postgres allows creating circular dependencies between objects. Syncing such objects to remote nodes is currently unsupported.

-- Show that creating a circular dependency via functions and default values is not allowed
CREATE TABLE table_1_for_circ_dep(id int);
SELECT create_distributed_table('table_1_for_circ_dep','id');

CREATE OR REPLACE FUNCTION func_1_for_circ_dep(col_1 table_1_for_circ_dep)
RETURNS int LANGUAGE plpgsql AS $$ BEGIN return 1; END; $$;

CREATE TABLE table_2_for_circ_dep(id int, col_2 int default func_1_for_circ_dep(NULL::table_1_for_circ_dep));
SELECT create_distributed_table('table_2_for_circ_dep','id');

CREATE OR REPLACE FUNCTION func_2_for_circ_dep(col_3 table_2_for_circ_dep)
RETURNS int LANGUAGE plpgsql AS $$ BEGIN return 1; END; $$;

-- It should error out due to circular dependency
ALTER TABLE table_1_for_circ_dep ADD COLUMN col_2 int default func_2_for_circ_dep(NULL::table_2_for_circ_dep);
ERROR:  Citus can not handle circular dependencies between distributed objects
DETAIL:  "table table_1_for_circ_dep" circularly depends itself, resolve circular dependency first

Even if an object cannot be distributed by Citus, it can still reside on the coordinator as a local object. However, as soon as any such object is used as a dependency of a distributed object, Citus throws an error.

-- see https://github.com/citusdata/citus/blob/76f249a05a3b2d5a77fd79e23fd62fa85bb49b2c/src/test/regress/expected/pg13.out#L143-L167
-- to reproduce this example. This is only a simplified version. The main point to keep in mind is that
-- my_table cannot be distributed

-- myvarchar is a shell type that Citus does not support
CREATE TYPE myvarchar;
....

-- create a table that depends on unsupported object (e.g., myvarchar)
CREATE TABLE my_table (a int, b myvarchar);

SELECT create_distributed_table('my_table', 'a');
ERROR:  Relation "my_table" has dependency on unsupported object "type myvarchar"

Citus managed local tables

With the release of Citus 10.0, we enabled foreign keys between reference tables and local Postgres tables. We also enabled joins between local and distributed tables.

Under the hood, Citus can convert local Postgres tables on the coordinator to “citus managed local tables”. This is nothing but converting the Postgres table to a single-shard Citus table that sits on the coordinator, and recording the table in the Citus metadata. With Citus 11.0, as the metadata is synced, “citus managed local tables” are also synced. That means those tables can be accessed and used in queries via the worker nodes as well, although the data sits on the coordinator.

Prior to Citus 11.0, this conversion was handled automatically by Citus when a foreign key was formed between a reference table and a local Postgres table. With Citus 11.0, as this concept is also useful for accessing these tables from the worker nodes, we added explicit ways of enabling the feature.

The prerequisite is to add the coordinator to the metadata.

-- add the coordinator to the metadata
-- make sure to give the correct hostname / port of your environment
SELECT citus_set_coordinator_host('10.0.0.1', 5432);

Citus 11.0 offers two separate ways to convert local tables to Citus managed tables.

First, all new local tables are automatically added to the metadata with Pull request #5440:

-- change use_citus_managed_tables permanently
ALTER SYSTEM SET citus.use_citus_managed_tables TO true;
SELECT pg_reload_conf();

-- create a local table, and see that it is on the Citus metadata
CREATE TABLE local(a int);
SELECT * FROM pg_dist_partition WHERE logicalrelid = 'local'::regclass;
┌──────────────┬────────────┬─────────┬──────────────┬──────────┬───────────────┐
│ logicalrelid │ partmethod │ partkey │ colocationid │ repmodel │ autoconverted │
├──────────────┼────────────┼─────────┼──────────────┼──────────┼───────────────┤
│ local        │ n          │         │            0 │ s        │ f             │
└──────────────┴────────────┴─────────┴──────────────┴──────────┴───────────────┘
(1 row)

Second, you can selectively decide which local tables to become Citus managed tables via citus_add_local_table_to_metadata:

-- create a local table, and see that it is on the Citus metadata
CREATE TABLE local(a int);

SELECT citus_add_local_table_to_metadata('local');
┌───────────────────────────────────┐
│ citus_add_local_table_to_metadata │
├───────────────────────────────────┤
│                                   │
└───────────────────────────────────┘
(1 row)

SELECT * FROM pg_dist_partition WHERE logicalrelid = 'local'::regclass;
┌──────────────┬────────────┬─────────┬──────────────┬──────────┬───────────────┐
│ logicalrelid │ partmethod │ partkey │ colocationid │ repmodel │ autoconverted │
├──────────────┼────────────┼─────────┼──────────────┼──────────┼───────────────┤
│ local        │ n          │         │            0 │ s        │ f             │
└──────────────┴────────────┴─────────┴──────────────┴──────────┴───────────────┘
(1 row)

In any case, if you connect to the worker nodes, you can still access the local tables.

# 9700 is a worker node port
psql postgres -p 9700 -c "SELECT count(*) FROM local"

┌───────┐
│ count │
├───────┤
│     0 │
└───────┘
(1 row)

In general, our design policy for “citus managed local tables” is that they become indistinguishable from Postgres tables. In this release, we added support for partitioning. We track all the remaining features on this issue.
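As a sketch of the new partitioning support (assuming the coordinator is already in the metadata; the table and partition names are illustrative), a partitioned local table can now be added to the metadata as well:

-- a partitioned Postgres table on the coordinator
CREATE TABLE events (event_id bigserial, event_time timestamptz, payload jsonb)
PARTITION BY RANGE (event_time);

CREATE TABLE events_2022_06 PARTITION OF events
    FOR VALUES FROM ('2022-06-01') TO ('2022-07-01');

-- add the partitioned table (along with its partitions) to the Citus metadata
SELECT citus_add_local_table_to_metadata('events');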

Goodbye to seldom-used features

Like PostgreSQL, Citus maintains long-term backwards compatibility. We go to great lengths to make sure your applications continue working when upgrading Citus. However, sometimes a feature is no longer in line with how Citus is used and gets in the way of development. There are a few Citus features which we decided to remove in 11.0:

  • Drops support for PostgreSQL 12: Our soft policy for Postgres version compatibility is to support the latest Citus release with the 2 latest Postgres releases. In other words, Citus 11.0 supports Postgres 13 and Postgres 14.

  • Invalid shard placements: Shards are no longer marked invalid when a write fails, since this behavior had some flaws and led to limitations when metadata is synced. This work involved several changes:

  • Append-distributed table functions: The original method of distribution in Citus was “append” distribution, which was optimized for append-only data. Hash-distributed tables are a lot easier to use, have many more capabilities, and can also handle append-only data very well through partitioning. Citus 11.0 removes the functions for creating shards and loading new data into append-distributed tables. We are not aware of any append-distributed table users, but just in case: you can still upgrade to 11.0, but the tables will become read-only. We recommend creating new distributed tables that use the default hash-distribution.

  • Distributed cstore_fdw tables (should switch to the columnar access method): Since version 10.0, Citus comes with built-in columnar storage. Prior to Citus 10.0, it was possible to use Citus with columnar storage using the now-deprecated cstore_fdw extension.

However, cstore_fdw did not support important PostgreSQL features like streaming replication and backups, hence we rarely saw Citus customers use columnar storage before Citus 10. Many companies are now successfully using Citus’ built-in columnar storage for time series data, and we are therefore dropping support for creating or using distributed cstore_fdw tables. If you have distributed cstore_fdw tables, we recommend converting them to the columnar access method before upgrading to 11.0.
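One way to convert before upgrading is to create a new table that uses the built-in columnar access method and copy the data over (a sketch; the table and column names are illustrative):

-- create a replacement table using the built-in columnar access method
CREATE TABLE events_columnar (id bigint, data jsonb) USING columnar;

-- copy the data from the cstore_fdw foreign table, then drop it
INSERT INTO events_columnar SELECT id, data FROM events_cstore;
DROP FOREIGN TABLE events_cstore;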

This change is introduced via Pull request #5465.

  • Drop support for distributed and reference foreign tables: This is related to dropping cstore_fdw support. cstore_fdw was designed in a way that each shard is a foreign table. However, for many (if not all) external foreign table types, such as postgres_fdw, this is a very hard relationship to maintain. For example, how should the cluster be adjusted after a rebalance? This would require the user’s involvement.

In order to simplify the user experience, we dropped support for distributed and reference foreign tables via Pull request #5605. Instead, we enabled “citus managed local tables” to become foreign tables that can be accessed from any node via Pull request #5461. Previously, the TRUNCATE command was also not supported. As of Citus 11.0, Citus supports TRUNCATE on foreign tables via Pull request #5710.

Note that you can still upgrade the cluster with existing distributed or reference foreign tables, but you cannot create new such tables.

  • Move pg_dist_object to pg_catalog: Historically, pg_dist_object had been created in the citus schema as an experiment to understand if we could move our catalog tables to a branded schema. We quickly realised that this interfered with the UX on our managed services and other environments, where users connected as a user named citus.

By default, Postgres puts the username on the search_path. To be able to read the catalog in the citus schema, we would need to grant access permissions on the schema. This caused newly created objects, such as tables, to default to that schema for creation, which then failed due to missing write permissions on the schema.

With this change, we move the pg_dist_object catalog table to the pg_catalog schema, where our other catalog tables are also located. This makes the catalog table visible and readable by any user for debugging purposes, just like our other catalog tables.

This change is introduced via Pull request #5765.

  • Deprecations of less commonly used features
    • Pull request #5851: Drops citus.enable_cte_inlining GUC, which is only useful for Postgres 11
    • Pull request #5525: Removes support for dropping distributed and local indexes in the same statement, which doesn’t work well on metadata synced clusters
    • Pull request #5678: Replaces citus.enable_object_propagation GUC with citus.enable_metadata_sync, mostly a cosmetic change to have a broader name
    • Pull request #5609: Requires superuser for the citus_add_node() and citus_activate_node() UDFs. Previously, you could grant certain privileges to non-superusers to add/activate nodes; for simplicity, superuser is now always required.
    • Pull request #5753: Disallows unsupported lateral subqueries on distributed tables. We realized that when reference tables are involved in certain complex lateral joins, Citus might return wrong results. Until we properly support such queries, we prevent users from getting wrong results.

Notable fixes

  • Pull request #5400: A major improvement for repartition joins, fixing several issues
    • Enable re-partition joins on followers
    • Enable re-partition joins in transaction blocks
    • Fixes a bug that could cause failed re-partition joins to leak result tables
    • Fixes a bug that could cause re-partition joins involving local shards to fail
    • Automatically use binary encoding in re-partition joins when possible
  • Pull request #5654: Fixes a bug that causes columnar storage pages to have zero LSN
  • Pull request #5882: Fixes a bug that could cause invalid JSON when running EXPLAIN ANALYZE with subplans