Updates to this version:
Welcome to the 11.0 release of the Citus extension to PostgreSQL. This page dives deep into many of the changes in Citus 11.0, including these headline features:
Read the release blog post: If you want to read more about the Citus 11 release, Marco's blog post gives a broader, big-picture perspective on some of the details covered here.
Citus 11 release party: Watch the recording of the Citus 11 Release Party, where the Citus engineering team talks about the highlights of Citus 11 (and shares some demos, too). This was our inaugural "release party" for a new Citus open source release, featuring demos of 2 of the newly open-sourced Citus Enterprise features, as well as the new "query from any node" capability, plus the new cluster activity views. Feedback is welcome: we'd love to host more Citus Release Parties in the future and want to know which bits you found useful!
Citus Con talk & demo: If you prefer watching videos over reading, this talk from Citus Con, titled "Citus 11: A look at the Elicorn's Horn", is the one for you, and it includes a demo!
Our team could not be more excited to open source the small handful of remaining Citus Enterprise features. Now the Citus extension to Postgres is fully open source, including:
- CREATE/DROP/ALTER ROLE statements
- GRANT/REVOKE statements
- ALTER DATABASE ... OWNER TO
- JSON/JSONB objects
That means that any user can now rebalance shards without blocking writes, manage roles across the cluster, isolate tenants to their own shards, and much more. In the following sections, we are going to talk about each item in detail.
These changes are introduced in Pull Request #6008.
Citus had already open-sourced the shard rebalancer. With this release, we are also open-sourcing the non-blocking version.
This means that in Citus 11.0, Citus moves shards around by using logical replication to copy the shards, as well as all the writes to the shards that happen during the data copy. That way, your application only experiences a brief blip in write latencies when scaling out the cluster by moving existing data to new nodes. A prerequisite is that all of your Postgres tables have primary keys.
The architecture and how to use the rebalancer are further explained in the Citus documentation.
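As a minimal sketch of how that might look in practice (the node address below is illustrative; we assume the default transfer mode of rebalance_table_shards(), which uses logical replication when possible):
-- add a new node, then rebalance shards without blocking writes
SELECT citus_add_node('10.0.0.4', 5432);
SELECT rebalance_table_shards(shard_transfer_mode := 'auto');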
As of Citus 11.0, several user management operations are propagated to the nodes in a Citus cluster via the coordinator. In this section, we'll describe the features in a little more detail.
Managing roles (and users) is automatically handled via the coordinator. In other words, as soon as a ROLE/USER is created on the coordinator, it is available on the workers. Connect to the coordinator and create a role:
CREATE ROLE create_role_test CREATEDB CREATEROLE LOGIN REPLICATION
CONNECTION LIMIT 105 PASSWORD 'type_your_passwd'
VALID UNTIL '2045-05-05 00:00:00.00+00';
Connect to any of the worker nodes, and see that the role is propagated. Note that new nodes added via citus_add_node() also get the roles/users created on the coordinator.
\du create_role_test
List of roles
┌──────────────────┬─────────────────────────────────────────────┬───────────┐
│ Role name │ Attributes │ Member of │
├──────────────────┼─────────────────────────────────────────────┼───────────┤
│ create_role_test │ Create role, Create DB, Replication ↵│ {} │
│ │ 105 connections ↵│ │
│ │ Password valid until 2045-05-05 02:00:00+02 │ │
└──────────────────┴─────────────────────────────────────────────┴───────────┘
Similarly, ALTER ROLE and DROP ROLE commands are propagated to the worker nodes.
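For instance, continuing with the role created above, the following statements issued on the coordinator are reflected on all workers:
-- lower the connection limit everywhere, then drop the role everywhere
ALTER ROLE create_role_test CONNECTION LIMIT 10;
DROP ROLE create_role_test;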
Managing GRANT/REVOKE is automatically handled via the coordinator. In other words, GRANT/REVOKE statements are reflected on the workers. For example, if you GRANT/REVOKE access on a distributed table, all of its shards are also affected by the statement.
Here, we create a role and a distributed table. The role has only SELECT access on the table:
CREATE ROLE read_access LOGIN;
CREATE TABLE test(key int);
SELECT create_distributed_table('test', 'key');
-- only allow SELECT on test for read_access
REVOKE ALL ON TABLE test FROM read_access;
GRANT SELECT ON TABLE test TO read_access;
SET ROLE read_access;
-- this user does not have INSERT grant, so fails
INSERT INTO test VALUES (1);
ERROR: permission denied for table test
-- this user has SELECT grant, so works fine
SELECT count(*) FROM test;
┌───────┐
│ count │
├───────┤
│ 0 │
└───────┘
(1 row)
Postgres provides row security policies that restrict, on a per-user basis, which rows can be returned by normal queries or inserted, updated, or deleted by data modification commands. This feature is also known as Row-Level Security.
With Citus 11.0, this feature is available for distributed tables as well.
CREATE TABLE events (
tenant_id int,
id int,
type text
);
SELECT create_distributed_table('events','tenant_id');
INSERT INTO events VALUES (1,1,'push');
INSERT INTO events VALUES (2,2,'push');
CREATE USER rls_tenant_1;
GRANT SELECT ON TABLE events TO rls_tenant_1;
-- Enable row level security
ALTER TABLE events ENABLE ROW LEVEL SECURITY;
-- Create a policy that gives tenants read access to their own rows
CREATE POLICY user_mod ON events
FOR SELECT TO rls_tenant_1
USING (current_user = 'rls_tenant_' || tenant_id::text);
-- all rows should be visible because we are querying with
-- the table owner user now
SELECT * FROM events ORDER BY 1;
┌───────────┬────┬──────┐
│ tenant_id │ id │ type │
├───────────┼────┼──────┤
│ 1 │ 1 │ push │
│ 2 │ 2 │ push │
└───────────┴────┴──────┘
(2 rows)
-- Switch user that has been granted rights,
-- should be able to see rows that the policy allows
SET ROLE rls_tenant_1;
SELECT * FROM events ORDER BY 1;
┌───────────┬────┬──────┐
│ tenant_id │ id │ type │
├───────────┼────┼──────┤
│ 1 │ 1 │ push │
└───────────┴────┴──────┘
(1 row)
Citus 11.0 adds support for changing the database owner across the cluster. This behavior can be turned off via the citus.enable_alter_database_owner setting.
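A brief sketch, assuming a database named my_database and a role db_owner_role that already exist across the cluster (both names are illustrative, and we assume the GUC can be changed at the session level):
-- propagated to the worker nodes by default
ALTER DATABASE my_database OWNER TO db_owner_role;
-- disable the propagation if you prefer to manage owners manually
SET citus.enable_alter_database_owner TO off;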
In a distributed database, many connections are made between the nodes of a Citus cluster. citus.node_conninfo can be used to populate authentication-related fields for these inter-node connections. The documentation of citus.node_conninfo has all the details on how to configure the authentication between the nodes.
With Citus 11.0, citus.node_conninfo allows users to authenticate the communication between nodes by adding sslkey and sslcert. A simple approach is to use a global certificate in citus.node_conninfo together with trust clientcert=verify-ca in pg_hba.conf for internal connections, as this avoids having to change configs after adding a user, or storing passwords.
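A hedged sketch of that approach (file paths are illustrative):
-- configure certificate-based authentication for inter-node connections
ALTER SYSTEM SET citus.node_conninfo TO
  'sslmode=verify-full sslrootcert=/path/to/root.crt sslcert=/path/to/node.crt sslkey=/path/to/node.key';
SELECT pg_reload_conf();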
Similarly, pg_dist_authinfo can be used to populate authentication-related fields for these inter-node connections at per-user granularity. The documentation of pg_dist_authinfo has all the details on how to configure authentication between the nodes.
In general, it is simpler to set up authentication between nodes via citus.node_conninfo. If you need to set up authentication between nodes at per-user granularity, then pg_dist_authinfo can be a better alternative.
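For example, a sketch of a per-user entry (role name and password are illustrative; per the Citus docs, a nodeid of 0 applies the entry to all nodes):
-- outbound connections made as my_user will use this password
INSERT INTO pg_dist_authinfo (nodeid, rolename, authinfo)
VALUES (0, 'my_user', 'password=my_password');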
Connection poolers between Citus nodes can be useful when the application demands very high throughput. Although, as of Postgres 14, this is required for far fewer workloads, it can still be useful to set up poolers in a Citus cluster. The documentation of pg_dist_poolinfo has all the details on how to configure pooling between the nodes.
For many SaaS products, a common database problem is having one customer with so much data that it adversely impacts other customers on the shared machine.
Tenant isolation is a good way to solve this issue. Effectively, it allows you to choose which particular tenant or customer you want to isolate on a completely new node. By separating a tenant, you get dedicated resources with more memory and CPU processing power.
You can read about how to use the tenant isolation feature in this blog and in the docs.
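As a short sketch, isolating a single tenant of the events table created earlier to its own shard (the tenant value is illustrative):
-- returns the id of the newly created shard that holds only tenant 2
SELECT isolate_tenant_to_new_shard('events', 2);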
citus_stat_statements is an extended version of pg_stat_statements, giving additional execution statistics per tenant (a.k.a., per distribution key filter). For example, you can find the busiest tenants with the following query.
SELECT partition_key as tenant_id,
count(*) as tenant_unique_queries,
sum(calls) as tenant_total_queries,
sum(total_time) as total_query_time
FROM citus_stat_statements
WHERE partition_key is not null
GROUP BY tenant_id
ORDER BY tenant_total_queries DESC
LIMIT 5;
You can read the details of citus_stat_statements in this blog and in the related docs.
Note that citus_stat_statements requires the pg_stat_statements extension to be installed. It is often useful to join citus_stat_statements and pg_stat_statements to get useful information for multi-tenant SaaS apps. And, by default, citus_stat_statements is disabled; you can enable it via the following setting: citus.stat_statements_track = 'all'
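A hedged sketch of such a join (we assume both views expose queryid, userid, and dbid, and that pg_stat_statements provides total_exec_time as in Postgres 13/14):
SELECT c.partition_key AS tenant_id,
       sum(p.calls) AS total_calls,
       sum(p.total_exec_time) AS total_exec_time_ms
FROM citus_stat_statements c
JOIN pg_stat_statements p USING (queryid, userid, dbid)
WHERE c.partition_key IS NOT NULL
GROUP BY 1
ORDER BY 3 DESC
LIMIT 5;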
Parsing and validating large JSON objects consumes a lot of CPU time. However, it is not strictly necessary for the coordinator to parse the JSON/JSONB; parsing is only useful for validation and for returning the line number of the line that contains the malformed JSON/JSONB.
With this optimization, Citus treats JSON/JSONB objects as text on the coordinator, and skips validating them, for COPY commands. This especially helps improve COPY performance for large objects.
This optimization can be disabled via citus.skip_jsonb_validation_in_copy. The user experience is almost indistinguishable: when the optimization is enabled, if there is an error in the JSONB, the error is thrown from the worker; otherwise, the error is thrown from the coordinator.
CREATE TABLE test(key int, data jsonb);
SELECT create_distributed_table('test', 'key');
-- error thrown from the worker
\COPY test (key, data) FROM STDIN WITH CSV
1,{"r":255}
\.
ERROR: invalid input syntax for type json
DETAIL: Token "r" is invalid.
SET citus.skip_jsonb_validation_in_copy TO off;
-- error thrown from the coordinator, indistinguishable from the above
COPY test FROM STDIN WITH CSV;
1,{"r":255}
\.
ERROR: invalid input syntax for type json
DETAIL: Token "r" is invalid.
Citus 11.0 supports the CLUSTER command on distributed and reference tables.
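A minimal sketch (table and index names are illustrative):
CREATE TABLE perf_events (tenant_id int, id int);
SELECT create_distributed_table('perf_events', 'tenant_id');
CREATE INDEX perf_events_tenant_idx ON perf_events (tenant_id);
-- the CLUSTER command is propagated to the shards on the worker nodes
CLUSTER perf_events USING perf_events_tenant_idx;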
The easiest way to use Citus is to connect to the coordinator and use it for both schema changes and distributed queries—as most of you have been doing prior to 11.0—but for very demanding applications, you now have the option to load balance distributed queries across the workers in (parts of) your application by using a different connection string and factoring in a few limitations.
Load balancing distributed queries across nodes is enabled by automatic metadata syncing to the nodes. All the code changes that enabled automatic metadata syncing are tracked in #5304. Throughout these release notes, we'll dive into different parts of the changes.
Citus 11.0 changes to a new mode of operation by default: All Citus database clusters always use metadata syncing. That means that with Citus 11.0, you can always run distributed Postgres queries, from any node.
-- create a table, distribute it on the coordinator and
-- ingest 3 rows for demo purposes
CREATE TABLE distributed_table(key int, value text);
SELECT create_distributed_table('distributed_table' ,'key');
INSERT INTO distributed_table VALUES (1, '1'), (2, '2'), (3, '3');
INSERT 0 3
Now, connect to any node, and run queries:
SELECT count(*) FROM distributed_table;
┌───────┐
│ count │
├───────┤
│ 3 │
└───────┘
(1 row)
INSERT INTO distributed_table VALUES (4, '4'), (5, '5'), (6, '6');
INSERT 0 3
All commands (SELECT, INSERT, UPDATE/DELETE, or TRUNCATE) can be executed from any node on any table (distributed tables, reference tables, or Citus managed local tables). All DDL commands, like ALTER TABLE, CREATE INDEX, or create_distributed_table, should still happen via the coordinator.
To enable running queries via any node, Citus propagates all its metadata (the pg_dist_* tables) to the nodes in the database cluster. Some of the metadata tables were already propagated by earlier versions of Citus; Citus 11.0 adds propagation of pg_dist_object and pg_dist_colocation.
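As a quick illustration, you can connect to any worker and read the synced metadata directly (a sketch; the actual counts depend on your cluster):
-- run on a worker node
SELECT count(*) FROM pg_dist_partition;
SELECT count(*) FROM pg_dist_colocation;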
If you need to perform many queries per second on a distributed PostgreSQL database, it may be necessary to use all nodes in the cluster for running distributed queries. To do that, you need a relatively large number of connections.
When your application opens a connection to one of the Citus nodes, that connection spawns a Postgres process, which will need to make internal connections to other nodes to perform distributed queries. If you connect directly to the worker nodes for distributed queries, your client connections are in a sense competing with internal connections. To make sure that every Postgres process resulting from a client connection can also make internal connections to all other nodes, we added the citus.max_client_connections setting via #5698. This setting restricts the number of (external) client connections, while continuing to allow internal connections between Citus nodes. In addition to the usual installation instructions, we recommend adding the following settings to postgresql.conf on each Citus node (coordinator and all workers) to accommodate a large number of client connections:
# The maximum number of client + internal connections a node can handle
# The total number of client connections across all nodes should never exceed this number
max_connections = 6000
# The number of client connections an individual node can handle
# Should be no greater than: max_connections / node count including the coordinator
citus.max_client_connections = 500
With these settings, each node will accept up to 500 connections from your application, so if you have 10 worker nodes and 1 coordinator your application can make 5500 connections in total. You can increase this number further by using a connection pooler like pgbouncer on each node.
We also highly recommend adding the Citus coordinator to the metadata, such that worker nodes can connect to the coordinator as well. Some Citus features will only be available if the coordinator is in the metadata. We will likely make adding the coordinator required in the future.
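Adding the coordinator to the metadata is a one-liner run on the coordinator (hostname and port below are illustrative):
SELECT citus_set_coordinator_host('coordinator-host', 5432);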
When there is a single node in the cluster for running distributed queries, cluster insights can be extracted in pretty much the same way as on single-node Postgres. You can use pg_stat_activity or the pid (process id) to gain insight into the current activity. Similarly, you can use pg_cancel_backend / pg_terminate_backend to get further control over the activity.
When you use multiple nodes for querying the Citus cluster, gaining insights becomes more complicated. We introduced several tools to alleviate the potential difficulties users may have.
citus_check_cluster_node_health() UDF: Any node should be able to connect to any other node in the cluster. You can use the citus_check_cluster_node_health() UDF to check whether there are any connectivity issues.
SELECT * FROM citus_check_cluster_node_health();
 from_nodename | from_nodeport | to_nodename | to_nodeport | result
--------------------------------------------------------------------
localhost | 9700 | localhost | 9700 | t
localhost | 9700 | localhost | 9701 | t
localhost | 9701 | localhost | 9700 | t
localhost | 9701 | localhost | 9701 | t
(4 rows)
citus_backend_gpid() UDF: Any process that initiates a distributed query could create some internal connections. With Citus 11.0, we introduce global_pid (or gpid), which uniquely identifies a client session and all internal connections associated with the session, via #5601 and #5755. The gpid starts with the node ID of the node that initiated the query (= the node that the client connected to).
select citus_backend_gpid();
┌────────────────────┐
│ citus_backend_gpid │
├────────────────────┤
│ 10000059518 │
└────────────────────┘
(1 row)
Note that the internal connections also get the same unique global pid as the session that initiated the distributed query.
select key, citus_backend_gpid() FROM distributed_table;
┌─────┬────────────────────┐
│ key │ citus_backend_gpid │
├─────┼────────────────────┤
│ 1 │ 10000059518 │
│ 5 │ 10000059518 │
│ 4 │ 10000059518 │
│ 3 │ 10000059518 │
│ 6 │ 10000059518 │
│ 2 │ 10000059518 │
└─────┴────────────────────┘
(6 rows)
citus_stat_activity View: Shows the information from pg_stat_activity for all client sessions across all nodes, along with a global_pid (or gpid). It includes all the internal backends as well. This change is introduced via #5731.
SELECT global_pid, query, state from citus_stat_activity WHERE query ilike 'INSER%' and global_pid != citus_backend_gpid();
┌─────────────┬────────────────────────────────────────────────────────────────────────────────┬─────────────────────┐
│ global_pid │ query │ state │
├─────────────┼────────────────────────────────────────────────────────────────────────────────┼─────────────────────┤
│ 80000062731 │ INSERT INTO distributed_table VALUES (1, '1'); │ idle in transaction │
│ 80000062731 │ INSERT INTO public.distributed_table_102009 (key, value) VALUES (1, '1'::text) │ idle in transaction │
└─────────────┴────────────────────────────────────────────────────────────────────────────────┴─────────────────────┘
(2 rows)
citus_dist_stat_activity View: Similar to citus_stat_activity but excludes the internal backends. Most of the time, this is what you would probably need. This change is introduced via #5760.
SELECT global_pid, query, state from citus_dist_stat_activity WHERE query ilike 'INSER%' and global_pid != citus_backend_gpid();
┌─────────────┬────────────────────────────────────────────────┬─────────────────────┐
│ global_pid │ query │ state │
├─────────────┼────────────────────────────────────────────────┼─────────────────────┤
│ 80000062731 │ INSERT INTO distributed_table VALUES (1, '1'); │ idle in transaction │
└─────────────┴────────────────────────────────────────────────┴─────────────────────┘
(1 row)
citus_lock_waits View: In single-node Postgres, there is a useful query which shows blocking/blocked activities. In Citus, we had already provided citus_lock_waits, which gives the same information but considers all the activity across the cluster. With Citus 11.0, we improved this view to show any blocking activity, with gpids attached. This change is introduced via #5766.
SELECT * FROM citus_lock_waits;
┌─[ RECORD 1 ]──────────────────────────┬────────────────────────────────────────────────┐
│ waiting_gpid │ 10000062689 │
│ blocking_gpid │ 80000062731 │
│ blocked_statement │ TRUNCATE distributed_table; │
│ current_statement_in_blocking_process │ INSERT INTO distributed_table VALUES (1, '1'); │
│ waiting_nodeid │ 1 │
│ blocking_nodeid │ 8 │
└───────────────────────────────────────┴────────────────────────────────────────────────┘
Time: 76.163 ms
pg_cancel_backend() and pg_terminate_backend(): Postgres allows cancelling and/or terminating processes with a pid. On a distributed Citus cluster, we want to preserve the same ability with as little effort as possible. Given that we have global pids, we overrode these UDFs to accept gpids. For simplicity, below we show using the current backend's gpid with the UDFs. In general, you can pass gpids from any node.
-- cancel own backend via gpid
SELECT pg_cancel_backend(citus_backend_gpid());
ERROR: canceling statement due to user request
-- terminate own backend via global_pid
SELECT pg_terminate_backend(citus_backend_gpid());
FATAL: terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
These changes are introduced in Pull Request #5699.
With Citus 11.0, users can connect to any node to query their distributed tables. We have received reports from users who are understandably confused by seeing Citus-generated shards in their schemas next to their distributed tables when inspecting the database with tools like pgAdmin.
To avoid the confusion, Citus 11.0 by default hides all the shards from all external applications. This is controlled by the GUC citus.show_shards_for_app_name_prefixes.
This change is introduced via Pull request #5567 and Pull request #5920.
-- the default empty string means shards are hidden from all external applications
show citus.show_shards_for_app_name_prefixes;
┌──────────────────────────────────────────┐
│ citus.show_shards_for_app_name_prefixes │
├──────────────────────────────────────────┤
│ │
└──────────────────────────────────────────┘
(1 row)
\d
List of relations
┌────────┬───────────────────┬───────┬─────────────┐
│ Schema │ Name │ Type │ Owner │
├────────┼───────────────────┼───────┼─────────────┤
│ public │ citus_tables │ view │ onderkalaci │
│ public │ distributed_table │ table │ onderkalaci │
└────────┴───────────────────┴───────┴─────────────┘
(2 rows)
-- now configure the GUC so that it shows shards to a specific application
-- and then, on psql -- where we run the test -- we can see the shards
SET citus.show_shards_for_app_name_prefixes TO 'psql';
\d
List of relations
┌────────┬──────────────────────────┬───────┬─────────────┐
│ Schema │ Name │ Type │ Owner │
├────────┼──────────────────────────┼───────┼─────────────┤
│ public │ citus_tables │ view │ onderkalaci │
│ public │ distributed_table │ table │ onderkalaci │
│ public │ distributed_table_102008 │ table │ onderkalaci │
│ public │ distributed_table_102015 │ table │ onderkalaci │
│ public │ distributed_table_102022 │ table │ onderkalaci │
│ public │ distributed_table_102029 │ table │ onderkalaci │
│ public │ distributed_table_102036 │ table │ onderkalaci │
└────────┴──────────────────────────┴───────┴─────────────┘
(7 rows)
Note that, if you want to make all the shards visible, you can do that via citus.override_table_visibility:
-- do not hide any tables (a.k.a., shards)
SET citus.override_table_visibility TO off;
\d
List of relations
┌────────┬──────────────────────────┬───────┬─────────────┐
│ Schema │ Name │ Type │ Owner │
├────────┼──────────────────────────┼───────┼─────────────┤
│ public │ citus_tables │ view │ onderkalaci │
│ public │ distributed_table │ table │ onderkalaci │
│ public │ distributed_table_102008 │ table │ onderkalaci │
│ public │ distributed_table_102015 │ table │ onderkalaci │
│ public │ distributed_table_102022 │ table │ onderkalaci │
│ public │ distributed_table_102029 │ table │ onderkalaci │
│ public │ distributed_table_102036 │ table │ onderkalaci │
└────────┴──────────────────────────┴───────┴─────────────┘
(7 rows)
If you are familiar with Citus' run_command_on_workers(), you know how handy it is to use. With the query-from-any-node functionality, we added a few more helper functions that can be useful in a similar manner.
In general, if you want these functions to work as shown here, make sure that the coordinator is already added to the metadata; see citus_set_coordinator_host().
The new helper functions added in Citus 11.0 are run_command_on_coordinator() via Pull request #5941, run_command_on_all_nodes() via Pull request #5731, and citus_is_coordinator() via Pull request #5888.
Connect to one of the nodes in the cluster, and experiment with the helper functions. In this example, we'll be connecting to one of the worker nodes:
-- make sure that the coordinator is added to the metadata
SELECT nodename, nodeport FROM pg_dist_node WHERE groupid = 0;
┌───────────┬──────────┐
│ nodename │ nodeport │
├───────────┼──────────┤
│ localhost │ 5432 │
└───────────┴──────────┘
(1 row)
-- show that we are on a worker node
SELECT citus_is_coordinator();
┌──────────────────────┐
│ citus_is_coordinator │
├──────────────────────┤
│ f │
└──────────────────────┘
(1 row)
-- for example, see work_mem setting on all nodes
SELECT
nodename, nodeport, result
FROM run_command_on_all_nodes($$SHOW work_mem$$) JOIN pg_dist_node USING(nodeid);
┌───────────┬──────────┬────────┐
│ nodename │ nodeport │ result │
├───────────┼──────────┼────────┤
│ localhost │ 5432 │ 4MB │
│ localhost │ 9700 │ 4MB │
│ localhost │ 9701 │ 4MB │
│ localhost │ 9702 │ 4MB │
│ localhost │ 9703 │ 4MB │
│ localhost │ 9704 │ 4MB │
│ localhost │ 9705 │ 4MB │
│ localhost │ 9706 │ 4MB │
└───────────┴──────────┴────────┘
(8 rows)
-- or, you can check if there are any prepared transactions stored
-- on the coordinator
SELECT
result
FROM run_command_on_coordinator($$SELECT count(*) FROM pg_prepared_xacts$$);
┌────────┐
│ result │
├────────┤
│ 0 │
└────────┘
(1 row)
If you are upgrading an existing cluster to Citus 11.0, then after installing the new packages you will need to call a function to finalize the upgrade. This was implemented in #5556 and #5995.
-- execute on all nodes
ALTER EXTENSION citus UPDATE;
And then, only on the coordinator:
-- only on the coordinator
CALL citus_finish_citus_upgrade();
The upgrade function will make sure that all worker nodes have the right schema and metadata. It also addresses several naming issues that affected shards of partitioned tables.
If there is anything preventing metadata from being synchronized (e.g. missing permissions on worker nodes, or conflicting objects) the upgrade function will throw an error. You can still use your existing Citus database cluster via the coordinator until you address the issue and finalize the upgrade, but some new 11.0 functionality will not be available.
Starting from Citus 9.0, Citus automatically manages some of the database objects. In other words, users only need to take care of creating the objects on the coordinator, and the rest is handled by Citus. The objects are automatically created on the worker nodes.
In the Citus 11.0 release, we expanded the set of objects supported by Citus. Before getting into the details of the new object support, let us discuss the citus.create_object_propagation GUC, which controls DDL creation behaviour in transactions.
In a distributed database we often need to make trade-offs between consistency, fault tolerance, parallelism, and other distributed systems aspects. Citus needs to support PostgreSQL’s interactive, multi-statement transaction blocks, which is particularly challenging in a distributed setting.
For example, Citus typically parallelizes expensive operations across shards, such as analytical queries and create_distributed_table(), over multiple connections per worker. When creating a database object, Citus propagates it to worker nodes over a single connection per worker. Combining those two operations in a single multi-statement transaction may cause issues, because the parallel connections will not be able to see the object that was created over a single connection but not yet committed.
Consider a transaction block that creates a type, a table, loads data, and distributes the table:
BEGIN;
-- type creation over a single connection:
CREATE TYPE coordinates AS (x int, y int);
CREATE TABLE positions (object_id text primary key, position coordinates);
-- data loading thus goes over a single connection:
SELECT create_distributed_table('positions', 'object_id');
\COPY positions FROM 'positions.csv'
…
Previously, Citus would defer creating the type on the worker nodes and commit it separately when doing the create_distributed_table. This enabled the data copying in create_distributed_table to happen in parallel. However, it also meant that the type was not always present on the Citus worker nodes, or if the transaction rolled back, it would be present only on the worker nodes. We were able to hide these inconsistencies, but eventually they could cause issues.
With Citus 11.0 the default behaviour changes to prioritize schema consistency between coordinator and worker nodes. That does have a downside: If object propagation happens after a parallel command, it can no longer be completed, as highlighted by the ERROR in the code block below:
BEGIN;
CREATE TABLE items (key text, value text);
-- parallel data loading:
SELECT create_distributed_table('items', 'key');
\COPY items FROM 'items.csv';
CREATE TYPE coordinates AS (x int, y int);
ERROR: cannot run type command because there was a parallel operation on a distributed table in the transaction
DETAIL: When running command on/for a distributed type, Citus needs to perform all operations over a single connection per node to ensure consistency.
HINT: Try re-running the transaction with "SET LOCAL citus.multi_shard_modify_mode TO 'sequential';"
If you run into this issue, there are 2 simple workarounds:
- Use SET citus.create_object_propagation TO deferred; to return to the old object propagation behavior.
- Use SET citus.multi_shard_modify_mode TO sequential; to disable per-node parallelism (see the sketch after this list); data load in the same transaction might be slower.
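A minimal sketch of the second workaround, re-running the transaction from the error example above in sequential mode:
BEGIN;
SET LOCAL citus.multi_shard_modify_mode TO 'sequential';
CREATE TABLE items (key text, value text);
-- data loading now goes over a single connection per node
SELECT create_distributed_table('items', 'key');
-- the type can now be propagated within the same transaction
CREATE TYPE coordinates AS (x int, y int);
COMMIT;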
This change is introduced via Pull request #5724 and Pull request #5774.
Several database objects, such as those created via CREATE TYPE or CREATE EXTENSION, have already been managed by Citus. In this release, we add several more objects that are automatically managed by Citus, so that you don't have to take any manual steps to manage these objects across the cluster.
Prior to Citus 11.0, Citus allowed creating objects in ways that could cause problems once metadata syncing is enabled (which is always the case as of Citus 11.0). In other words, when metadata syncing is enabled, we cannot afford to have any dependencies that Citus cannot sync to the nodes. With Citus 11.0, the following two cases are prevented:
Although not very commonly used, Postgres allows creating circular dependencies between objects. Syncing such objects to remote nodes is currently unsupported.
-- Show that creating a circular dependency via functions and default values is not allowed
CREATE TABLE table_1_for_circ_dep(id int);
SELECT create_distributed_table('table_1_for_circ_dep','id');
CREATE OR REPLACE FUNCTION func_1_for_circ_dep(col_1 table_1_for_circ_dep)
RETURNS int LANGUAGE plpgsql AS $$ BEGIN return 1; END; $$;
CREATE TABLE table_2_for_circ_dep(id int, col_2 int default func_1_for_circ_dep(NULL::table_1_for_circ_dep));
SELECT create_distributed_table('table_2_for_circ_dep','id');
CREATE OR REPLACE FUNCTION func_2_for_circ_dep(col_3 table_2_for_circ_dep)
RETURNS int LANGUAGE plpgsql AS $$ BEGIN return 1; END; $$;
-- It should error out due to circular dependency
ALTER TABLE table_1_for_circ_dep ADD COLUMN col_2 int default func_2_for_circ_dep(NULL::table_2_for_circ_dep);
ERROR: Citus can not handle circular dependencies between distributed objects
DETAIL: "table table_1_for_circ_dep" circularly depends itself, resolve circular dependency first
Even if an object cannot be distributed by Citus, it can still reside on the coordinator as a local object. However, as soon as any such object is used as a dependency of a distributed object, Citus throws an error.
-- see https://github.com/citusdata/citus/blob/76f249a05a3b2d5a77fd79e23fd62fa85bb49b2c/src/test/regress/expected/pg13.out#L143-L167
-- to reproduce this example. This is only a simplified version. The main point to keep in mind is that
-- my_table cannot be distributed
-- myvarchar is a shell type that Citus does not support
CREATE TYPE myvarchar;
....
-- create a table that depends on unsupported object (e.g., myvarchar)
CREATE TABLE my_table (a int, b myvarchar);
SELECT create_distributed_table('my_table', 'a');
ERROR: Relation "my_table" has dependency on unsupported object "type myvarchar”
With the release of Citus 10.0, we enabled foreign keys between the reference tables and local Postgres tables. Also, we enabled joins between local and distributed tables.
Under the hood, Citus can convert local Postgres tables on the coordinator to "citus managed local tables". This is nothing but converting the Postgres table to a single-shard Citus table sitting on the coordinator, and recording the table in the Citus metadata. With Citus 11.0, as the metadata is synced, "citus managed local tables" are also synced. This means those tables can be accessed and used in queries via the worker nodes as well, although the data sits on the coordinator.
Prior to Citus 11.0, this conversion was automatically handled by Citus when a foreign key was formed between a reference table and a local Postgres table. With Citus 11.0, as this concept is also useful for accessing these tables on the worker nodes, we added some explicit ways of enabling the feature.
The prerequisite is to add the coordinator to the metadata.
-- add the coordinator to the metadata
-- make sure to give the correct hostname / port of your environment
SELECT citus_set_coordinator_host('10.0.0.1', 5432);
Citus 11.0 then offers two separate ways to convert local tables to Citus managed tables.
First, all new local tables are automatically added to the metadata with Pull request #5440:
-- change use_citus_managed_tables permanently
ALTER SYSTEM SET citus.use_citus_managed_tables TO true;
SELECT pg_reload_conf();
-- create a local table, and see that it is on the Citus metadata
CREATE TABLE local(a int);
SELECT * FROM pg_dist_partition WHERE logicalrelid = 'local'::regclass;
┌──────────────┬────────────┬─────────┬──────────────┬──────────┬───────────────┐
│ logicalrelid │ partmethod │ partkey │ colocationid │ repmodel │ autoconverted │
├──────────────┼────────────┼─────────┼──────────────┼──────────┼───────────────┤
│ local │ n │ │ 0 │ s │ f │
└──────────────┴────────────┴─────────┴──────────────┴──────────┴───────────────┘
(1 row)
Second, you can selectively decide which local tables become Citus managed tables via citus_add_local_table_to_metadata:
-- create a local table, and see that it is on the Citus metadata
CREATE TABLE local(a int);
select citus_add_local_table_to_metadata('local');
┌───────────────────────────────────┐
│ citus_add_local_table_to_metadata │
├───────────────────────────────────┤
│ │
└───────────────────────────────────┘
(1 row)
SELECT * FROM pg_dist_partition WHERE logicalrelid = 'local'::regclass;
┌──────────────┬────────────┬─────────┬──────────────┬──────────┬───────────────┐
│ logicalrelid │ partmethod │ partkey │ colocationid │ repmodel │ autoconverted │
├──────────────┼────────────┼─────────┼──────────────┼──────────┼───────────────┤
│ local │ n │ │ 0 │ s │ f │
└──────────────┴────────────┴─────────┴──────────────┴──────────┴───────────────┘
(1 row)
In any case, if you connect to the worker nodes, you can still access the local tables.
# 9700 is a worker node port
psql postgres -p 9700 -c "SELECT count(*) FROM local"
┌───────┐
│ count │
├───────┤
│ 0 │
└───────┘
(1 row)
In general, our design policy for "citus managed local tables" is that they become indistinguishable from Postgres tables. In this release, we added support for partitioning. We track all the remaining features on this issue.
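For illustration, a hedged sketch of a partitioned Citus managed local table (table names are illustrative, and we assume citus_add_local_table_to_metadata() accepts the partitioned parent, per the partitioning support mentioned above):
CREATE TABLE events_local (tenant_id int, created_at date) PARTITION BY RANGE (created_at);
CREATE TABLE events_local_2022 PARTITION OF events_local
    FOR VALUES FROM ('2022-01-01') TO ('2023-01-01');
SELECT citus_add_local_table_to_metadata('events_local');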
Like PostgreSQL, Citus maintains long-term backwards compatibility. We go through great lengths to make sure your applications continue working when upgrading Citus. However, sometimes a feature is no longer in line with how Citus is used and gets in the way of development. There are a few Citus features which we decided to remove in 11.0:
Drops support for PostgreSQL 12: Our soft policy for Postgres version compatibility is to support the latest Citus release with the two latest Postgres releases. In other words, Citus 11.0 supports Postgres 13 and Postgres 14.
Invalid shard placements: Shards are no longer marked invalid when a write fails, since this behavior had some flaws and leads to limitations when the metadata is synced. This work involved several changes:
Append-distributed table functions: The original method of distribution in Citus was “append” distribution, which was optimized for append-only data. Hash-distributed tables are a lot easier to use, have many more capabilities, and can also handle append-only data very well through partitioning. Citus 11.0 removes the functions for creating shards and loading new data into append-distributed tables. We are not aware of any append-distributed table users, but just in case: you can still upgrade to 11.0, but the tables will become read-only. We recommend creating new distributed tables that use the default hash-distribution.
Distributed cstore_fdw tables (should switch to the columnar access method): Since version 10.0, Citus comes with built-in columnar storage. Prior to Citus 10.0, it was possible to use Citus with columnar storage using the now-deprecated cstore_fdw extension. However, cstore_fdw did not support important PostgreSQL features like streaming replication and backups, hence we rarely saw Citus customers use columnar storage before Citus 10. Many companies are now successfully using Citus' built-in columnar storage for time series data, and we are therefore dropping support for creating or using distributed cstore_fdw tables. If you have distributed cstore_fdw tables, we recommend converting them to the columnar access method before upgrading to 11.0.
This change is introduced via #5465
Distributed and reference foreign tables (e.g., via postgres_fdw): this is a very hard relationship to maintain. For example, how should the cluster be adjusted after a rebalance? This needs the user's involvement. In order to simplify the user experience, we dropped support for distributed and reference foreign tables via Pull request #5605. Instead, we enabled "citus managed local tables" to become foreign tables that can be accessed from any node via Pull request #5461. Previously, the TRUNCATE command was also not supported on foreign tables; as of Citus 11.0, Citus supports TRUNCATE on foreign tables via Pull request #5710.
Note that you can still upgrade the cluster with existing distributed or reference foreign tables, but you cannot create new such tables.
pg_dist_object moved to pg_catalog: Historically, pg_dist_object had been created in the citus schema as an experiment to understand whether we could move our catalog tables to a branded schema. We quickly realised that this interfered with the UX on our managed services and other environments, where users connect via a user with the name citus. By default, Postgres puts the username on the search_path; to be able to read the catalog in the citus schema, we would need to grant access permissions to that schema. This caused newly created objects, like tables, to default to that schema for creation, which then failed due to the write permissions on the schema.
With this change we move the pg_dist_object catalog table to the pg_catalog schema, where our other catalog tables are also located. This makes the catalog table visible and readable by any user, like our other catalog tables, for debugging purposes.
This change is introduced via Pull request #5765.
- Removed the citus.enable_cte_inlining GUC, which was only useful for Postgres 11.
- Replaced the citus.enable_object_propagation GUC with citus.enable_metadata_sync, mostly a cosmetic change to give the setting a broader name.
- The citus_add_node() and citus_activate_node() UDFs now always require superuser. Previously, you could have given certain privileges to non-superusers to add/activate nodes; for simplicity, superuser is now always required.