POSETTE 2024 is a wrap! 💯 Thanks for joining the fun! Missed it? Watch all 42 talks online 🍿
Written by Halil Ozan Akgul
July 21, 2022
We released Citus 11 a few weeks ago and it is packed. Citus went fully open source, so previously enterprise-only features—like the non-blocking aspect of the shard rebalancer and multi-user support—are now open source for everyone to enjoy. Another huge change in Citus 11 is that you can now query your distributed Postgres tables from any Citus node, by default.
When using Citus to distribute Postgres before Citus 11, the coordinator node was your application’s only point of contact. Your application needed to connect to the coordinator to query your distributed Postgres tables. The coordinator node can handle high query throughput—about 100,000 queries per second—but your application might need even more processing power. Thanks to our work in Citus 11, you can now query from any node in the Citus database cluster you want. In Citus 11 we sync the metadata to all nodes by default, so you can connect to any node and run queries on your tables.
Running queries from any node is awesome, but you also need to be able to monitor and manage your queries from any node. Before, when you only connected to the coordinator, using Postgres’ monitoring tools was enough, but that is no longer the case. So in Citus 11 we added some ways to observe your queries, similar to what you would do on a single Postgres instance.
In this blog post you’ll learn about some new monitoring tools introduced in Citus 11 that’ll help you track and take control of your distributed queries, including:
- the Citus global process id (GPID)
- the citus_stat_activity and citus_dist_stat_activity views
- the citus_lock_waits view
- cancelling queries with pg_cancel_backend and global PIDs
- helper functions for writing monitoring queries
You can now use the Citus global process id—global PID, or GPID, for short—which is new in Citus 11. The global PID is just like Postgres’ process id, but this new value is unique across the whole Citus cluster.
To find the global PID of your current backend, you can use our new citus_backend_gpid function:
SELECT citus_backend_gpid();
citus_backend_gpid
--------------------
110000000123
We tried to make GPIDs human readable, so they consist of the node id of your current node followed by the Postgres PID of your current backend. The last 10 digits of the global PID are the Postgres process id—which you can find with the pg_backend_pid function—and the remaining digits are the node id, which you can find in the pg_dist_node table.
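To make that layout concrete, here is a minimal Python sketch of the arithmetic—the function names are hypothetical and this is not Citus code, just the “node id followed by a 10-digit PID” scheme described above:

```python
# A GPID packs the node id and the backend's Postgres PID into one big
# integer: the last 10 digits are the PID, the leading digits the node id.
PID_DIGITS = 10

def make_gpid(nodeid: int, pid: int) -> int:
    """Compose a global PID from a node id and a Postgres backend PID."""
    return nodeid * 10**PID_DIGITS + pid

def split_gpid(gpid: int) -> tuple:
    """Split a global PID back into (nodeid, pid)."""
    return gpid // 10**PID_DIGITS, gpid % 10**PID_DIGITS

print(make_gpid(11, 123))        # 110000000123, as in the example above
print(split_gpid(110000000123))  # (11, 123)
```

Node id 11 and backend PID 123 combine into the GPID 110000000123 from the example output above.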
Global PIDs are unique in a Citus cluster. Also, remember that a distributed query might need to run some queries on the shards. Those shard query executions also get the same GPID. In other words, all the activity of a distributed query can be traced via the same GPID. For example, if you run a SELECT query that has the GPID 110000000123, the queries that SELECT from the shards will also have 110000000123 as their global PID.
Note that global PIDs are big integers, whereas Postgres PIDs are 4-byte integers.
To find the Citus global PIDs and more information about your Postgres queries in a Citus cluster, you can use our new citus_stat_activity view. citus_stat_activity is a collection of the pg_stat_activity views from all nodes in the Citus cluster. When you query citus_stat_activity, it goes to every node and gathers the pg_stat_activity views. citus_stat_activity includes all the columns from pg_stat_activity, and we added three extra columns:
- global_pid: the global process id of the backend running the query
- nodeid: the id of the node the citus_stat_activity row comes from
- is_worker_query: whether the row belongs to a query running on a shard

Let’s say you have a distributed table tbl with 4 shards and you run an update query on it from one of the nodes:
BEGIN;
UPDATE tbl SET b = 100 WHERE a = 0;
You can connect to any node and use citus_stat_activity to find info about the UPDATE query.
SELECT global_pid, nodeid, is_worker_query, query
FROM citus_stat_activity
WHERE global_pid = 110000000123;
-[ RECORD 1 ]---+----------------------------------------------------------------------------
global_pid | 110000000123
nodeid | 11
is_worker_query | f
query | UPDATE tbl SET b = 100 WHERE a = 0;
-[ RECORD 2 ]---+----------------------------------------------------------------------------
global_pid | 110000000123
nodeid | 2
is_worker_query | t
query | UPDATE public.tbl_102009 tbl SET b = 100 WHERE (a OPERATOR(pg_catalog.=) 0)
In the output above, you can see:
- Record 1 comes from the node with nodeid 11, where the original query runs, and record 2 comes from the node with nodeid 2, which holds the shard being updated.
- Both records have the same global_pid.
- The is_worker_query column is true for record 2, the query on the shard, and false for the original query, record 1.

Don’t forget, citus_stat_activity includes all the columns of pg_stat_activity, not just the ones we filtered for demonstration here. So, you can find much more information in the citus_stat_activity view.
If you are not interested in every single query from all the nodes in the Citus cluster and only care about the original distributed queries, you can use the citus_dist_stat_activity view. citus_dist_stat_activity hides the queries that run on the shards from the citus_stat_activity view, so you can find high-level information about your Postgres queries.
SELECT global_pid, nodeid, is_worker_query, query
FROM citus_dist_stat_activity
WHERE global_pid = 110000000123;
-[ RECORD 1 ]---+--------------------------
global_pid | 110000000123
nodeid | 11
is_worker_query | f
query | UPDATE tbl SET b = 100 WHERE a = 0;
As you might have guessed, citus_dist_stat_activity is citus_stat_activity filtered with is_worker_query = false. We created citus_dist_stat_activity because when you are interested in the process as a whole, and not each and every process on the shards, the general information that citus_dist_stat_activity provides about the initial queries should be enough.
When something in your Postgres database is blocked is when you need monitoring the most. Citus 11 has you covered when your cluster is blocked, too. The newly updated citus_lock_waits view shows the queries in your cluster that are waiting for a lock held by another query.
Let’s say you run a DELETE query on tbl that will be blocked by the previous UPDATE query:
DELETE FROM tbl WHERE a = 0;
You can connect to any node and use citus_lock_waits to find out which query is blocking your new query:
SELECT * FROM citus_lock_waits;
-[ RECORD 1 ]-------------------------+-----------------------------
waiting_gpid | 20000000345
blocking_gpid | 110000000123
blocked_statement | DELETE FROM tbl WHERE a = 0;
current_statement_in_blocking_process | UPDATE tbl SET b = 100 WHERE a = 0;
waiting_nodeid | 2
blocking_nodeid | 11
The result above shows that the UPDATE statement is blocking the DELETE statement.
Once you find the blocking queries, you can use citus_stat_activity and citus_dist_stat_activity with the global PIDs from citus_lock_waits to gather more insight.
After you find out more about the blocking and blocked queries, you might decide you need to cancel one of them. Before Citus 11, you needed to go to the node where the query was running and then use pg_cancel_backend with the process id to cancel it. Now in Citus 11 we override the pg_cancel_backend function to accept global PIDs too. So, good news, things are now easier: you can cancel queries on your Citus cluster from any node:
SELECT pg_cancel_backend(20000000345);
will cause:
DELETE FROM tbl WHERE a = 0;
ERROR: canceling statement due to user request
Remember that global PIDs are always big integers, whereas Postgres PIDs are 4-byte integers. The difference in size is how pg_cancel_backend differentiates between a PID and a GPID.
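One way to picture that check—purely an illustrative Python sketch with a hypothetical function name, not Citus internals—is to test whether the value fits in a signed 4-byte integer; anything larger cannot be a plain Postgres PID, so it must be a GPID:

```python
INT32_MAX = 2**31 - 1  # largest value a signed 4-byte integer can hold

def looks_like_gpid(value: int) -> bool:
    """A value too large for a 4-byte integer can't be a plain Postgres PID."""
    return value > INT32_MAX

print(looks_like_gpid(123))          # False: fits in 4 bytes, a regular PID
print(looks_like_gpid(20000000345))  # True: too big, must be a global PID
```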
Also, like pg_cancel_backend, Citus 11 overrides pg_terminate_backend to accept global PIDs too. So, you can also terminate queries running on different nodes using global PIDs.
In addition to all the Citus activity and lock views mentioned above, we added some smaller functions to help you monitor your database cluster. The new functions make it easier to get info that can be useful when writing monitoring queries, including:
- citus_nodename_for_nodeid and citus_nodeport_for_nodeid
- citus_nodeid_for_gpid and citus_pid_for_gpid
- citus_coordinator_nodeid
You can use citus_nodename_for_nodeid and citus_nodeport_for_nodeid to get info about the node with a given node id:
SELECT citus_nodename_for_nodeid(11);
citus_nodename_for_nodeid
---------------------------
localhost
SELECT citus_nodeport_for_nodeid(11);
citus_nodeport_for_nodeid
---------------------------
9701
You can find both pieces of info above in the pg_dist_node table, too.
You can use citus_nodeid_for_gpid and citus_pid_for_gpid to parse a GPID:
SELECT citus_nodeid_for_gpid(110000000123);
citus_nodeid_for_gpid
-----------------------
11
SELECT citus_pid_for_gpid(110000000123);
citus_pid_for_gpid
--------------------
123
As I mentioned earlier, we tried to make GPIDs human readable, and with the two functions above they are easily machine readable too.
With the functions above you can find out about a node from a GPID like this:
SELECT citus_nodename_for_nodeid(citus_nodeid_for_gpid(110000000123)),
citus_nodeport_for_nodeid(citus_nodeid_for_gpid(110000000123));
citus_nodename_for_nodeid | citus_nodeport_for_nodeid
---------------------------+---------------------------
localhost | 9701
(1 row)
Finally, you can use citus_coordinator_nodeid to find the node id of the coordinator node:
SELECT citus_coordinator_nodeid();
citus_coordinator_nodeid
--------------------------
3
With Citus 11 you can query your distributed Postgres tables from any node by default. And with the tools you learned about in this blog post, you know how to monitor and manage your Citus cluster from any node, like you would do with a single Postgres instance.
If you’re interested in all that changed in Citus 11 check out:
And if you want to download Citus you can always find the latest download instructions on the website; we’ve put together lots of useful resources on the Getting Started page; you can file issues in the GitHub repo; and if you have questions please join us (and other users in the community) in the Citus Public Slack.