POSETTE 2024 is a wrap! 💯 Thanks for joining the fun! Missed it? Watch all 42 talks online 🍿
POSETTE 2024 is a wrap! 💯 Thanks for joining the fun! Missed it? Watch all 42 talks online 🍿
Updates to this version:
Welcome to release v10.1 of the Citus database, an open source extension to PostgreSQL. Explore the Citus 10.1 fixes to:
If you want the itemized list of all the improvements in 10.1, then take a peek at the Changelog for Citus 10.1.0 over in our GitHub repo.
In 10.1, the shard rebalancer—which you know we open sourced in Citus 10, hurray—is now optimized for scenarios with many tables and/or lots of Postgres partitions. And the rebalancer is now smarter about ensuring that the target node is OK to move new shards. Plus it works better with high traffic workloads (reads) running concurrently to the shard moves.
I know that Jelte Fennema plans on writing a deeper dive into some of the shard rebalancer improvements in Citus 10.1… so I’ll leave it to Jelte to go deeper in an upcoming post. Suffice it to say, there were lots of shard rebalancer improvements in 10.1, including:
citus.defer_drop_after_shard_move
by defaultimprovement_threshold
at shard rebalancer movesDropMarkedShards
get_rebalance_progress
DropMarkedShards
When you want to ingest data into Postgres, you have 3 different ways to do so:
In 10.1, Citus now uses less memory for multi-row INSERTs. Specifically, peak memory usage is reduced by up to 50%. Hence it is now faster to load data into Citus with multi-row INSERTs. Yay.
If you use Citus to scale out Postgres and you’ve ever encountered these harmless—but annoying—log messages on your worker nodes, you’ll be glad to hear that we’ve updated the Citus adaptive executor in 10.1 so that these messages won’t fill up your logs anymore.
2020-11-16 21:09:09.800 CET [16633] LOG: could not accept SSL connection: Connection reset by peer
2020-11-16 21:09:09.872 CET [16657] LOG: could not accept SSL connection: Undefined error: 0
2020-11-16 21:09:09.894 CET [16667] LOG: could not accept SSL connection: Connection reset by peer
It turns out that when we rewrote the Citus adaptive executor, one side-effect of those changes was that we would terminate no-longer-needed connections, even if they were still in the process of being established. You see, the Citus adaptive executor has a slow-start algorithm and establishes connections every 10ms (by default). As soon as Citus determined that you no longer needed the connections, the Citus executor would terminate them—triggering these noisy log messages.
So in 10.1, we improved the adaptive executor’s decision-making process. Making it so these noisy log messages don’t happen.
Of course, if you use PgBouncer with Citus to manage connection pooling, you’re probably thinking “huh”? With PgBouncer, you would never have seen these log messages—so you’re unaffected by this 10.1 improvement.
And there’s a performance gain too. The improved decision-making—where we factor in the connection-establishment times into our decision logic—gives you significant improvement for workloads where the concurrency is high (>30 concurrent queries) AND all queries hit multiple shards AND individual query execution times are < 10ms. This scenario is common if the queries are doing index scans that are not on the distribution column.
The chart below shows the performance gains—as measured by TPS, aka transactions per second—for this high-concurrency scenario. We followed these steps to simulate high-concurrency workloads and measure this performance, >>>insert footnote>>>which were run on a 4-node Hyperscale (Citus) cluster running on Azure. Each worker node had 4 cpu vCores, 32GB memory, and 0.5TB disk.<<<<<
# concurrent connections running on Citus coordinator | Citus 10.0 TPS | Citus 10.1 TPS | Citus 10.1 is |
---|---|---|---|
32 | 69.34 | 463.62 | 6.7X faster |
128 | 157.62 | 691.67 | 4.4X faster |
When there are lots of shards in a Citus cluster, this citus_shards
view could sometimes take longer than you’d like to respond queries. Now, it is oh so much faster.
This view is relevant if you find yourself wanting to see the sizes of individual shards, you’re probably familiar with the citus_shards
view.
For example, if your application is a SaaS application that is multi-tenant, you may want to see if a shard is larger than others, to understand if some of your tenants are much bigger than others:
-- for each distributed table in the cluster
-- find the min,max and avg. shard sizes
SELECT TABLE_NAME,
pg_size_pretty(min(shard_size)) AS min_shard_size,
pg_size_pretty(max(shard_size)) AS max_shard_size,
pg_size_pretty(avg(shard_size)) AS avg_size_shard
FROM citus_shards
GROUP BY TABLE_NAME
ORDER BY TABLE_NAME;
table_name | min_shard_size | max_shard_size | avg_size_shard
------------+----------------+----------------+---------------
tenant_data | 11 MB | 357 MB | 52 MB
(1 row)
This citus_shards
view is also a useful way to see if your data is balanced well or not. If you observe any imbalances in the total data sizes across the nodes, you can kick in the Citus shard rebalancer to re-distribute the data based on the disk size with rebalance_strategy:= 'by_disk_size'
.
SELECT nodename,
nodeport,
pg_size_pretty(sum(shard_size))
FROM citus_shards
WHERE citus_table_type = 'distributed'
GROUP BY nodename, nodeport;
nodename | nodeport | pg_size_pretty
------------------------------------------------+----------+---------------
private-w1.columnar.postgres.database.azure.com | 5432 | 521 MB
private-w0.columnar.postgres.database.azure.com | 5432 | 175 MB
(2 rows)
The VALUES
clause inside a CTE is a handy struct that several of you probably rely on. Before 10.1, Citus was always materializing such CTEs. With Citus 10.1, now such CTEs are inlined by default… hence “pushed down” to the worker nodes. (You’ll hear us talk about pushdown queries a lot in Citus.)
Why should you care? This improvement is useful for two reasons:
So, if you write a SQL query like this, using the VALUES
clause:
WITH tmpUserEvents (userid, category) AS (VALUES (10, 3), (11, 3))
SELECT count(*)
FROM events_table
WHERE (user_id, category) IN (SELECT user_id, category FROM tmpUserEvents);
Then, under the covers, as of 10.1, Citus now will inline the CTE as follows (we simplified this query a bit for the sake of the example):
SELECT count(*)
FROM events_table
WHERE (user_id, category) IN ((VALUES (10, 3), (11, 3)));
As with all CTEs, the default behavior can be overridden by the MATERIALIZED keyword to force the planner to materialize the results and use pull-push execution.
By their very nature, partitioned tables in Postgres tend to have really long names, such as some_descriptive_name_2020_01_01
. When Citus is involved, the names become even longer since Citus automatically adds shard identifiers to the table names: some_descriptive_name_2020_01_01_102008
.
However, PostgreSQL only supports table names up to 63 chars, and truncates the rest:
CREATE TABLE taaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaable_name_with_65_chars(key int);
NOTICE: identifier "taaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaable_name_with_65_chars" will be truncated to "taaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaable_name_with_65_cha"
CREATE TABLE
The good news: in Citus 10.1 we have removed the length limits around partition names. Citus had several edge cases around naming of distributed tables or indexes with really long names, and Citus 10.1 fixes all of these issues and users can pick as long table names as they wish.
So the issues that are now fixed:
If you’re using Citus to shard Postgres on a single node and you connect to the Citus coordinator with psql and then use \d
to show relations, Citus is smart enough to not show the shards in the \d
output.
But if you use a different interactive tool and you’ve sharded Citus on a single node, you might be able to see the shards on the coordinator—and you might find that confusing. Especially if you’re new to Citus, and especially if you’re not expecting to see both the shards and the distributed tables on the same node. This happened to a Citus user recently… who then, thinking these shards did not belong, tried to drop the shards. And as you know, dropping shards unintentionally—just like accidentally dropping tables—can create some problems. (That’s an understatement.)
This fix in Citus 10.1 will prevent you from accidentally dropping (or truncating) Citus shards on the coordinator.
Prior to 10.1, you already had a GUC you could use to change the shard count on your tables. It is called citus.shard_count
which is a setting for Citus tables. But it was a bit awkward to set a GUC to change the shard count. So we’ve made things easier.
As of Citus 10.1, you can set the (optional) shard_count
parameter when using the create_distributed_table
function to distribute your table—whether you are sharding across a single Citus node or across a distributed cluster. The new shard_count
parameter is simple to play with and makes it easier to test with different shard counts when experimenting with Citus. The default citus.shard_count is 32 if you do not change it nor set the parameter.
If you’ve already read Ozan’s “Citus Tips for Postgres” post about how to alter the distribution key and shard count, you know that shard_count
was already a parameter for alter_distributed_table
. And with 10.1, shard_count
is now a parameter for create_distributed_table
, too.
Special thanks to Onder Kalaci for his suggestions, reviews, examples, and collaboration in creating this 10.1 release notes post.