v10.1 (Jul 2021)

Updates to this version:

What’s new in Citus 10.1?

Welcome to release v10.1 of the Citus extension to PostgreSQL. Explore the Citus 10.1 fixes to:

If you want the itemized list of all the improvements in 10.1, then take a peek at the Changelog for Citus 10.1.0 over in our GitHub repo.

Shard rebalancer optimizations

In 10.1, the shard rebalancer—which you know we open sourced in Citus 10, hurray—is now optimized for scenarios with many tables and/or lots of Postgres partitions. And the rebalancer is now smarter about ensuring that the target node is OK to move new shards. Plus it works better with high traffic workloads (reads) running concurrently to the shard moves.

I know that Jelte Fennema plans on writing a deeper dive into some of the shard rebalancer improvements in Citus 10.1… so I’ll leave it to Jelte to go deeper in an upcoming post. Suffice it to say, there were lots of shard rebalancer improvements in 10.1, including:

  • Reduces memory usage while rebalancing shards
  • Enables citus.defer_drop_after_shard_move by default
  • Ensures free disk space before moving a shard
  • Makes sure that target node in shard moves is eligible for shard move
  • Optimizes partitioned disk size calculation for shard rebalancer
  • Implements improvement_threshold at shard rebalancer moves
  • Improves orphaned shard cleanup logic
  • Continues to remove shards after failure in DropMarkedShards
  • Makes sure connection is closed after each shard move
  • Fetches shard size on the fly for the rebalance monitor
  • Fixes two race conditions in the get_rebalance_progress
  • Fixes problems with concurrent calls of DropMarkedShards
  • Fixes a bug that allowed moving of shards belonging to a reference table
  • Fixes the relation size bug during rebalancing

Reduce memory usage for multi-row INSERTs

When you want to ingest data into Postgres, you have 3 different ways to do so:

  • Single-row insert—you just insert row by row, repeating the INSERT command
  • Postgres COPY utility—this is the fastest way to ingest data into Postgres (and Citus) and we typically recommend using COPY wherever possible
  • Multi-row INSERT—some applications and ORMs generate INSERTs as multi-row INSERTs, which is just a single INSERT statement with multiple values. It’s pretty common to do multi-row INSERTs.

In 10.1, Citus now uses less memory for multi-row INSERTs. Specifically, peak memory usage is reduced by up to 50%. Hence it is now faster to load data into Citus with multi-row INSERTs. Yay.

Adaptive executor enhancements

If you use Citus to scale out Postgres and you’ve ever encountered these harmless—but annoying—log messages on your worker nodes, you’ll be glad to hear that we’ve updated the Citus adaptive executor in 10.1 so that these messages won’t fill up your logs anymore.

2020-11-16 21:09:09.800 CET [16633] LOG:  could not accept SSL connection: Connection reset by peer
2020-11-16 21:09:09.872 CET [16657] LOG:  could not accept SSL connection: Undefined error: 0
2020-11-16 21:09:09.894 CET [16667] LOG:  could not accept SSL connection: Connection reset by peer

It turns out that when we rewrote the Citus adaptive executor, one side-effect of those changes was that we would terminate no-longer-needed connections, even if they were still in the process of being established. You see, the Citus adaptive executor has a slow-start algorithm and establishes connections every 10ms (by default). As soon as Citus determined that you no longer needed the connections, the Citus executor would terminate them—triggering these noisy log messages.

So in 10.1, we improved the adaptive executor’s decision-making process. Making it so these noisy log messages don’t happen.

Of course, if you use PgBouncer with Citus to manage connection pooling, you’re probably thinking “huh”? With PgBouncer, you would never have seen these log messages—so you’re unaffected by this 10.1 improvement.

And there’s a performance gain too. The improved decision-making—where we factor in the connection-establishment times into our decision logic—gives you significant improvement for workloads where the concurrency is high (>30 concurrent queries) AND all queries hit multiple shards AND individual query execution times are < 10ms. This scenario is common if the queries are doing index scans that are not on the distribution column.

The chart below shows the performance gains—as measured by TPS, aka transactions per second—for this high-concurrency scenario. We followed these steps to simulate high-concurrency workloads and measure this performance, >>>insert footnote>>>which were run on a 4-node Hyperscale (Citus) cluster running on Azure. Each worker node had 4 cpu vCores, 32GB memory, and 0.5TB disk.<<<<<

# concurrent connections running on Citus coordinator Citus 10.0 TPS Citus 10.1 TPS Citus 10.1 is
32 69.34 463.62 6.7X faster
128 157.62 691.67 4.4X faster

Better performance for the citus_shards view

When there are lots of shards in a Citus cluster, this citus_shards view could sometimes take longer than you’d like to respond queries. Now, it is oh so much faster.

This view is relevant if you find yourself wanting to see the sizes of individual shards, you’re probably familiar with the citus_shards view.

For example, if your application is a SaaS application that is multi-tenant, you may want to see if a shard is larger than others, to understand if some of your tenants are much bigger than others:

-- for each distributed table in the cluster
-- find the min,max and avg. shard sizes
       pg_size_pretty(min(shard_size)) AS min_shard_size,
       pg_size_pretty(max(shard_size)) AS max_shard_size,
       pg_size_pretty(avg(shard_size)) AS avg_size_shard
FROM citus_shards

 table_name | min_shard_size | max_shard_size | avg_size_shard
tenant_data | 11 MB          | 357 MB         | 52 MB
(1 row)

This citus_shards view is also a useful way to see if your data is balanced well or not. If you observe any imbalances in the total data sizes across the nodes, you can kick in the Citus shard rebalancer to re-distribute the data based on the disk size with rebalance_strategy:= 'by_disk_size'.

SELECT nodename,
FROM citus_shards
WHERE citus_table_type = 'distributed'
GROUP BY nodename, nodeport;

                    nodename                    | nodeport | pg_size_pretty
private-w1.columnar.postgres.database.azure.com |     5432 | 521 MB
private-w0.columnar.postgres.database.azure.com |     5432 | 175 MB
(2 rows)

Inline VALUES in CTEs as subqueries

The VALUES clause inside a CTE is a handy struct that several of you probably rely on. Before 10.1, Citus was always materializing such CTEs. With Citus 10.1, now such CTEs are inlined by default… hence “pushed down” to the worker nodes. (You’ll hear us talk about pushdown queries a lot in Citus.)

Why should you care? This improvement is useful for two reasons:

  • Network round trips: Citus now avoids one extra network round-trip in this scenario.
  • Query performance: With some larger data sizes, more efficient query plans on the shards are chosen, hence the query execution times are decreased when the VALUES clause is pushed instead of materialized.

So, if you write a SQL query like this, using the VALUES clause:

WITH tmpUserEvents (userid, category) AS (VALUES (10, 3), (11, 3))
SELECT  count(*)
FROM    events_table
WHERE   (user_id, category) IN (SELECT user_id, category FROM tmpUserEvents);

Then, under the covers, as of 10.1, Citus now will inline the CTE as follows (we simplified this query a bit for the sake of the example):

SELECT  count(*)
FROM    events_table
WHERE   (user_id, category) IN ((VALUES (10, 3), (11, 3)));

As with all CTEs, the default behavior can be overridden by the MATERIALIZED keyword to force the planner to materialize the results and use pull-push execution.

UX improvements using Citus with Postgres partitions

By their very nature, partitioned tables in Postgres tend to have really long names, such as some_descriptive_name_2020_01_01. When Citus is involved, the names become even longer since Citus automatically adds shard identifiers to the table names: some_descriptive_name_2020_01_01_102008.

However, PostgreSQL only supports table names up to 63 chars, and truncates the rest:

CREATE TABLE taaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaable_name_with_65_chars(key int);
NOTICE:  identifier "taaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaable_name_with_65_chars" will be truncated to "taaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaable_name_with_65_cha"

The good news: in Citus 10.1 we have removed the length limits around partition names. Citus had several edge cases around naming of distributed tables or indexes with really long names, and Citus 10.1 fixes all of these issues and users can pick as long table names as they wish.

So the issues that are now fixed:

  • Cannot rename table to have a name >56 characters in length (Issue #4211)
  • alter_table_set_access_method fails in case of long names (Issue #4665)
  • Distributed deadlock when creating index on a partitioned table (Issue #4412)
  • Self deadlock if the index name is long when the placements are on the coordinator (Issue #4238)

Protection from accidentally dropping & truncating shards

If you’re using Citus to shard Postgres on a single node and you connect to the Citus coordinator with psql and then use \d to show relations, Citus is smart enough to not show the shards in the \d output.

But if you use a different interactive tool and you’ve sharded Citus on a single node, you might be able to see the shards on the coordinator—and you might find that confusing. Especially if you’re new to Citus, and especially if you’re not expecting to see both the shards and the distributed tables on the same node. This happened to a Citus user recently… who then, thinking these shards did not belong, tried to drop the shards. And as you know, dropping shards unintentionally—just like accidentally dropping tables—can create some problems. (That’s an understatement.)

This fix in Citus 10.1 will prevent you from accidentally dropping (or truncating) Citus shards on the coordinator.

Add shard_count parameter to create_distributed_table

Prior to 10.1, you already had a GUC you could use to change the shard count on your tables. It is called citus.shard_count which is a setting for Citus tables. But it was a bit awkward to set a GUC to change the shard count. So we’ve made things easier.

As of Citus 10.1, you can set the (optional) shard_count parameter when using the create_distributed_table function to distribute your table—whether you are sharding across a single Citus node or across a distributed cluster. The new shard_count parameter is simple to play with and makes it easier to test with different shard counts when experimenting with Citus. The default citus.shard_count is 32 if you do not change it nor set the parameter.

If you’ve already read Ozan’s “Citus Tips for Postgres” post about how to alter the distribution key and shard count, you know that shard_count was already a parameter for alter_distributed_table. And with 10.1, shard_count is now a parameter for create_distributed_table, too.

Thank you

Special thanks to Onder Kalaci for his suggestions, reviews, examples, and collaboration in creating this 10.1 release notes post.