v9.4 (Jul 2020)

Updates to this version:


What’s new in Citus 9.4

Welcome to release v9.4 of the Citus extension to PostgreSQL. Here’s an overview of the new capabilities in Citus 9.4:

Understanding performance with EXPLAIN ANALYZE

The Postgres EXPLAIN ANALYZE command is essential for you to understand your query performance. In Citus 9.4, we improved how EXPLAIN ANALYZE works with Citus to give you more insights about your distributed Citus queries.

If you peek under the covers into the internals of Citus (Citus is open source after all), you’ll see that the Citus EXPLAIN ANALYZE command used to run the distributed Citus query twice. Once to get the results from the workers to build the EXPLAIN plan on the Citus coordinator and then again to get the EXPLAIN plans from the Citus worker nodes.

This two-step approach could give misleading results for workloads with frequent cache misses, since the EXPLAIN was always a cache hit.

In Citus 9.4, we combined the two steps into one to give you more accurate EXPLAIN ANALYZE results. In addition, when explaining all the tasks of a distributed Citus query, we now create the EXPLAIN plans in parallel across the cluster, which gives you EXPLAIN ANALYZE results a lot faster. The new approach also enables EXPLAIN ANALYZE for inserts and other DML commands.

We also added additional information that is relevant to Citus, including:

  • EXPLAIN now shows the number of bytes transferred over the network
  • EXPLAIN can now show the Postgres queries that are sent to and received from the worker nodes

The EXPLAIN ANALYZE improvements in Citus 9.4 make it much easier for you to understand and tune your Postgres query performance in your Citus database cluster. An example of getting an average over a large table. You can see that Citus only transfers 44 bytes over the network because it only gets the sum and the count from each shard:

EXPLAIN (ANALYZE, VERBOSE) SELECT avg(rating) FROM item_ratings;
                                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=500.00..500.02 rows=1 width=8) (actual time=95.942..95.942 rows=1 loops=1)
   Output: (sum(remote_scan.avg) / (pg_catalog.sum(remote_scan.avg_1))::double precision)
   ->  Custom Scan (Citus Adaptive)  (cost=0.00..0.00 rows=100000 width=16) (actual time=95.917..95.918 rows=2 loops=1)
     Output: remote_scan.avg, remote_scan.avg_1
     Task Count: 2
     Tuple data received from nodes: 44 bytes
     Tasks Shown: All
     ->  Task
       Query: SELECT sum(rating) AS avg, count(rating) AS avg FROM item_ratings_102077 item_ratings WHERE true
       Tuple data received from node: 22 bytes
       Node: host=localhost port=9700 dbname=postgres
       ->  Aggregate  (cost=10872.68..10872.69 rows=1 width=16) (actual time=92.390..92.391 rows=1 loops=1)
         Output: sum(rating), count(rating)
         ->  Seq Scan on public.item_ratings_102077 item_ratings  (cost=0.00..8141.12 rows=546312 width=8) (actual time=0.120..53.408 rows=495406 loops=1)
           Output: user_id, rating
       Planning Time: 0.051 ms
       Execution Time: 92.424 ms
     ->  Task
       Query: SELECT sum(rating) AS avg, count(rating) AS avg FROM item_ratings_102078 item_ratings WHERE true
       Tuple data received from node: 22 bytes
       Node: host=localhost port=9701 dbname=postgres
       ->  Aggregate  (cost=11075.68..11075.69 rows=1 width=16) (actual time=93.468..93.468 rows=1 loops=1)
         Output: sum(rating), count(rating)
         ->  Seq Scan on public.item_ratings_102078 item_ratings  (cost=0.00..8293.12 rows=556512 width=8) (actual time=0.053..53.639 rows=504594 loops=1)
           Output: user_id, rating
       Planning Time: 0.090 ms
       Execution Time: 93.500 ms
 Planning Time: 0.370 ms
 Execution Time: 96.191 ms
(29 rows)

Making sure your app can leverage both local & distributed Citus tables

INSERT INTO local SELECT .. FROM distributed:

Inserting or upserting the result of a query into a table is a powerful data transformation mechanism in PostgreSQL and in Citus. As of Citus 9.4, you can now insert or upsert the result of a SQL query on a distributed table directly into a local table. An important use case this enables is to maintain small rollup tables that are placed on the coordinator for low latency, but it also simplifies other workflows. For instance, you can now write the result of a query on a distributed table directly to a temporary table, which is commonly needed for more complex analytics. With this feature, Citus now supports INSERT..SELECT for any combination of tables.

Joins between local tables and CTEs:

JOINs between local tables on the Citus coordinator and Common Table Expressions (CTEs) that query distributed Citus tables are now supported. That makes it easier to use hybrid data models that use a mixture of distributed, reference, and local tables.

Making sure you don’t run out of Postgres connections with COPY connection management

When running many concurrent COPY commands in the past, Citus sometimes overwhelmed worker nodes with a large number of Postgres connections.

As of Citus 9.4, you can now use the citus.max_adaptive_executor_pool_size setting to reduce the number of Postgres connections made by each individual COPY command—so you can support much higher concurrency. In the next release (Citus 9.5), we will make the connection management for COPY fully automatic.

Calculating percentiles at scale with t-digest

Aggregate functions from the t-digest extension to Postgres can now be distributed by Citus, which enables very fast, parallel percentile calculations on Citus and satisfies an important need for some of our Citus open source users. Props to Postgres committer Tomas Vondra for creating the open source t-digest extension and being so helpful in making t-digest work with Citus. The open source t-digest extension is now supported on Hyperscale (Citus), too—I’m told this docs page on Hyperscale (Citus) extension support will soon be updated to include t-digest.

Are percentile approximations useful? Yes. One of my teammates, Nils Dijk, is writing a blog post about how one of our customers improved Postgres percentile calculations by 45x with t-digest and Hyperscale (Citus). And Matt Watson from the Stackify team just wrote about how to use Postgres, t-digest, and Citus to create SQL percentile rollups at scale. Also, the Windows Data and Intelligence team—who manage a petabyte-scale analytics dashboard using Citus on Microsoft Azure—are using t-digest with Citus, too.

Performance improvements

Modifying CTE pushdown in router queries:

Modifying CTEs can be used to modify a table and then do an additional query using the modified rows. For instance, we may want to delete completed TODO items, but also lower a counter:

WITH deleted_todo AS (
  DELETE FROM todo WHERE user_id = 423 AND isdone RETURNING *
)
UPDATE users
SET remaining_todos = remainining_todos - (SELECT count(*) FROM deleted_todo)
WHERE user_id = 423;

When a query with modifying CTE has filters on the distribution column (e.g. the user ID or tenant_id), it can now be fully delegated to the worker in a single network round trip, which means these queries now run with much lower latency and higher throughput.

Optional support for binary protocol: 

In Citus 9.4, you can enable the binary PostgreSQL protocol for worker-coordinator communication.

Normally, values are sent over the wire as text, which can be very efficient for some data. Consider that a bigint with the value 9 only requires 1 byte in text, but 8 bytes in binary. On the other hand, there are also data types such as timestamps where the text format is way bigger than the binary format. For instance, PostgreSQL uses 8 bytes to store a timestamptz, but the text form (e.g. 2020-09-04 09:58:36.788916+02) is a whopping 29 bytes. When queries return millions of rows, that can be a lot of extra network traffic.

By enabling the citus.enable_binary_protocol, you can perform queries using the binary protocol, which can reduce network traffic and speed up queries for data types whose binary representation is much more compact than their text representation (e.g. timestamp, float, tdigest). The binary protocol is disabled by default because it can increase the network traffic in some cases (e.g. small integers) but will help performance especially if your queries return a lot of timestamps, floating point numbers, or other data types whose text representations is very long.

More controls for operating your distributed Citus cluster

Citus downgrade scripts: 

Previously, the Citus extension could only ever be upgraded to a new version, because the Citus upgrade script makes schema changes and those cannot be reverted. As of Citus 9.4, we have now committed to maintaining a downgrade script for each upgrade script. That means that if you upgrade and for some reason you need to revert back, you would be able to install an older version of Citus and revert the schema changes by running ALTER EXTENSION.