Citus 7.1: Window functions, distinct, distributed transactions, more

So about two weeks ago we had a stealth release of Citus 7.1. And while we have already blogged a bit about the recent (and exciting) update to our fully-managed database as a service–Citus Cloud—and about our newly-added support for distributed transactions, it’s time to share all the things about our latest Citus 7.1 release.

If you’re into bulleted lists, here’s the quick overview of what’s in Citus 7.1:

Distributed transaction support
Zero-downtime shard rebalancer
Window function enhancements
Distinct ON/count(distinct) enhancements
Additional SQL enhancements
Checking for new software updates

For all these new features in Citus 7.1, you can try them today on Citus Cloud, get started locally with our Citus open source downloads, or keep right on reading to learn more about all the new Citus things.

Distributed transactions in the Citus database

The new Citus 7.1 distributed transactions feature is so important to our customers that we dedicated an entire blog post to it. As well as a blog on distributed deadlocks, an important pre-requisite feature we had to deliver in Citus 7.0, before tackling the distributed transaction challenge in Citus 7.1.

Zero-downtime Shard Rebalancer

We already covered the new zero-downtime shard rebalancer in the previous Citus Cloud 2 post, so we won’t dig in too much here. Just in case you missed the Citus Cloud 2 launch, bottom line, the Citus Cloud shard rebalancer now offers a fully online operation, with zero-downtime. To implement the zero-downtime shard rebalancer, we extended Postgres 10 logical replication support. Oh, and to make it easy to observe progress during the rebalancer operation, we created a new UI, which we hope you find useful:

So today we’re excited to announce Citus Cloud 2 w/ sweet new features for our cloud database | Worry-free Postgres https://t.co/vawtGRkmi9 pic.twitter.com/meFjnDT3kt
— Citus Data (@citusdata) November 16, 2017

Cross-tenant support for Windows Functions in Citus

In SQL, a window function performs a computation across a set of table rows that are somehow related to the current row. This is comparable to the type of computation that can be done with an aggregate function (sum, count, or avg). However, window functions don’t cause rows to become grouped into a single output row like non-window aggregate calls would. Instead, the rows retain their separate identities.

You can read more about window functions or watch Bruce Momjian’s excellent talk on this topic.

In Citus, our customers leverage window functions to analyze their users’ actions over time. For example, a funnel query could ask: Show me users who visited web pages on a website, where the time interval between visits was less than half an hour.

SELECT 
    *
FROM 
    (
    SELECT 
        user_id, 
        time,
        lag(time, 1) over w1 as prev_time 
    FROM 
        web_page_visits 
    GROUP BY 
        user_id, webpage, time
    WINDOW w1 as (PARTITION BY user_id, webpage ORDER BY time)
    ) a
WHERE 
    time-prev_time < INTERVAL '30 minutes' 
ORDER BY 1;

Citus previously supported window functions when the computation targeted a single tenant. Citus 7.1 extends its window function support to include computations that span across tenants or users, where the window function’s PARTITION BY clause includes the distribution column.

DISTINCT / DISTINCT ON / count(distinct) Improvements

The DISTINCT clause removes all duplicate rows from the result set and keeps one row for each group of duplicates. The DISTINCT ON clause keeps only the first row of each set of rows where the given expressions evaluate to equal. You can read more about DISTINCT clauses in PostgreSQL’s documentation.

In Citus, our users run DISTINCT and DISTINCT ON clauses when creating roll-up tables, removing duplicates, or finding the first or last event associated with an entity. Citus 7.1 adds support for most queries that include distinct clauses. In particular, the following syntax is now supported:

SELECT DISTINCT column_1, column_2, … FROM [distributed table | reference table];
SELECT DISTINCT ON (distribution_column), other_columns FROM [distributed table];
SELECT DISTINCT ON column_1, column_2, … FROM [distributed table | reference table];
SELECT DISTINCT [subquery_target_columns] FROM (subquery);

Another frequent action is counting the number of unique occurrences of a column or expression in a data set. Citus 7.1 provides extends its support for exact count(distinct) clauses on distributed and reference tables, where the distinct expression doesn’t need to include the distribution column.

SQL coverage for subqueries with reference tables, plus performance optimizations too

Citus 7.1 expands SQL support for subqueries that include reference tables. Previously, subqueries were supported when you join distributed tables with reference tables on the distribution column. Citus 7.1 now provides support for join clauses on any column or expression.

Also, Citus 7.1 introduces performance optimizations for SELECT queries that include ORDER BY and LIMIT clause on the distribution column. In particular, queries that have the following structure now run much faster in Citus 7.1:

SELECT l_orderkey, count(DISTINCT l_partkey)
  FROM lineitem
  GROUP BY l_orderkey
  ORDER BY 2 DESC LIMIT 10;

Keeping your Citus up to date

We ship new releases of Citus roughly every three months. This means every three months there are new awesome Citus features you may want to take advantage of. In Citus 7.1, we’ve added a feature that, unless you opt out, will automatically notify you of recent product updates, with messages like this in your database logs: a new minor release of Citus (X.Y.Z) is available. This new 7.1 feature also sends anonymized database cluster information to our product team so they can keep making the product better. The anonymized data is extremely useful in understanding cluster sizes and usage patterns and scaling patterns, so that we can continue to prioritize the features you need the most in our gridding exercises.

Give our worry-free, scale-out, open source Postgres a try & let us know what you think

The team at Citus is super excited to see Citus 7.1 out in the wild, helping developers like you to never have to worry about scaling again. (Or at least, to worry a heck of a lot less.) As always, Citus is available for open source downloads, as enterprise on-prem software, and via Citus Cloud, our fully-managed database as a service. If you give Citus a spin. we’d love your feedback. And if you have feature requests, we’re all ears. Feel free to join the conversation in our slack channel.

Written by Ozgun Erdogan

Co-founder & CTO of Citus Data. Former Postgres engineering team director at Microsoft. Worked on distributed systems at Amazon. Speaker at PGCon, XLDB Conf, DataEngConf, PostgresOpen, & QCon. Dad.