Citus 11 is out! Now 100% open source. Read all about it in Marco’s release blog. 💥
So about two weeks ago we had a stealth release of Citus 7.1. And while we have already blogged a bit about the recent (and exciting) update to our fully-managed database as a service–Citus Cloud—and about our newly-added support for distributed transactions, it’s time to share all the things about our latest Citus 7.1 release.
If you’re into bulleted lists, here’s the quick overview of what’s in Citus 7.1:
For all these new features in Citus 7.1, you can try them today on Citus Cloud, get started locally with our Citus open source downloads, or keep right on reading to learn more about all the new Citus things.
The new Citus 7.1 distributed transactions feature is so important to our customers that we dedicated an entire blog post to it. As well as a blog on distributed deadlocks, an important pre-requisite feature we had to deliver in Citus 7.0, before tackling the distributed transaction challenge in Citus 7.1.
We already covered the new zero-downtime shard rebalancer in the previous Citus Cloud 2 post, so we won’t dig in too much here. Just in case you missed the Citus Cloud 2 launch, bottom line, the Citus Cloud shard rebalancer now offers a fully online operation, with zero-downtime. To implement the zero-downtime shard rebalancer, we extended Postgres 10 logical replication support. Oh, and to make it easy to observe progress during the rebalancer operation, we created a new UI, which we hope you find useful:
In SQL, a window function performs a computation across a set of table rows that are somehow related to the current row. This is comparable to the type of computation that can be done with an aggregate function (sum, count, or avg). However, window functions don’t cause rows to become grouped into a single output row like non-window aggregate calls would. Instead, the rows retain their separate identities.
In Citus, our customers leverage window functions to analyze their users’ actions over time. For example, a funnel query could ask: Show me users who visited web pages on a website, where the time interval between visits was less than half an hour.
SELECT * FROM ( SELECT user_id, time, lag(time, 1) over w1 as prev_time FROM web_page_visits GROUP BY user_id, webpage, time WINDOW w1 as (PARTITION BY user_id, webpage ORDER BY time) ) a WHERE time-prev_time < INTERVAL '30 minutes' ORDER BY 1;
Citus previously supported window functions when the computation targeted a single tenant. Citus 7.1 extends its window function support to include computations that span across tenants or users, where the window function’s
PARTITION BY clause includes the distribution column.
DISTINCT clause removes all duplicate rows from the result set and keeps one row for each group of duplicates. The
DISTINCT ON clause keeps only the first row of each set of rows where the given expressions evaluate to equal. You can read more about
DISTINCT clauses in PostgreSQL’s documentation.
In Citus, our users run
DISTINCT ON clauses when creating roll-up tables, removing duplicates, or finding the first or last event associated with an entity. Citus 7.1 adds support for most queries that include distinct clauses. In particular, the following syntax is now supported:
SELECT DISTINCT column_1, column_2, … FROM [distributed table | reference table];
SELECT DISTINCT ON (distribution_column), other_columns FROM [distributed table];
SELECT DISTINCT ON column_1, column_2, … FROM [distributed table | reference table];
SELECT DISTINCT [subquery_target_columns] FROM (subquery);
Another frequent action is counting the number of unique occurrences of a column or expression in a data set. Citus 7.1 provides extends its support for exact count(distinct) clauses on distributed and reference tables, where the distinct expression doesn’t need to include the distribution column.
Citus 7.1 expands SQL support for subqueries that include reference tables. Previously, subqueries were supported when you join distributed tables with reference tables on the distribution column. Citus 7.1 now provides support for join clauses on any column or expression.
Also, Citus 7.1 introduces performance optimizations for
SELECT queries that include
ORDER BY and
LIMIT clause on the distribution column. In particular, queries that have the following structure now run much faster in Citus 7.1:
SELECT l_orderkey, count(DISTINCT l_partkey) FROM lineitem GROUP BY l_orderkey ORDER BY 2 DESC LIMIT 10;
We ship new releases of Citus roughly every three months. This means every three months there are new awesome Citus features you may want to take advantage of. In Citus 7.1, we’ve added a feature that, unless you opt out, will automatically notify you of recent product updates, with messages like this in your database logs:
a new minor release of Citus (X.Y.Z) is available. This new 7.1 feature also sends anonymized database cluster information to our product team so they can keep making the product better. The anonymized data is extremely useful in understanding cluster sizes and usage patterns and scaling patterns, so that we can continue to prioritize the features you need the most in our gridding exercises.
The team at Citus is super excited to see Citus 7.1 out in the wild, helping developers like you to never have to worry about scaling again. (Or at least, to worry a heck of a lot less.) As always, Citus is available for open source downloads, as enterprise on-prem software, and via Citus Cloud, our fully-managed database as a service. If you give Citus a spin. we’d love your feedback. And if you have feature requests, we’re all ears. Feel free to join the conversation in our slack channel.