Citus 7.3: Broader SQL coverage, Tableau Integration, TopN extension, and more

Written by Ozgun Erdogan
April 5, 2018

Today, we’re excited to announce Citus 7.3—the latest release of our distributed database that scales out Postgres. Citus 7.3 improves support for complex analytical queries, provides integration with Tableau and other BI tools, and integrates with the open source Postgres extension, TopN.

The features in this latest Citus database release are particularly important for real-time analytics workloads. In these workloads, users typically need to ingest data in real time and run analytical queries with sub-second response times. A good example is when you’re serving a dashboard to thousands of customers and your database needs to provide fast replies over billions of rows.

Here’s a quick overview of what’s new in Citus. For an overview of other recent Citus features, check out these blog entries about TopN for your Postgres database and Citus 7.2.

What’s New in Citus 7.3

  • Complex subquery support
  • Broader support for aggregate functions (json(b), bool, and bit aggregates)
  • Renaming distributed tables and indexes
  • Tableau integration
  • TopN extension integration

For all these new Citus features, you can try them today by downloading Citus open source on your local machine. Or you can spin up a new Citus database cluster in the cloud by using our fully-managed database as a service. Or keep reading to learn more about the new features in Citus.

Complex subquery support

Citus already provided full SQL support for queries that were scoped to a single machine, such as with many multi-tenant applications. We’ve also been adding and strengthening SQL support for complex queries that span across a cluster of machines. With version 7.3 of the Citus database, we now have even broader support for complex SQL queries. In particular, Citus 7.3 makes three improvements.

  • Joins between two or more subqueries, where the join is between large tables and not on the distribution column.
  • Support for all types of subqueries that appear in a WHERE clause.
  • Support for data-modifying statements in WITH (CTE) clauses.
-- Subquery foo has a join on distribution keys. Subquery bar has a join on distribution keys
-- However, join among subqueries foo and bar isn’t on the distribution keys

    SELECT
        foo.value_2
    FROM
        (SELECT 
            users_table.value_2 
        FROM 
            users_table, events_table 
        WHERE 
            users_table.user_id = events_table.user_id AND event_type IN (1,2,3,4)) as foo,
        (SELECT 
            users_table.value_2 
        FROM 
            users_table, events_table 
        WHERE 
            users_table.user_id = events_table.user_id AND event_type IN (5,6,7,8)) as bar
    WHERE
        foo.value_2 = bar.value_2;

Broader support for complex aggregate functions across a Citus database cluster

Citus 7.3 improves support for running complex aggregate functions across a distributed cluster. In particular, Citus 7.3 can now distribute the following PostgreSQL aggregate functions.

  • json and jsonb aggregate functions: json_agg, json_object_agg, jsonb_agg, and jsonb_object_agg are now supported. These functions were contributed to Citus by Markus Sintonen.
  • bool and bit aggregate functions: bool_or, bool_and, bit_or, and bit_and are now supported over distributed clusters.

Renaming distributed tables and indexes

Citus 7.3 adds support for renaming distributed tables and indexes. Renaming distributed tables becomes helpful in several scenarios. For example, you could change your Rails application, adjust the model name in your ORM, and you need the underlying table name to match the new model name. Or, you may need to change the distribution key on a table. To do that, you can create a new table, copy data to the new table, and rename the old table to the new one.

Citus 7.3 also broadens its support for index modification commands. You can now rename an index created on a distributed table, or set / reset storage parameters for an index.

ALTER TABLE ad_campaigns RENAME TO campaigns;
ALTER INDEX ad_campaigns_pkey RENAME TO campaigns_pkey;
ALTER INDEX campaigns_pkey SET (fillfactor = 40);

Verified integration between Tableau & Citus

At Citus we believe that data is a valuable asset to any business, and extracting value and insights from that data are key to success. Tableau is a popular business intelligence and analytics tool for databases. Tableau already has a robust connector for PostgreSQL. Since Citus extends Postgres, you can leverage the vast ecosystem of tools and libraries that support Postgres.

Some of you already use Citus and Tableau together. With Citus 7.3 we’ve done the work to ensure Citus and Tableau provide a seamless experience when you need to perform ad-hoc reporting or analysis. You can now interact with Tableau using the following steps.

  • You can directly connect to Citus by choosing PostgreSQL from the connection menu.
  • Once you connect to Tableau, you will see the tables in your database. You can define your data source by dragging and dropping tables from the “Table” pane. Or, you can run a custom query through “New Custom SQL”.
  • You can create your own sheets by dragging and dropping dimensions, measures, and filters. You can also create an interactive user interface with Tableau. To do this, Tableau automatically chooses a date range over the data. Citus can then compute aggregations over this range in human real-time.

Tableau report by state

TopN, an open source extension to PostgreSQL

Citus 7.3 also comes with support for a new open source PostgreSQL extension called TopN. TopN returns the top values in a database according to some criteria. When doing this, TopN applies an approximation algorithm to provide fast results using few compute and memory resources.

We originally created TopN to help Citus customers, who needed to scale out their PostgreSQL databases. These customers needed to serve customer-facing dashboards and run analytical queries that needed sub-second responses.

The TopN extension achieves that by materializing top values in a database, incrementally updating these top values, and/or merging top values from different time ranges. If you’re familiar with the HyperLogLog (HLL) extension, you can think of TopN as its sister.

If you’re interested in learning more, please see the examples section in the TopN repository.

Full list of changes in Citus 7.3

We only covered the highlights for Citus 7.3 in this blog post. We have many more changes in this release and you can read the full list in our Citus GitHub repo.

Share your thoughts on Citus 7.3 with us

Citus 7.3 makes Citus more of a drop-in replacement for your single-node Postgres database. By scaling out Postgres using Citus, you don’t need to worry about adapting your database for a distributed system.

As always, Citus is available in three ways: as open source, as enterprise software, and in the cloud as a managed database service.

If you give Citus a try, we’d love to hear your feedback. Please join the conversation in our Slack channel and let us know what you think.

Ozgun Erdogan

Written by Ozgun Erdogan

Co-founder & CTO of Citus Data. Former Postgres engineering team director at Microsoft. Worked on distributed systems at Amazon. Speaker at PGCon, XLDB Conf, DataEngConf, PostgresOpen, & QCon. Dad.