POSETTE 2024 is a wrap! 💯 Thanks for joining the fun! Missed it? Watch all 42 talks online 🍿
POSETTE 2024 is a wrap! 💯 Thanks for joining the fun! Missed it? Watch all 42 talks online 🍿
Written by Ozgun Erdogan
April 5, 2018
Today, we’re excited to announce Citus 7.3—the latest release of our distributed database that scales out Postgres. Citus 7.3 improves support for complex analytical queries, provides integration with Tableau and other BI tools, and integrates with the open source Postgres extension, TopN.
The features in this latest Citus database release are particularly important for real-time analytics workloads. In these workloads, users typically need to ingest data in real time and run analytical queries with sub-second response times. A good example is when you’re serving a dashboard to thousands of customers and your database needs to provide fast replies over billions of rows.
Here’s a quick overview of what’s new in Citus. For an overview of other recent Citus features, check out these blog entries about TopN for your Postgres database and Citus 7.2.
For all these new Citus features, you can try them today by downloading Citus open source on your local machine. Or you can spin up a new Citus database cluster in the cloud by using our fully-managed database as a service. Or keep reading to learn more about the new features in Citus.
Citus already provided full SQL support for queries that were scoped to a single machine, such as with many multi-tenant applications. We’ve also been adding and strengthening SQL support for complex queries that span across a cluster of machines. With version 7.3 of the Citus database, we now have even broader support for complex SQL queries. In particular, Citus 7.3 makes three improvements.
-- Subquery foo has a join on distribution keys. Subquery bar has a join on distribution keys
-- However, join among subqueries foo and bar isn’t on the distribution keys
SELECT
foo.value_2
FROM
(SELECT
users_table.value_2
FROM
users_table, events_table
WHERE
users_table.user_id = events_table.user_id AND event_type IN (1,2,3,4)) as foo,
(SELECT
users_table.value_2
FROM
users_table, events_table
WHERE
users_table.user_id = events_table.user_id AND event_type IN (5,6,7,8)) as bar
WHERE
foo.value_2 = bar.value_2;
Citus 7.3 improves support for running complex aggregate functions across a distributed cluster. In particular, Citus 7.3 can now distribute the following PostgreSQL aggregate functions.
json_agg
, json_object_agg
, jsonb_agg
, and jsonb_object_agg
are now supported. These functions were contributed to Citus by Markus Sintonen.bool_or
, bool_and
, bit_or
, and bit_and
are now supported over distributed clusters.Citus 7.3 adds support for renaming distributed tables and indexes. Renaming distributed tables becomes helpful in several scenarios. For example, you could change your Rails application, adjust the model name in your ORM, and you need the underlying table name to match the new model name. Or, you may need to change the distribution key on a table. To do that, you can create a new table, copy data to the new table, and rename the old table to the new one.
Citus 7.3 also broadens its support for index modification commands. You can now rename an index created on a distributed table, or set / reset storage parameters for an index.
ALTER TABLE ad_campaigns RENAME TO campaigns;
ALTER INDEX ad_campaigns_pkey RENAME TO campaigns_pkey;
ALTER INDEX campaigns_pkey SET (fillfactor = 40);
At Citus we believe that data is a valuable asset to any business, and extracting value and insights from that data are key to success. Tableau is a popular business intelligence and analytics tool for databases. Tableau already has a robust connector for PostgreSQL. Since Citus extends Postgres, you can leverage the vast ecosystem of tools and libraries that support Postgres.
Some of you already use Citus and Tableau together. With Citus 7.3 we’ve done the work to ensure Citus and Tableau provide a seamless experience when you need to perform ad-hoc reporting or analysis. You can now interact with Tableau using the following steps.
Citus 7.3 also comes with support for a new open source PostgreSQL extension called TopN. TopN returns the top values in a database according to some criteria. When doing this, TopN applies an approximation algorithm to provide fast results using few compute and memory resources.
We originally created TopN to help Citus customers, who needed to scale out their PostgreSQL databases. These customers needed to serve customer-facing dashboards and run analytical queries that needed sub-second responses.
The TopN extension achieves that by materializing top values in a database, incrementally updating these top values, and/or merging top values from different time ranges. If you’re familiar with the HyperLogLog (HLL) extension, you can think of TopN as its sister.
If you’re interested in learning more, please see the examples section in the TopN repository.
We only covered the highlights for Citus 7.3 in this blog post. We have many more changes in this release and you can read the full list in our Citus GitHub repo.
Citus 7.3 makes Citus more of a drop-in replacement for your single-node Postgres database. By scaling out Postgres using Citus, you don’t need to worry about adapting your database for a distributed system.
As always, Citus is available in three ways: as open source, as enterprise software, and in the cloud as a managed database service.
If you give Citus a try, we’d love to hear your feedback. Please join the conversation in our Slack channel and let us know what you think.