Citus 11.3 is out! Now with tenant statistics. Read all about it in Marco's 11.3 blog post. 💥
Learn how to scale out Postgres with Citus, from a single node to a large distributed cluster. Citus is a Postgres extension, not a fork, and is 100% open source.
The Citus database distributes your Postgres tables across multiple nodes and parallelizes your queries and transactions. The combination of parallelism, keeping more data in memory, and higher I/O bandwidth often leads to dramatic speed ups. In this chart, we show a benchmark SQL query running ~40x faster with an 8-node Citus cluster vs. a single Postgres node.
By introducing new Postgres table types—distributed tables and reference tables—Citus makes it possible to distribute your data and queries across multiple nodes. In addition to the new table types, the Citus architecture includes a distributed Postgres query planner and an adaptive query executor. Distributed transactions are also supported. As of Citus 10, you can shard on a single Citus node—and you use the Citus columnar feature to achieve compression ratios of 3x-10x or more. With the Citus 11 release, Citus is now 100% open source and you can query from any node.
Citus is an open source extension to Postgres (not a fork.) So when you use Citus, you’re still using Postgres under the covers, along with the Citus extension on top. To your application, running on a Citus distributed database is like running on top of a single Postgres node. And because Citus is an extension, it’s easy for us to keep Citus current with the latest Postgres releases—plus you get the performance benefits of horizontal scale, while still being able to leverage your familiar SQL toolset and your Postgres expertise.
Because Citus distributes your data, parallelizes your queries, keeps more data in memory, and gives you higher I/O bandwidth—Citus can meet the demanding performance requirements of mixed OLTP and OLAP workloads. So you can simplify your architecture by using a single database for your app’s transactional and analytical workloads, even for data-intensive applications. And with Citus 10, Citus gives you more capabilities: you can now use both columnar and row-based tables in your Citus distributed database.
Find out more about the Citus concepts, architecture, cluster management, APIs, use cases, & performance tuning.
See how Citus scales out Postgres and parallelizes your workloads via these YouTube videos. Tip: turn on captions.
Learn how to use Citus by using sample data in these short tutorials. For time series data, check out the use case guide.
You can download and install Citus open source packages for Docker, Ubuntu, Debian, Fedora, CentOS, and Red Hat via these simple steps.
You can stand up a Citus cluster in minutes with the Azure Cosmos DB for PostgreSQL managed service.
Using sharding and replication, the Citus extension distributes your data and queries across multiple nodes in a cluster, to give your app parallelism as well as more memory, compute, and disk. Citus is available as an open source download and in the cloud as a managed service. Azure Cosmos DB for PostgreSQL makes it easy to stand up a managed Citus cluster in minutes.
As of Citus 10, you can now shard Postgres on a single node, too. So you can adopt a distributed data model from the start to parallelize your queries—and be “scale-out ready.” Single-node Citus can also help to simplify your CI/CD pipelines.
Learn how Citus works in this talk about Citus table types, the PostgreSQL extension APIs, the Citus query planner, and performance benchmarks comparing multi-node Citus clusters to a single node.
Citus 11.3 blog post
Official release notes for Citus 11.3
Citus 11.2 blog post
Official release notes for Citus 11.2
Postgres 15 available in Azure Cosmos DB for PostgreSQL
Distributed PostgreSQL comes to Azure Cosmos DB
Citus 11.1 shards your Postgres tables without interruption
Official release notes for Citus 11.1
Distributed Postgres goes full open source with Citus: why, what, & how
Benchmarking performance of Citus and Postgres with HammerDB
Open Source News
Open sourcing the Citus shard rebalancer
How to scale Postgres for time series data
|Citus Version||Compatible with PostgreSQL|
|9.5||11, 12, 13|
|10.0.x||11, 12, 13|
|10.2.x||12, 13, 14|
|11.1.x, 11.2.x, 11.3.x||13, 14, 15|
Citus achieves order-of-magnitude faster execution compared to vanilla PostgreSQL through a combination of parallelism, keeping more data in memory, and higher I/O bandwidth.
Citus enables real-time interaction with large datasets that span billions of records—and is a good fit for customer-facing workloads that often require low-latency response times. Performance increases as you add nodes to a Citus database cluster. This 15-min performance demo from SIGMOD shows how Citus speeds up Postgres, using the HammerDB benchmark. And new to Citus 10, columnar storage can speed up analytics workloads that benefit from compression, too.
The first step in migrating an application from Postgres to Citus is to choose your distribution column (sometimes called a distribution key, or a sharding key.) You’ll want to understand your workload in order to pinpoint a “good” distribution column, e.g., a column that enables you to get the maximum performance from Citus.
The second step is to prepare the Postgres tables and SQL queries for migration. The amount of effort involved depends (you’ve heard that before, right?) on whether your application is already centered around that distribution column in terms of queries and schema. If not, you may have to update some of your queries and/or add the distribution column to some of your tables.
Alternately, as of Citus 10, you can now shard your database on a single node. So you can build your application on single-node Citus from the very start and be “scale-out-ready”, able to easily add nodes and rebalance your Citus cluster as your application grows.
If you are ready to delve deeper, the Migrating to Citus guide in the Citus documentation should be useful.
The Citus extension to Postgres is commonly used with customer-facing applications that are growing fast, have demanding performance requirements, are starting to experience slow queries, need to plan for future scale—or all of the above. Common use cases for Citus—both on-prem and in the cloud where Hyperscale (Citus) is an option in the Azure Database for PostgreSQL managed service—include:
As you’ll learn in the Citus concepts section of the documentation, Citus divides Postgres tables into multiple smaller tables, called shards. The shards are then spread across the nodes in the Citus database cluster when you configure Citus with the
create_distributed_table() function. When new data is ingested or when queries come in, the Citus coordinator routes them to the correct shards based on the value of the distribution column.
Another way of thinking about shards: Each shard contains a portion of the larger Postgres table that you have distributed. Imagine you previously had a 1 TB Postgres table. Now imagine you have distributed that 1 TB table across 100 shards in a Citus cluster. Each shard—which is just a smaller Postgres table—would be a 10 GB Postgres table.
Citus does more than simply shard and distribute your data, however. Citus also parallelizes your SQL queries across different nodes in the Citus cluster, giving you an order-of-magnitude increase in query response times for many use cases. New as of Citus 10, you can be “scale-out-ready” by sharding your Postgres database on a single Citus node, using a distributed data model from the very start—so you can easily add nodes later as your application grows.