Why Citus?

  • Parallelized
    Performance

    The Citus database distributes your Postgres tables or schemas across multiple nodes and parallelizes your queries and transactions. The combination of parallelism, keeping more data in memory, and higher I/O bandwidth often leads to dramatic speed ups. In this chart, we show a benchmark SQL query running ~40x faster with an 8-node Citus cluster vs. a single Postgres node.

    Performance bar chart comparing a Citus 8-node cluster to a single Postgres node
    Time to complete a SQL query, using a benchmark that measures analytical query performance. Run on ~100 GB of GitHub archive data in JSON format. All servers are Azure VMs with 16 vCPUs, 64 GB of memory, and network-attached disks with 7500 IOPS—running Postgres 13 and Citus 9.5, with default settings.
  • Distributed
    Scale

    Citus makes it possible to distribute your data, queries, and transactions across multiple nodes—by row or by schema. In addition, the architecture includes a distributed query planner and an adaptive query executor. You can shard on a single node, query from any node, and you can use the columnar feature to achieve compression ratios of 3x-10x or more. Citus is a 100% open source Postgres extension.

    Diagram of a Citus distributed database cluster
    When querying a Citus cluster, most applications query the Citus coordinator as shown here. You can also query from any node (excluding schema changes, those still go to the coordinator.) The Citus node you connect to will transform the queries & route them to the correct shards.
  • Power of
    Postgres

    Citus is an open source extension to Postgres (not a fork.) So when you use Citus, you’re still using Postgres under the covers, along with the Citus extension on top. To your application, running on a Citus distributed database is like running on top of a single Postgres node. And because Citus is an extension, it’s easy for us to keep Citus current with the latest Postgres releases—plus you get the performance benefits of horizontal scale, while still being able to leverage your familiar SQL toolset and your Postgres expertise.

    PostgreSQL elephant icon, known as the Slonik Logo
    Postgres, PostgreSQL, and the Slonik Logo are trademarks or registered trademarks of the PostgreSQL Community Association of Canada, and used with their permission.
  • Simplified
    Architecture

    Because Citus distributes your data, parallelizes your queries, keeps more data in memory, and gives you higher I/O bandwidth—Citus can meet the demanding performance requirements of mixed OLTP and OLAP workloads. So you can simplify your architecture by using a single database for your app’s transactional and analytical workloads, even for data-intensive applications. Citus gives you more capabilities: you can now use both columnar and row-based tables in your Citus distributed database. And with Citus 12, you can now easily support microservices.

    Architecture diagram of a mixed transactional and analytical workload
    Diagram of a common use case for Citus: data-intensive applications that serve mixed transactional and analytical workloads. The transactional workload requires the robustness of a relational database like Postgres. And since the analytics dashboards are often customer-facing, they typically require low-latency query response times.

Learn Your Way: Read, Watch, or Do

docs icon

Read the docs

Find out more about the Citus concepts, architecture, cluster management, APIs, use cases, & performance tuning.

videos icon

Watch the videos

See how Citus scales out Postgres and parallelizes your workloads via these YouTube videos. Tip: turn on captions.

tutorials icon

Try the tutorials

Learn how to use Citus by using sample data in these short tutorials. For time series data, check out the use case guide.

Try Citus Right Now

Citus elicorn icon

Citus Open Source

You can download and install Citus open source packages for Docker, Ubuntu, Debian, Fedora, CentOS, and Red Hat via these simple steps.

cloud icon

Citus on Azure

You can stand up a Citus cluster in minutes with the Azure Cosmos DB for PostgreSQL managed service.

Sharding on a Single Node or Multiple Nodes

Using sharding and replication, the Citus extension distributes your data and queries across multiple nodes in a cluster, to give your app parallelism as well as more memory, compute, and disk. Citus is available as an open source download and in the cloud as a managed service. Azure Cosmos DB for PostgreSQL makes it easy to stand up a managed Citus cluster in minutes.

As of Citus 10, you can now shard Postgres on a single node, too. So you can adopt a distributed data model from the start to parallelize your queries—and be “scale-out ready.” Single-node Citus can also help to simplify your CI/CD pipelines.

Citus single node to distributed Citus cluster diagram
A Citus database cluster contains a Citus coordinator node and multiple worker nodes. Each node contains small Postgres tables called shards. Learn more in the animated Citus architecture graphic—or in the Citus GitHub repo.

How Citus Works

Learn how Citus works in this talk about Citus table types, the PostgreSQL extension APIs, the Citus query planner, and performance benchmarks comparing multi-node Citus clusters to a single node.

Video thumbnail: screen with Citus performance benchmarks

Frequently Asked Questions

  1. Citus Version Compatible with PostgreSQL
    5.2 9.5 only
    6.x 9.5, 9.6
    7.x 9.6, 10
    8.x 10, 11
    9.0-9.4 11, 12
    9.5 11, 12, 13
    10.0.x 11, 12, 13
    10.1.x 12, 13
    10.2.x 12, 13, 14
    11.0.x 13, 14
    11.1.x, 11.2.x, 11.3.x 13, 14, 15
    12.0 14, 15
    12.1 14, 15, 16
  2. Citus achieves order-of-magnitude faster execution compared to vanilla PostgreSQL through a combination of parallelism, keeping more data in memory, and higher I/O bandwidth.

    Citus enables real-time interaction with large datasets that span billions of records—and is a good fit for customer-facing workloads that often require low-latency response times. Performance increases as you add nodes to a Citus database cluster. This 15-min performance demo from SIGMOD shows how Citus speeds up Postgres, using the HammerDB benchmark. Recently GigaOm published a benchmark performance report for Citus. Find out why benchmarking databases is so hard in this blog post by the lead engineer for Citus. Columnar storage can speed up analytics workloads that benefit from compression, too.

  3. The easiest way to start is by utilizing schema-based sharding, which assumes assigning each tenant to a separate schema. Citus then automatically distributes these among the nodes in your cluster and routes queries accordingly. The only change you will need to do in your application is to SET search_path when switching tenants. In some cases like with microservices, even that change may not be necessary if every microservice uses a separate user matching their schema name.

    If you want the best performance, row-based sharding, using a distribution column is the best approach. The first step in migrating an application from Postgres to Citus is to choose your distribution column (sometimes called a distribution key, or a sharding key.) You’ll want to understand your workload in order to pinpoint a “good” distribution column, e.g., a column that enables you to get the maximum performance from Citus.

    The second step is to prepare the Postgres tables and SQL queries for migration. The amount of effort involved depends (you’ve heard that before, right?) on whether your application is already centered around that distribution column in terms of queries and schema. If not, you may have to update some of your queries and/or add the distribution column to some of your tables.

    Alternately, you can now shard your database on a single node. So you can build your application on single-node Citus from the very start and be “scale-out-ready”, able to easily add nodes and rebalance your Citus cluster as your application grows.

    If you are ready to delve deeper, the Migrating to Citus guide in the Citus documentation should be useful.

  4. The Citus extension to Postgres is commonly used with customer-facing applications that are growing fast, have demanding performance requirements, are starting to experience slow queries, need to plan for future scale—or all of the above. Common use cases for Citus—both self-hosted and in the cloud as the Azure Cosmos DB for PostgreSQL managed service—include:

  5. As you’ll learn in the Citus concepts section of the documentation there are two ways of sharding Citus—row-based and schema-based. In both sharding methods Citus divides Postgres tables into multiple smaller tables, called shards. The shards are then spread across the nodes in the Citus database cluster.

    In the case of row-based sharding you decide how tables are split using the create_distributed_table() function.

    SELECT create_distributed_table(
      'table_name',
      'distribution_column');

    With schema-based sharding, the schema name acts as the grouping and you determine which schemas are distributed with citus_schema_distribute("name") function.

    SELECT citus_schema_distribute('name');

    When new data is ingested or when queries come in, the Citus coordinator routes them to the correct shards based on the value of the distribution column (row-based) or the schema name (schema-based) depending on which sharding method you chose.