Frequently Asked Questions about Citus

  • In January 2019, Microsoft acquired Citus Data. For more details on the exciting news, please visit the announcements on the Official Microsoft Blog and the Citus Data Blog.

  • Citus extends PostgreSQL to support distributed SQL queries. On top of PostgreSQL, Citus comes with its own transparent sharding, replication, distributed query planner, and executor logic. Together, these features enable you to scale analytical workloads by parallelizing queries, and to scale transactional workloads by routing transactions across the cluster.

    Citus also includes support for columnar storage, plus the ability to use both row and columnar tables in the same database. Citus Columnar can be used with local PostgreSQL tables on a single Citus node, as well as with Citus distributed tables across a multi-node cluster.

  • Citus Version             Compatible with PostgreSQL
    5.2                       9.5 only
    6.x                       9.5, 9.6
    7.x                       9.6, 10
    8.x                       10, 11
    9.0-9.4                   11, 12
    9.5                       11, 12, 13
    10.0.x                    11, 12, 13
    10.1.x                    12, 13
    10.2.x                    12, 13, 14
    11.0.x                    13, 14
    11.1.x, 11.2.x, 11.3.x    13, 14, 15
    12.0                      14, 15
    12.1                      14, 15, 16

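
If you need to check compatibility in a deployment script, the table above can be encoded as a small lookup. This is only a sketch: the data is transcribed from the table, while the names `CITUS_PG_COMPAT` and `citus_series_for` are hypothetical and not part of Citus.

```python
from typing import Dict, List

# Compatibility data transcribed from the table above.
# Keys are Citus release series; values are supported PostgreSQL major versions.
CITUS_PG_COMPAT: Dict[str, List[str]] = {
    "5.2": ["9.5"],
    "6.x": ["9.5", "9.6"],
    "7.x": ["9.6", "10"],
    "8.x": ["10", "11"],
    "9.0-9.4": ["11", "12"],
    "9.5": ["11", "12", "13"],
    "10.0.x": ["11", "12", "13"],
    "10.1.x": ["12", "13"],
    "10.2.x": ["12", "13", "14"],
    "11.0.x": ["13", "14"],
    "11.1.x, 11.2.x, 11.3.x": ["13", "14", "15"],
    "12.0": ["14", "15"],
    "12.1": ["14", "15", "16"],
}


def citus_series_for(pg_major: str) -> List[str]:
    """Return every Citus release series listed as compatible with pg_major."""
    return [series for series, pgs in CITUS_PG_COMPAT.items() if pg_major in pgs]
```

For example, `citus_series_for("16")` returns only the 12.1 series, matching the last row of the table.
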
  • Since Citus provides distributed functionality by extending PostgreSQL, it uses the standard PostgreSQL SQL constructs. Citus provides full SQL support for queries which access a single node in the database cluster. These queries are common, for instance, in multi-tenant SaaS applications where different nodes store different tenants (see When to Use Citus). When a query spans across shards, there are a few limitations which you can typically work around using other PostgreSQL functionality.

  • Since Citus is based on PostgreSQL, you can directly use PostgreSQL extensions such as HyperLogLog, TopN, or PostGIS with it. When using Citus with other Postgres extensions, you will first need to create the Citus extension on your PostgreSQL instance and then create the other extensions you want to use. Citus will work with tools that use standard PostgreSQL drivers such as Tableau through regular ODBC/JDBC drivers.

    In general, you can use standard PostgreSQL drivers and language bindings with Citus, which means almost any language is supported. You can view a list of supported drivers and interfaces for PostgreSQL here.
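
The ordering rule above (create the Citus extension first, then the others) can be sketched as a helper that builds the statements in the right order. The function name is hypothetical, and the extension names `hll`, `topn`, and `postgis` are assumed to be the names these extensions are installed under on your system; check your installation before relying on them.

```python
from typing import Iterable, List


def create_extension_statements(extensions: Iterable[str]) -> List[str]:
    """Build CREATE EXTENSION statements with citus always first."""
    # Drop duplicates (and any explicit "citus") while keeping the caller's order.
    seen = set()
    others = []
    for name in extensions:
        if name != "citus" and name not in seen:
            seen.add(name)
            others.append(name)
    ordered = ["citus"] + others
    # Names are interpolated unquoted, so pass only trusted identifiers.
    return [f"CREATE EXTENSION IF NOT EXISTS {name};" for name in ordered]
```

Calling it with `["postgis", "citus", "hll"]` yields the `citus` statement first, followed by `postgis` and `hll`.
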

  • You can find real-world examples of how organizations use Citus to scale out Postgres in our customer stories. Our customers are wonderful, and we appreciate their vote of confidence and the time they spent being interviewed for these case studies. You’ll find stories from companies who use Citus to build real-time analytics APIs and dashboards; as well as stories about teams that use Citus to scale their multi-tenant SaaS applications.

    The Citus distributed database is used by Fortune 100 companies and startups alike, across different types of businesses including web & mobile analytics, information & network security, advertising technology, sales & marketing automation, and fintech.

    And Citus is now available in the cloud on Microsoft Azure, as the Azure Cosmos DB for PostgreSQL managed service. Learn more about how the Helsinki Regional Transportation Authority uses Citus on Azure, along with PostGIS, to deliver impressive performance and reduce costs by 50%. Or watch this video interview with BNY Mellon about how the bank has made their Postgres queries 24x faster by scaling out Postgres with Citus on Azure (now known as Azure Cosmos DB for PostgreSQL).

  • Citus databases continue scaling out horizontally as we speak. At last count, one Citus user had a 100-node production Citus cluster with over 50 TB of memory and 700 TB of data (1.4 PB uncompressed). Another user manages 1.6 PB of time series data in Citus. The Microsoft Windows team manages their ship-room decisions by using Citus to scale out Postgres across 54 nodes and a total of 3,456 cores, 27 TB of memory, and 1.6 PB of SSD storage.

    That said, plenty of Citus users benefit significantly from the parallelism of Citus, and from its ability to serve mixed transactional and analytic workloads, at a much smaller scale, such as on 2-node clusters. And as of Citus 10, you can shard Postgres on a single node, adopting a distributed data model from the start.

  • There are several ways in which Citus is different from other analytics databases.

    • Citus is built for fast analytics and high transaction rates for many concurrent users. This is unlike most analytics databases which are generally not intended to support concurrent users or transactions.
    • Citus is open source. This is not the case for proprietary analytics databases. Open source means you have a lot of freedom and flexibility: you can run Citus and Postgres on your laptop or on VMs in the cloud, and you can take advantage of a large, vibrant ecosystem of tools and client libraries.
    • Because Citus is implemented as an extension to PostgreSQL (not a fork), it’s easy for us to keep Citus current with the latest releases of Postgres.
    • Citus supports distributed transactions, so you can easily transform your data in parallel inside your database, simplifying your infrastructure and enabling you to build fast and powerful analytics pipelines.
    • Citus also offers columnar storage and compression, allowing you to mix columnar and regular row tables in the same database. So with Citus 10, for data that lends itself well to columnar storage, you can take advantage of columnar benefits like reduced disk footprint, faster query performance, and lower costs.
  • Citus includes a columnar storage feature. Citus Columnar can give you compression ratios of 3x-10x or more—in addition to reducing your I/O bandwidth by skipping unneeded columns, which can be a big performance benefit on large tables. You can use columnar storage with local tables on a single Citus node, as well as with Citus distributed tables that have been sharded across a cluster. Check out our demo of columnar compression in Citus to learn more.

    When using Citus, you can choose between row-based and columnar tables depending on your needs. You can even create partitioned tables that have a mixture of row-based and columnar partitions to enable updates and fast lookups on recent data while compressing older data.

    Many years ago, our team created the cstore_fdw for columnar storage. The cstore_fdw open source extension implemented columnar storage by leveraging the foreign data wrapper (fdw) API. In Postgres 12, “table access methods” were introduced and have enabled us to create Citus Columnar, a next-generation columnar feature that is more tightly integrated with native Postgres features, including range partitioning.
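
The mixed layout described above, with recent partitions kept row-based and older ones compressed as columnar, could be driven by a small script like the sketch below. It assumes the Citus function `alter_table_set_access_method(table, access_method)` is available in your Citus version (verify against the Citus documentation); the partition names, the `partitions` mapping, and the helper name are hypothetical.

```python
from datetime import date
from typing import Dict, List


def columnar_conversion_statements(
    partitions: Dict[str, date], cutoff: date
) -> List[str]:
    """Return SQL that converts partitions ending on or before cutoff to columnar.

    partitions maps a partition name to the exclusive upper bound of the
    time range it covers (a hypothetical input format for this sketch).
    """
    statements = []
    # Process oldest partitions first so the output order is predictable.
    for name, upper_bound in sorted(partitions.items(), key=lambda kv: kv[1]):
        if upper_bound <= cutoff:
            safe_name = name.replace("'", "''")  # escape for a SQL string literal
            statements.append(
                f"SELECT alter_table_set_access_method('{safe_name}', 'columnar');"
            )
    return statements
```

Partitions newer than the cutoff are left untouched, so they stay row-based and remain suitable for updates and fast lookups.
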

  • Citus achieves order-of-magnitude faster execution compared to vanilla PostgreSQL through a combination of parallelism, keeping more data in memory, higher I/O bandwidth, and simultaneous use of the multiple cores available across your Citus database cluster.

    Citus enables real-time interaction with large datasets that span billions of records—and is a good fit for customer-facing workloads that often require low-latency response times. Performance increases as you add nodes to a Citus database cluster. This 15-min performance demo from SIGMOD shows how Citus speeds up Postgres, using the HammerDB benchmark. Recently GigaOm published a benchmark performance report for Citus. Find out why benchmarking databases is so hard in this blog post by the lead engineer for Citus. Columnar storage can speed up analytics workloads that benefit from compression, too.
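
The parallelism mentioned above follows a scatter-gather pattern: the same partial computation runs on every shard, and the partial results are merged. The sketch below models that pattern in plain Python with in-memory lists standing in for shards. It is an illustration of the idea only, not how the Citus planner or executor is implemented, and all names in it are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def fragment_query(shard: List[float]) -> Tuple[float, int]:
    """Stand-in for a per-shard fragment query: return (sum, count)."""
    return sum(shard), len(shard)


def distributed_average(shards: List[List[float]]) -> float:
    """Run one fragment per shard concurrently, then merge the partial results.

    Assumes at least one shard and at least one value overall.
    """
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(fragment_query, shards))
    # An average cannot be merged directly, so sums and counts are merged instead.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count
```

Python threads do not add CPU parallelism for this kind of work; the point is the shape of the computation, in which each worker only ever touches its own shard.
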

  • Yes. Citus is available both as open source and in the cloud as the Azure Cosmos DB for PostgreSQL managed service. More details on how to get started with Citus can be found here.

  • Patroni is one of the most popular high availability (HA) solutions among Postgres open source users. As of Citus 11.2 and Patroni 3.0, there is an integration between Citus and Patroni that enables fully declarative clustering with high availability and automatic failover.

  • Citus implements transparent sharding at the database layer—so if you use Citus, you do not need to manually shard your application, and you do not need to re-architect your application in order to scale out. You can read more about the Citus architecture and sharding semantics in our documentation.
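
To give a feel for what a sharding layer does on your behalf, here is a simplified model of hash-based routing: the hash space is split into contiguous ranges, one per shard, and each row is sent to the shard whose range contains the hash of its distribution value. This is a sketch under stated assumptions, not Citus's implementation: `zlib.crc32` is a placeholder hash, and the function names are hypothetical.

```python
import zlib
from typing import List, Tuple

INT32_MIN = -(2 ** 31)
HASH_SPACE = 2 ** 32  # number of distinct signed 32-bit values


def shard_ranges(shard_count: int) -> List[Tuple[int, int]]:
    """Split the signed 32-bit hash space into contiguous, inclusive ranges."""
    step = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        low = INT32_MIN + i * step
        if i == shard_count - 1:
            # The last range absorbs any remainder so the whole space is covered.
            high = INT32_MIN + HASH_SPACE - 1
        else:
            high = low + step - 1
        ranges.append((low, high))
    return ranges


def shard_index_for(key: str, shard_count: int) -> int:
    """Map a distribution-column value to the index of its shard."""
    # crc32 yields 0..2**32-1; shift it into the signed 32-bit range.
    hashed = zlib.crc32(key.encode("utf-8")) + INT32_MIN
    for index, (low, high) in enumerate(shard_ranges(shard_count)):
        if low <= hashed <= high:
            return index
    raise AssertionError("hash ranges must cover the whole hash space")
```

Because the mapping depends only on the key and the shard count, every row for the same key lands on the same shard without the application tracking where its data lives.
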

  • Optimal shard count is related to the total number of cores on the workers. Citus partitions an incoming query into its fragment queries which run on individual worker shards. Hence, the degree of parallelism for each query is governed by the number of shards the query hits. To ensure maximum parallelism, you should create enough shards on each node such that there is at least one shard per CPU core.
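
The rule of thumb above is simple arithmetic, sketched here as a hypothetical helper. It assumes every worker has the same number of cores, and it returns a lower bound rather than a tuned value.

```python
def minimum_shard_count(worker_nodes: int, cores_per_worker: int) -> int:
    """Smallest total shard count that gives at least one shard per CPU core."""
    if worker_nodes < 1 or cores_per_worker < 1:
        raise ValueError("worker_nodes and cores_per_worker must be positive")
    return worker_nodes * cores_per_worker
```

For instance, a 2-node cluster with 8 cores per worker would need at least `minimum_shard_count(2, 8)` shards, which is 16. In Citus, the shard count for new distributed tables is typically controlled through the `citus.shard_count` setting; check the documentation for your version before changing it.
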

  • The easiest way to start is by using schema-based sharding, which assigns each tenant to a separate schema. Citus then automatically distributes the schemas among the nodes in your cluster and routes queries accordingly. The only change you will need to make in your application is to SET search_path when switching tenants. In some cases, such as with microservices, even that change may not be necessary if every microservice uses a separate user matching their schema name.

    For the best performance, row-based sharding using a distribution column is the preferred approach, although it sometimes requires adjusting the schema and queries.

    Since Citus is deployed as a Postgres extension, Postgres users can often start using Citus by simply installing the extension on their existing database. Once the extension is created, you can create and use distributed tables through standard Postgres interfaces while maintaining compatibility with existing Postgres tools. For more information, see our Migrating to Citus guide.
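
The two approaches above each come down to one statement on the application side: `SET search_path` for schema-based sharding, and a call to the Citus function `create_distributed_table` for row-based sharding. The sketch below builds both statements with basic quoting. The helper names and the `tenant_42`, `events`, and `tenant_id` values are hypothetical, and in real code you would normally rely on your driver's own quoting or parameter facilities.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote a SQL string literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def switch_tenant_statement(tenant_schema: str) -> str:
    """Schema-based sharding: point the session at one tenant's schema."""
    return f"SET search_path TO {quote_ident(tenant_schema)};"


def distribute_table_statement(table: str, distribution_column: str) -> str:
    """Row-based sharding: distribute a table by its distribution column."""
    return (
        "SELECT create_distributed_table("
        f"{quote_literal(table)}, {quote_literal(distribution_column)});"
    )
```

With these helpers, `switch_tenant_statement("tenant_42")` produces `SET search_path TO "tenant_42";` and `distribute_table_statement("events", "tenant_id")` produces `SELECT create_distributed_table('events', 'tenant_id');`.
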

  • Citus is open source and is available free for download here. Citus is also available in the cloud as the Azure Cosmos DB for PostgreSQL managed service. You can visit the Azure Cosmos DB for PostgreSQL pricing page to learn more about Citus on Azure pricing.

  • The Citus server is licensed under the GNU Affero General Public License v3.0. For additional details, including answers to common questions about the AGPL, see the FAQ from the Free Software Foundation. The client drivers are licensed under the PostgreSQL license.

    With this licensing structure, we looked to accomplish the following objectives:

    • Allow users to download Citus, see the source code, and use it for free.
    • Require users who choose to modify Citus to fit their needs to release the patches to the software development community.
    • Require users who are unwilling to release the patches to the software development community to purchase a commercial license.

    With a significant volume of database software delivered today as a hosted service vs. distributed in binary form, GNU AGPL became the most effective license to fulfill all of the above.

    Having the client drivers under the PostgreSQL license removes any ambiguity as to the extent of the server license.