Citus extends PostgreSQL to support distributed SQL queries. On top of PostgreSQL, Citus comes with its own transparent sharding, replication, distributed query planner and executor logic which enable execution of distributed SQL queries in parallel. This provides Hadoop-like fault tolerance, scalability and recovery from mid-query failures while allowing large datasets to be queried orders of magnitude faster than what has been possible on PostgreSQL before.
Answers to Frequently Asked Questions
Citus 5.0 is compatible with PostgreSQL 9.5 and 9.4. Citus does not fork PostgreSQL and we maintain its compatibility with the latest major PostgreSQL release.
Since Citus provides distributed functionality by extending PostgreSQL, it uses the standard PostgreSQL SQL constructs. This includes support for a wide range of data types (including semi-structured data types like jsonb and hstore), full text search, operators and functions, foreign data wrappers, etc.
PostgreSQL has wide SQL coverage, and Citus does not support that entire spectrum out of the box when querying distributed tables. Some constructs which aren't supported natively for distributed tables are:
- Window Functions
- Set operations
- Transactional semantics for queries that span across multiple shards
It is important to note that you can still run all of those queries on regular PostgreSQL tables in the Citus cluster. As a result, you can address many use cases through a combination of rewriting the queries and/or adding some extensions. We are working on increasing the distributed SQL coverage for Citus to further simplify these queries. So, if you run into an unsupported construct, please contact us and we will do our best to help you out.
Since Citus is based on PostgreSQL, you can directly use PostgreSQL extensions such as HyperLogLog or PostGIS with it. When using other extensions, you will first need to create the Citus extension on your PostgreSQL instance and then the other extensions you want to use. Citus also works with tools such as Tableau that connect through regular PostgreSQL ODBC/JDBC drivers. In general, you can use standard PostgreSQL drivers and language bindings with Citus, which means almost any language is supported. You can view a list of supported drivers and interfaces for PostgreSQL here.
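The ordering requirement above (Citus first, then other extensions) can be sketched as a small helper that only builds the SQL text. This is a minimal illustration, not an official Citus tool: the function name extension_statements is hypothetical, and it assumes the usual extension names citus, hll (HyperLogLog) and postgis. Names are inserted unquoted, so it is only suitable for trusted, simple identifiers.

```python
def extension_statements(extensions):
    """Return CREATE EXTENSION statements with citus ordered first."""
    # De-duplicate while keeping the caller's order for everything except citus.
    others = [name for name in dict.fromkeys(extensions) if name != "citus"]
    ordered = ["citus"] + others
    return [f"CREATE EXTENSION IF NOT EXISTS {name};" for name in ordered]


if __name__ == "__main__":
    # Assumed extension names: "postgis" for PostGIS, "hll" for HyperLogLog.
    for statement in extension_statements(["postgis", "hll"]):
        print(statement)
```

The resulting statements can be run through any standard PostgreSQL driver or psql session on the instance.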
You can find a subset of our customers listed on our homepage and examples of what Citus is used for in our published case studies. Stay tuned as we work on sharing more examples of Citus deployments publicly! Currently Citus is used by many companies, ranging from Silicon Valley startups to publicly listed corporations, in sectors including web & mobile analytics, digital marketing, web infrastructure, security, advertising technology, retail and digital media. Our customers process billions of events in real-time on their Citus clusters while continuing to use the PostgreSQL features and extensions they're familiar with.
Citus deployments continue to scale out horizontally as we speak. At last count, we had customers keeping hundreds of TBs on Citus, using tens of nodes in parallel and ingesting TBs of data per day. We test Citus on 100+ nodes, and Citus is capable of storing and processing petabyte-scale workloads, so we look forward to the continuing growth of our customers' Citus deployments.
Citus does not fork PostgreSQL; it extends it. Therefore, you have access to the features, tools and extensions that come with the latest version of PostgreSQL. Citus is optimized for real-time analytics workloads and brings together fast parallel query execution with real-time data ingestion and high concurrency capabilities. You can deploy Citus on the cloud or on premise depending on your preferences.
Citus is not a columnar database by design since it extends PostgreSQL. However, it can be used in combination with the cstore_fdw extension, which gives Citus the capability to create distributed columnar tables. This helps to reduce the data footprint and improves the performance for disk-bound workloads.
Citus effectively parallelizes queries and achieves orders of magnitude faster execution compared to vanilla PostgreSQL through simultaneous utilization of multiple cores available in your cluster of servers. Citus enables human real-time interaction (seconds) with large datasets that span billions of records. Watch our demo to see how Citus speeds up PostgreSQL.
A single Citus node stores multiple shards of the same distributed table. This enables Citus to use multiple cores for a single query by virtue of hitting multiple PostgreSQL tables (shards) on each node. However, to get true scalability in performance and reliability, we recommend a multi-node cluster. In cases where queries hit the disk, a single node setup can easily become disk I/O bound.
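As a rough illustration of how hitting several shards lets one query use several cores, the toy sketch below runs one "fragment" per shard concurrently and merges the partial results. It is not the Citus executor: shards are plain in-memory lists, the names count_matching and distributed_count are hypothetical, and Python threads only show the structure of the fan-out and merge, not the real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def count_matching(shard, predicate):
    """Fragment query: count the rows of one shard that satisfy the predicate."""
    return sum(1 for row in shard if predicate(row))


def distributed_count(shards, predicate, max_workers=4):
    """Run one fragment per shard concurrently, then merge the partial counts."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_counts = list(
            pool.map(lambda shard: count_matching(shard, predicate), shards)
        )
    return sum(partial_counts)


if __name__ == "__main__":
    # Four hypothetical shards of an events table; rows are (user_id, event_type).
    shards = [
        [(1, "click"), (5, "view")],
        [(2, "click"), (6, "click")],
        [(3, "view")],
        [(4, "click"), (8, "view")],
    ]
    print(distributed_count(shards, lambda row: row[1] == "click"))
```

With a single shard there is only one fragment, so nothing can run in parallel; this is why a node holds multiple shards of the same table.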
You can deploy Citus on premise or on the cloud. Go here to try it out.
Citus replicates data on the worker nodes for fault tolerance; see our documentation for replication details and failure semantics. The Citus master node contains only metadata, and we recommend using a standard PostgreSQL backup or replication tool, such as streaming replication, to provide high availability and reliability for it.
The number of nodes needed depends on the use case and performance requirements. The Citus architecture scales out processing power, memory and storage linearly, and you can read more about its performance characteristics here.
Citus provides transparent sharding at the database layer, thus allowing users to keep their applications unchanged. See here for more information on Citus architecture and sharding semantics.
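To make "sharding at the database layer" concrete, here is a minimal sketch of routing rows to shards by hashing a distribution column. It assumes hash-based distribution on a single column, which the text above does not spell out, and the modulo-of-CRC32 mapping is a stand-in chosen for determinism, not the function Citus itself uses. The names shard_for and route_rows are hypothetical.

```python
import zlib


def shard_for(key, shard_count):
    """Map a distribution-column value to a shard index in [0, shard_count)."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(str(key).encode("utf-8")) % shard_count


def route_rows(rows, key_index, shard_count):
    """Group rows by the shard their distribution-column value maps to."""
    shards = {index: [] for index in range(shard_count)}
    for row in rows:
        shards[shard_for(row[key_index], shard_count)].append(row)
    return shards


if __name__ == "__main__":
    rows = [(user_id, "event") for user_id in range(10)]
    for index, members in route_rows(rows, key_index=0, shard_count=4).items():
        print(index, members)
```

Because the mapping happens below the SQL interface, the application keeps issuing ordinary queries against one logical table.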
Optimal shard count is related to the total number of cores on the workers. Citus partitions an incoming query into its fragment queries which run on individual worker shards. Hence, the degree of parallelism for each query is governed by the number of shards the query hits. To ensure maximum parallelism, you should create enough shards on each node such that there is at least one shard per CPU core.
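The "at least one shard per CPU core" guideline can be turned into simple arithmetic. The sketch below assumes shards are spread evenly across workers and ignores replication; the function names and the worker names in the example are hypothetical.

```python
def min_total_shards(cores_by_worker):
    """Smallest shard count giving every worker at least one shard per core,
    assuming shards are spread evenly across the workers."""
    if not cores_by_worker:
        return 0
    # With an even spread, every worker must receive as many shards as the
    # largest worker has cores.
    return len(cores_by_worker) * max(cores_by_worker.values())


def underused_workers(shards_by_worker, cores_by_worker):
    """Workers that hold fewer shards than they have CPU cores."""
    return sorted(
        worker
        for worker, cores in cores_by_worker.items()
        if shards_by_worker.get(worker, 0) < cores
    )


if __name__ == "__main__":
    cores = {"worker-1": 8, "worker-2": 8, "worker-3": 16}
    print(min_total_shards(cores))
    print(underused_workers({"worker-1": 8, "worker-2": 4, "worker-3": 16}, cores))
```

A query that touches fewer shards than there are cores on a worker leaves some of those cores idle for that query, which is what the second helper flags.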
Moving from PostgreSQL to Citus requires minimal changes at the application layer. Since Citus is deployed as a PostgreSQL extension, PostgreSQL users can often start using Citus by simply installing the extension on their existing database. Once the extension is created, you can create and use distributed tables through standard PostgreSQL interfaces while maintaining compatibility with existing PostgreSQL tools. If you are moving from MySQL or any other relational database, the migration path is similar to moving to PostgreSQL from another relational database. We've had numerous customers move from MySQL to Citus with little change in their applications.
Citus treats cstore_fdw tables just like regular PostgreSQL tables. When cstore_fdw is used with Citus, each logical shard is created as a foreign cstore_fdw table instead of a regular PostgreSQL table. If your cstore_fdw use case is suitable for the distributed nature of Citus (e.g. large dataset archival and reporting), the two can be used to provide a powerful tool which combines query parallelization, seamless sharding and HA benefits of Citus with superior compression and I/O utilization of cstore_fdw.
With the release of the newly open-sourced Citus v5.0, pg_shard's codebase has been merged into Citus. The result is a unified solution that offers the advanced distributed query planning previously available only to CitusDB customers, while preserving the simple, transparent sharding and real-time reads and writes that pg_shard brought to the PostgreSQL ecosystem. Our flagship product, Citus, provides a superset of pg_shard's functionality, and we have migration steps to help existing users perform a drop-in replacement. Please contact us for more information.
The Citus server is licensed under the GNU Affero General Public License v3.0. For additional details, including answers to common questions about the AGPL, see the FAQ from the Free Software Foundation. The client drivers are licensed under the PostgreSQL license.
With this licensing structure, we looked to accomplish the following objectives:
- Allow users to download Citus, see the source code, and use it for free.
- Require users who choose to modify Citus to fit their needs to release the patches to the software development community.
- Require users who are unwilling to release the patches to the software development community to purchase a commercial license.
With a significant volume of database software delivered today as a hosted service rather than distributed in binary form, the GNU AGPL was the most effective license to fulfill all of the above.
Having the client drivers under the PostgreSQL license removes any ambiguity as to the extent of the server license. We also have the Citus Enterprise product available under a commercial license from Citus Data.