Note: We are currently updating our documentation to include all relevant information on new CitusDB features. Please send us an email at engage @ citusdata.com if you can't find documentation for a feature.
Citus DB is a scalable analytics database that is built on top of PostgreSQL. Designed with parallelism in mind, it is the first such database that enables running distributed SQL queries on data that's external to the database.
Citus DB gives flexible and fast access to large volumes of data. Datasets that have a natural temporal ordering are particularly applicable; and these include user actions, event streams, log files, and machine generated data. Citus DB partitions these data and the queries that operate on the data; and efficiently executes queries that involve look-ups, complex selections, groupings and orderings, and analytic functions. Further, Citus DB supports joins between between any number of tables, independent of their size and partitioning method.
Citus DB enables real-time responsiveness. Query run times start at 100ms for simple queries, and increase depending on query complexity and the dataset size. Citus DB users can also easily use the pg_shard extension to enable real-time writes into their distributed tables.
At a high level, Citus DB distributes the data across a cluster of nodes, and then processes incoming analytic queries in parallel across these nodes. Citus DB achieves this by making three particular changes to the underlying database:
These three changes closely align Citus DB's architecture with that of Hadoop's. In fact, they allow us to combine the scalability and fault tolerance of Hadoop with the performance and SQL-compliance of databases. Users can easily scale out a cluster as more data are added, take advantage of standard data visualization tools, and also benefit from performance improvements that databases typically employ (indexes, join order optimizations, etc).