CitusDB 4.0 release notes

We're excited to announce the general availability of CitusDB 4.0, which is based on the new PostgreSQL 9.4 release. CitusDB is our full-featured, turnkey database solution for scalable, highly available PostgreSQL. A licensed solution, CitusDB allows you to scale out PostgreSQL across commodity servers and to parallelize queries across the cluster for real-time analytics on big data.

This release brings exciting new functionality and performance improvements, including:

  • Support for real-time workloads:
    • CitusDB enterprise now integrates with pg_shard to support real-time workloads. This integration brings together scalable analytics and low-latency writes in one system.
  • Integration with PostgreSQL 9.4 to bring features like:
    • jsonb, a faster, more efficient data type for storing JSON data (see the short example after this list).
    • Faster and smaller GIN indexes.
    • Many more! See the PostgreSQL 9.4 release notes for the full list of new features.
  • Re-balancing the cluster for incremental scalability and fault tolerance:
    • Incrementally add nodes, and uniformly distribute data and thus traffic to those nodes.
    • Re-replicate data from failed nodes evenly across all the remaining ones.
  • Faster query performance:
    • A new task-assignment policy for better in-memory workload performance.
    • Binary serialization for data copied between nodes, speeding up queries that fetch large amounts of data.
    • Batching task-assignment calls for improved performance with re-partition joins.
    • Fixed a bug which was causing more data than necessary to be copied over the network.
  • Improved usability:
    • Modified \STAGE to allow loading data from any node in the cluster, making data loads much easier and allowing for more uniform data placement.
    • Query throttling in the real-time executor to prevent resource exhaustion when queries touch thousands of shards.
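
As a quick illustration of the PostgreSQL 9.4 features called out above, here is a minimal jsonb example with a GIN index. The events table and its payload are hypothetical and exist purely for illustration:

    -- Hypothetical table using the new jsonb type
    CREATE TABLE events (
        event_id bigserial PRIMARY KEY,
        payload  jsonb
    );

    -- 9.4's faster, smaller GIN indexes speed up containment queries on jsonb
    CREATE INDEX events_payload_idx ON events USING GIN (payload);

    INSERT INTO events (payload)
    VALUES ('{"type": "push", "repo": "citusdata/pg_shard"}');

    -- Find events whose payload contains the given key/value pair
    SELECT event_id, payload
    FROM events
    WHERE payload @> '{"type": "push"}';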

Installation notes

CitusDB 4.0 is based on PostgreSQL 9.4.0 and is binary incompatible with CitusDB 3.0. As a result, the install path for the new version has changed from /opt/citusdb/3.0 to /opt/citusdb/4.0, allowing multiple versions of CitusDB to be installed in parallel.

If you are running CitusDB 3.0 and want to upgrade to v4.0, you will need to run pg_upgrade on all your nodes and manually reload your distributed metadata. For specifics, please get in touch with us so we can assist with the upgrade process. If you are downloading and using CitusDB for the first time, please follow the installation instructions on the CitusDB documentation page.

Download it now at citusdata.com/downloads.

PGConf.Russia talk on pg_shard

Last month we went to PGConf.Russia and gave a talk on pg_shard, our open source PostgreSQL extension that is available for download on GitHub. pg_shard allows users to scale out their database by creating a cluster of commodity servers: it shards the database and replicates the shards across the cluster, making it easy both to scale out and to build a highly available cluster. The video from my talk is now available for all to see.

We got some very interesting questions during the talk that we wanted to highlight and clarify.

  • Does pg_shard/CitusDB run my queries in parallel?
    In pg_shard, a query will use one thread on a master node and one on a worker node. You can run many queries in parallel by making multiple connections to the master node(s), while the real work is done by the worker nodes. UPDATE and DELETE queries on the same shard are serialized to ensure consistency between the replicas. To parallelize multi-shard SELECT queries across the worker nodes, you can upgrade to CitusDB.
  • Can I use stored procedures in my queries?
    Yes, and this is a powerful feature of pg_shard. Function calls in queries are executed on the worker nodes, which allows you to include arbitrary logic in your queries and scale it out to many worker nodes (see the sketch after this list).
  • How do I ALTER a distributed table?
    pg_shard currently does not automatically propagate ALTER TABLE commands on a distributed table to the individual shards on the workers, but you can easily do this with a simple shell script (a sketch of the idea follows this list). For pg_shard: alter-pgshard-table.sh, and for CitusDB: alter-citusdb-table.sh.
  • What kind of lock is used when copying shard placements?
    In the latest version of pg_shard, we've added a master_copy_shard_placement function which takes an exclusive lock on the shard. This temporarily blocks changes to the shard, while SELECTs can still go through.
  • What's the difference between pg_shard and PL/Proxy?
    PL/Proxy allows you to scale out stored procedures on a master node across many worker nodes, whereas pg_shard allows you to (transparently) scale out a table and queries on that table across many worker nodes using replication and sharding.
  • Can I use cstore_fdw and pg_shard without CitusDB?
    You certainly can! cstore_fdw and pg_shard can be used either in a regular PostgreSQL database or in combination with CitusDB.
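
To make the answer about function calls more concrete, here is a minimal sketch. The reviews table, the normalize_score function, and the shard count and replication factor are all hypothetical; master_create_distributed_table and master_create_worker_shards are the pg_shard functions used to distribute a table:

    -- Hypothetical function; note that it must also be created on every
    -- worker node, since pg_shard ships the query (including the function
    -- call) to the worker holding the matching shard
    CREATE FUNCTION normalize_score(raw integer) RETURNS numeric AS $$
        SELECT raw / 100.0;
    $$ LANGUAGE sql IMMUTABLE;

    -- Hypothetical distributed table: 16 shards, 2 replicas (picked arbitrarily)
    CREATE TABLE reviews (customer_id bigint, raw_score integer);
    SELECT master_create_distributed_table('reviews', 'customer_id');
    SELECT master_create_worker_shards('reviews', 16, 2);

    -- The normalize_score call executes on the worker node, not on the master
    SELECT customer_id, normalize_score(raw_score) AS score
    FROM reviews
    WHERE customer_id = 42;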
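
For the ALTER TABLE question, the helper scripts mentioned above essentially replay the same statement against every shard placement. A rough sketch of the idea, assuming pg_shard's <table>_<shardid> naming convention for shard placements (the shard ID and the new column here are hypothetical):

    -- On the master node: alter the distributed table itself
    ALTER TABLE reviews ADD COLUMN review_date date;

    -- On each worker node, repeat the statement for every shard placement of
    -- the table; placements are named <table>_<shardid>, for example:
    ALTER TABLE reviews_10004 ADD COLUMN review_date date;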

We would like to thank the organizers for a great conference and providing the recording of the talk! If you would like more information on pg_shard or on common use cases for scaling out PostgreSQL, please visit the Citus Data website.

