CitusDB 2.1 release notes

Written by Sumedh Pathak
August 22, 2013

We are excited to announce the CitusDB v2.1 release, which gives our users new features that let them address a wider variety of use cases. Some of these new features include:

  • Task-tracker-based execution: Until this point, CitusDB only used a real-time executor to parallelize incoming queries. This worked great for requests that took between 50ms and 10 minutes, but was not suitable for long-running queries. We are now happy to announce our new task tracker executor, which enables us to easily handle long-running queries and to scale to hundreds of machines. This change truly puts CitusDB at Hadoop scale, and also allows us to handle real-time and more complex queries equally well.
  • Distinct approximations: Our users have long been asking for an efficient way to calculate count(distinct) approximations over distributed datasets. We now use the HyperLogLog algorithm to calculate approximate values for count distinct queries. If you're using our EC2 machine image, this feature already comes pre-configured. On other platforms, you will need to do the following:
    • Install the latest citusdb-contrib package (version 2.1) on every node.
    • Create the hll extension on every node.
    • Enable count distinct approximations by setting the count_distinct_error_rate configuration value. Lower values for this configuration setting are expected to give more accurate results. We recommend setting this to 0.005.
  • Support for CREATE TABLE AS/SELECT ... INTO queries on distributed tables.
  • Support for CURSORs on distributed tables.
  • Based on the latest PostgreSQL 9.2.4 release, which includes a critical security fix.
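
To give a feel for what the recommended error rate of 0.005 means for the count distinct approximations above, here is a rough sketch in Python. It relies on the published HyperLogLog result that the standard error is about 1.04 / sqrt(m) for m registers. The function name hll_registers_for_error is hypothetical, and how CitusDB and the hll extension actually translate the configuration value into a sketch size is an assumption here, not something these notes state.

```python
import math


def hll_registers_for_error(target_error):
    """Return (precision_bits, register_count) for a target relative error.

    Hypothetical helper for illustration only; it is not part of CitusDB
    or the hll extension.
    """
    if not 0 < target_error < 1:
        raise ValueError("target_error must be between 0 and 1")
    # HyperLogLog's standard error is roughly 1.04 / sqrt(m) for m registers,
    # so we need at least (1.04 / error)^2 registers.
    min_registers = (1.04 / target_error) ** 2
    # Register counts are powers of two, so round the exponent up.
    precision_bits = math.ceil(math.log2(min_registers))
    return precision_bits, 2 ** precision_bits


if __name__ == "__main__":
    for rate in (0.05, 0.01, 0.005):
        bits, registers = hll_registers_for_error(rate)
        achieved = 1.04 / math.sqrt(registers)
        print(f"target {rate}: 2^{bits} = {registers} registers, "
              f"expected error about {achieved:.4f}")
```

Under these assumptions, a lower error rate buys accuracy at the cost of a larger sketch per distinct count, which is the trade-off behind the recommendation above.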

Installation notes

CitusDB 2.1 is binary compatible with CitusDB 2.0, and the package can be upgraded without affecting previously created data directories. If you are downloading and using CitusDB for the first time, please follow the installation instructions on the CitusDB documentation page.

Sumedh Pathak

Former principal engineer on the Postgres team at Microsoft. Co-founder & VP of Engineering at Citus Data. Speaker at QCon London & DataEngConf SF. M.S. Computer Science Stanford. Family. Tennis ball. Dog.