CitusDB 4.0, pg_shard 1.1, and cstore 1.2 are out. What's next?

Written by Ozgun Erdogan
April 27, 2015

We’re excited to release CitusDB 4.0, pg_shard 1.1, and cstore 1.2! These products extend PostgreSQL for scaling out and high performance.

Now that our new releases are out, we wanted to answer two questions that we continuously hear from our users. First, where does each product fit in within the PostgreSQL ecosystem? Second, what’s the roadmap for each product?

CitusDB

CitusDB scales out PostgreSQL for real-time analytics. Example use-cases include powering real-time dashboards, analyzing clickstream data, and visualizing geospatial queries on maps.

CitusDB 4.0 makes notable usability and performance improvements, offers all new PostgreSQL 9.4 goodness, and comes with a shard rebalancer for dynamic scaling and fault tolerance.

Next three months: We’re focusing on usability, usability, and usability. We made many improvements to CitusDB over the past two years, but we for example never got around to updating our documentation. We’re now completely revamping our documentation to make these features accessible. We’re also working on features that will make CitusDB easier to deploy and use.

Next year: SQL is huge. We already have good SQL coverage, and we’re at a place where parallelizing new SQL features takes a few hundred lines of code. We will look to improve our SQL coverage.

Try CitusDB v4.0 for yourself by downloading it here!

pg_shard

pg_shard is an open source sharding extension for PostgreSQL. As a standalone extension, it addresses many NoSQL use-cases that involve short reads and writes. pg_shard 1.1 improves INSERT performance by up to 4x, provides more seamless integration with CitusDB, and adds shard repair functionality.

We find that many of our users want to use pg_shard and CitusDB together. pg_shard enables real-time data ingest at scale. CitusDB provides bulk loading and supports advanced analytic features such as distributed joins.

Next three months: We’re making new improvements to lower the barrier to entry for PostgreSQL users who are looking to scale out. We also heard from users that they’d love to see query parallelization for simple analytical queries. We plan to have parallel aggregates and group bys in our next release.

Next year: pg_shard currently keeps its metadata in one machine. (We have enthusiastic users who manually replicate that metadata.) We intend to make the metadata fully distributed as well; that way, users will be able to write to any node.

We currently have two competing designs in review. One offers eventual consistency semantics similar to NoSQL databases, and takes shorter to implement. The other offers better distributed consistency and isolation guarantees, and provides behavior similar to SAP HANA. We will publish these two designs and ask for feedback on them on Hacker News.

Try pg_shard 1.1 for yourself by downloading it here!

cstore_fdw

cstore_fdw is an open source columnar storage engine for PostgreSQL, and as a consequence for CitusDB. One benefit of cstore is that it compresses the underlying data on average by 4x. Also, users who have database tables of 50+ columns see notable I/O benefits thanks to the columnar layout. These two benefits enable cstore_fdw 1.2 to improve query performance for data warehousing workloads.

Next three months: We will improve on cstore_fdw’s statistics collection and cost estimates to provide consistent and fast query performance. We will also have Alter Table … Add/Remove Column support in our next release. Finally, we’re investigating the best way to provide back-ups for columnar tables.

Next year: This is an area where we’re looking for input from cstore_fdw users. One big feature in consideration is a vectorized executor for cstore. We already have a prototype for this idea, and saw performance improvements of 3-4x for in-memory SELECT queries. Still, we know that writing a new PostgreSQL executor is a notable commitment, and we’re looking to hear more from our users here.

Try cstore_fdw 1.2 for yourself by downloading it here!

In summary, we’re focusing on usability, seamless integration, and full metadata distribution for the next year. We envision a world where you can run your local or distributed SQL just the same, and simply get all of scaling’s benefits in the latter case!

Ozgun Erdogan

Written by Ozgun Erdogan

Leads the Postgres engineering team at Microsoft. Co-founder & CTO of Citus Data. Worked on distributed systems at Amazon. Speaker at PGCon, XLDB Conf, DataEngConf, PostgresOpen, & QCon. Dad.