CitusDB 4.0, pg_shard 1.1, and cstore 1.2 are out. What's next?

We're excited to release CitusDB 4.0, pg_shard 1.1, and cstore 1.2! These products extend PostgreSQL for scaling out and high performance.

Now that our new releases are out, we wanted to answer two questions that we hear from our users all the time. First, where does each product fit within the PostgreSQL ecosystem? Second, what's the roadmap for each product?

CitusDB

CitusDB scales out PostgreSQL for real-time analytics. Example use-cases include powering real-time dashboards, analyzing clickstream data, and visualizing geospatial queries on maps.

CitusDB 4.0 makes notable usability and performance improvements, offers all new PostgreSQL 9.4 goodness, and comes with a shard rebalancer for dynamic scaling and fault tolerance.

Next three months: We're focusing on usability, usability, and usability. We made many improvements to CitusDB over the past two years but, for example, we never got around to updating our documentation. We're now completely revamping our documentation to make these features accessible. We're also working on features that will make CitusDB easier to deploy and use.

Next year: SQL is huge. We already have good SQL coverage, and we're at a place where parallelizing new SQL features takes a few hundred lines of code. We will look to improve our SQL coverage.

Try CitusDB v4.0 for yourself by downloading it here!

pg_shard

pg_shard is an open source sharding extension for PostgreSQL. As a standalone extension, it addresses many NoSQL use-cases that involve short reads and writes. pg_shard 1.1 improves INSERT performance by up to 4x, provides more seamless integration with CitusDB, and adds shard repair functionality.

We find that many of our users want to use pg_shard and CitusDB together. pg_shard enables real-time data ingest at scale. CitusDB provides bulk loading and supports advanced analytic features such as distributed joins.
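To make the idea of sharding short reads and writes concrete, here is a minimal, self-contained sketch of hash-based routing on a partition key. It is an illustration only and not pg_shard's implementation: the names `shard_for`, `ShardedTable`, and `SHARD_COUNT` are hypothetical, and the use of CRC32 as the hash function is an assumption made for the example.

```python
import zlib

SHARD_COUNT = 4  # hypothetical value, chosen only for this example


def shard_for(key: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a partition-key value to a shard index via a stable hash."""
    # crc32 is deterministic across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(key.encode("utf-8")) % shard_count


class ShardedTable:
    """Toy in-memory table split into shards by a partition key."""

    def __init__(self, shard_count: int = SHARD_COUNT):
        self.shards = [dict() for _ in range(shard_count)]

    def insert(self, key: str, row: dict) -> int:
        # A single-row write touches exactly one shard.
        idx = shard_for(key, len(self.shards))
        self.shards[idx][key] = row
        return idx

    def get(self, key: str):
        # A lookup by partition key is routed to the same single shard.
        idx = shard_for(key, len(self.shards))
        return self.shards[idx].get(key)


table = ShardedTable()
table.insert("customer_42", {"clicks": 7})
print(table.get("customer_42"))
```

Because both the write and the read hash the same key, each operation involves one shard only, which is what keeps short reads and writes cheap as the number of shards grows.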

Next three months: We're making new improvements to lower the barrier to entry for PostgreSQL users who are looking to scale out. We also heard from users that they'd love to see query parallelization for simple analytical queries. We plan to have parallel aggregates and GROUP BYs in our next release.
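As a rough illustration of what parallelizing an aggregate with a GROUP BY involves, the sketch below computes a grouped average in two phases: each shard produces partial (sum, count) states, and a coordinator merges them. This shows the general technique under our own assumptions, not the design planned for pg_shard; all function names here are hypothetical, and threads stand in for worker nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(rows):
    """Worker step: compute per-group (sum, count) over one shard's rows."""
    acc = defaultdict(lambda: [0, 0])
    for group, value in rows:
        acc[group][0] += value
        acc[group][1] += 1
    return {g: tuple(sc) for g, sc in acc.items()}


def merge_partials(partials):
    """Coordinator step: combine per-shard partial states, then finalize AVG."""
    total = defaultdict(lambda: [0, 0])
    for partial in partials:
        for group, (s, c) in partial.items():
            total[group][0] += s
            total[group][1] += c
    return {g: s / c for g, (s, c) in total.items()}


def parallel_avg_by_group(shards):
    """Run the worker step on every shard concurrently, then merge."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_aggregate, shards))
    return merge_partials(partials)


# Two toy shards of (group, value) rows.
shards = [
    [("us", 10), ("eu", 4)],
    [("us", 20), ("eu", 6), ("eu", 8)],
]
result = parallel_avg_by_group(shards)
print(result)
```

The average is carried as a (sum, count) pair rather than as per-shard averages, because averaging the averages would give the wrong answer when shards hold different numbers of rows per group.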

Next year: pg_shard currently keeps its metadata on one machine. (We have enthusiastic users who manually replicate that metadata.) We intend to make the metadata fully distributed as well; that way, users will be able to write to any node.

We currently have two competing designs in review. One offers eventual consistency semantics similar to NoSQL databases, and takes less time to implement. The other offers better distributed consistency and isolation guarantees, and provides behavior similar to SAP HANA. We will publish these two designs and ask for feedback on them on Hacker News.

Try pg_shard 1.1 for yourself by downloading it here!

cstore_fdw

cstore_fdw is an open source columnar storage engine for PostgreSQL and, as a consequence, for CitusDB. One benefit of cstore_fdw is that it compresses the underlying data by 4x on average. Also, users who have database tables of 50+ columns see notable I/O benefits thanks to the columnar layout. These two benefits enable cstore_fdw 1.2 to improve query performance for data warehousing workloads.
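The I/O benefit of a columnar layout on wide tables can be shown with a small sketch. The code below is a toy model, not cstore_fdw's storage format: it counts how many values a scan has to touch when a query needs 2 columns out of 50, once with rows stored together and once with columns stored together. The function names and the table sizes are assumptions made for the example.

```python
def scan_row_store(rows, wanted):
    """Row layout: every row is read in full, even if few columns are needed."""
    values_read = 0
    out = []
    for row in rows:
        values_read += len(row)  # the whole row is read
        out.append(tuple(row[c] for c in wanted))
    return out, values_read


def scan_column_store(columns, wanted):
    """Column layout: only the requested columns are read."""
    values_read = sum(len(columns[c]) for c in wanted)
    out = list(zip(*(columns[c] for c in wanted)))
    return out, values_read


NUM_COLS, NUM_ROWS = 50, 1000  # hypothetical table shape

# The same data, laid out row by row and column by column.
rows = [[r * NUM_COLS + c for c in range(NUM_COLS)] for r in range(NUM_ROWS)]
columns = [[row[c] for row in rows] for c in range(NUM_COLS)]

wanted = [3, 17]  # the query only needs two of the fifty columns
row_result, row_cost = scan_row_store(rows, wanted)
col_result, col_cost = scan_column_store(columns, wanted)

print(row_result == col_result, row_cost, col_cost)
```

Both scans return the same result, but in this model the row layout touches 50,000 values and the column layout touches 2,000. Real savings depend on the storage engine, the data types, and the compression in use.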

Next three months: We will improve cstore_fdw's statistics collection and cost estimates to provide consistent and fast query performance. We will also have ALTER TABLE ... ADD/DROP COLUMN support in our next release. Finally, we're investigating the best way to provide backups for columnar tables.

Next year: This is an area where we're looking for input from cstore_fdw users. One big feature under consideration is a vectorized executor for cstore_fdw. We already have a prototype for this idea, and we saw performance improvements of 3-4x for in-memory SELECT queries. Still, we know that writing a new PostgreSQL executor is a notable commitment, and we’re looking to hear more from our users here.

Try cstore_fdw 1.2 for yourself by downloading it here!

In summary, we’re focusing on usability, seamless integration, and full metadata distribution for the next year. We envision a world where you can run your local or distributed SQL just the same, and simply get all of scaling’s benefits in the latter case!


Webinar on Sharding and Scaling PostgreSQL on April 29th

I am hosting a webinar on April 29th at 10:00 am Pacific time to discuss the core principles of PostgreSQL database scaling. The webinar will enable attendees to avoid common mistakes that limit future system capabilities. You can register now by visiting the Citus Data webinars page.

During the webinar, I will discuss how to simplify PostgreSQL scaling by using pg_shard, an open-source extension for managing the placement, distribution, and replication of a sharded database. pg_shard is a standalone PostgreSQL extension developed by Citus Data which addresses many NoSQL use cases while also enabling targeted real-time analytical queries. For users who need complex full-cluster queries and JOINs, pg_shard provides an easy upgrade path to CitusDB. The pg_shard extension maintains full compatibility with the standard PostgreSQL tool set, types, indexes, and semi-structured data types.

By the end of the webinar, attendees will understand the core tenets of database scaling as well as which scaling problems are appropriate to address with sharding. In addition, I will cover the features and functionality of pg_shard by presenting use cases where it is being successfully used in production environments today.

I hope you will join me on April 29th. Visit the Citus Data webinars page now to register.

