Announcing Citus 6.2: a multi-tenant database for a civilized age

Written by Claire Giordano
June 7, 2017

Star Wars Episode IV Lightsaber image
Image courtesy of Lucasfilm and StarWars.com. © 1977 Twentieth Century Fox.

"Your father's lightsaber. This is the weapon of a Jedi Knight. Not as clumsy or random as a blaster. An elegant weapon, for a more civilized age."

—Obi-Wan Kenobi, Star Wars Episode IV: A New Hope

Announcing the release of Citus 6.2

Today I’m happy to announce that we’ve rolled out a new version of our database, Citus 6.2. Because as most of you know, good software never stops evolving. Nor should it. If you want the scoop on the new capabilities in Citus 6.2, just scroll ahead. But before diving in, I need to explain the lightsaber pic. Why? Because usually a picture speaks a thousand words, but sometimes it needs an annotation. :-)

When my colleagues first started on their journey to build Citus, they had a vision of combining the best aspects of relational databases with the elastic scale of NoSQL—to give developers a database that delivers SQL capabilities, at scale.

But vision alone does not make a successful company. The Citus co-founders needed a mix of key ingredients: the right team, good timing, good execution, a willingness to experiment and learn, plus (of course) a good idea.

When George Lucas describes his days before the first Star Wars film, he said he was “searching for just the right ingredients, characters and storyline.” In Lucas’s search for the right mix, he too had to iterate: he wrote four different screenplays before landing on the final version of the original film!

Because our CTO is such a big fan of Star Wars, Ozgun sometimes talks about his vision for Citus in the language of the Jedi: Ozgun has said his aim for Citus was “to create a database as elegant and as powerful as a lightsaber.” Now, I’m more of a Stranger Things fan myself (after all, mornings are for coffee and contemplation) but I get Ozgun’s desire to create a database that gives you the benefits of SQL—at scale.

What is Citus?

Citus is an open source database (built on Postgres ofc) that gives you powerful scale-out capabilities while preserving the richness of complex SQL transactions. The Citus team has delivered on Ozgun’s vision: Citus now gives you the benefits of Postgres plus the scale-out properties of NoSQL—without having to re-architect your application.

Citus is well-suited for B2B developers who are building multi-tenant apps as well as those building customer-facing analytics. And because Citus is an extension to Postgres (and not a fork), you benefit from all the continuing innovation in the Postgres community, including some of the latest features like JSONB.

Steve O’Grady of RedMonk is fond of saying that convenience will trump just about everything, and we agree. So we give you choice and flexibility: our Citus database is available as an open source download, on-prem software, and as a fully-managed database as a service. You pick.

In the year since our team open sourced Citus and launched our database as a service, we've been laser-focused on improving the experience for our primary B2B use cases: multi-tenant databases and real-time analytics.

Highlights of New Citus 6.2 Features

Expanding SQL coverage: Complex queries for multi-tenant databases & real-time analytics

In Citus 6.2, we expanded our SQL coverage for complex analytical queries to benefit both our multi-tenant B2B customers and our real-time analytics customers. What sets these complex analytical queries apart from others is their need to support interactive responses.

For example, real-time analytics customers need support for interactive queries such as funnels, segmentation, and creating user cohorts.

With Citus 6.2, and the new support for these interactive, analytical queries, Citus fully parallelizes even more of your SQL queries. In particular, when you model your data with table co-location and you write subqueries which join on that distribution key, you get full SQL coverage.

Reduce query planning time by over 10x

When you send a query to Citus, we used to rely on PostgreSQL’s partition pruning logic to determine the right shards and then we would only send queries to the relevant shards. This was ok for most of our customers, but it was problematic for our customers with unusually high numbers of shards, i.e. with hundreds of thousands of shards. In 6.2, we created a new capability in Citus to significantly improve partition pruning performance and reduce query planning times by more than 10x.

SERIAL on distribution column: we now use serial types exactly as in Postgres

Many Rails and Django apps use PostgreSQL’s serial type (an auto-incrementing integer.) With 6.2, you can use serial types exactly as you would in Postgres, and Citus will take care of the distribution logic for your application.

When you shard a multi-tenant app, part of the distribution key is often a serial type. When inserting a new record, the next value in the sequence needs to be generated so it can be properly hashed before being inserted. Previously you had to manually call nextval to get the next value, but now Citus automatically understands what you're trying to do and lets your serial data type work as if you were on a single-node Postgres database.

Schema changes on distributed tables: Less work to integrate your app

With Citus 6.2, you have even more support for the things you expect to do natively. One example: concurrent index creation now allows you to create indexes for your database at the same time you are taking reads & writes. Hence you reap the performance benefits of new indexes, without any corresponding downtime.

And with 6.2, we natively support the following commands on distributed tables, making integration with your app even simpler. We now propagate modifications of columns, constraints, and indexes from the coordinator node through the whole Citus cluster. You don't have to run the statements on all the nodes but rather you can let Citus update a distributed table’s schema in a transactional manner.

  • ALTER TABLE .. RENAME COLUMN
  • DROP TABLE within a transaction block
  • CREATE INDEX CONCURRENTLY to create indexes without blocking writes
  • ALTER TABLE .. ENABLE/DISABLE TRIGGER

Improved stats for performance of queries with pg_stat_statements

The pg_stat_statements feature is key tool in Postgres that gives you visibility into your database and enables you to optimize the performance of your queries. In the Citus 6.2 release, we’ve made it work out of the box with Citus. With pg_stat_statements, you can now see the total time a query has occupied against your system, the average time it takes to run, and the query itself. In combination with distributed EXPLAIN, pg_stat_statements is even more powerful, too.

Logging of cross-shard queries

Tip: you should always avoid querying multiple nodes when a single node will do.

In the past, some of our customers would accidentally trigger a query across multiple nodes, what we call a “cross-shard query.” If you’re deliberately running cross-shard queries (as you might do for cross-tenant analytics) then obviously it’s a fine thing to do. But for small, frequent lookups, the latency of accidentally running unnecessary cross-shard queries can add up.

So, new to Citus 6.2 is the ability to log cross-shard queries by setting citus.multi_task_query_log_level. This is super useful to detect queries that are not filtered by distribution keys, particularly when investigating changes in performance and also when migrating your app to Citus.

More improvements to Citus docs—plus new tutorials

Whether you’re using our open source download, or the enterprise software version of Citus, or our database as a service—docs matter.

So we’re constantly taking steps to make the Citus docs better. We updated the Citus 6.2 documentation to reflect new features, and updated our FAQ as well.

Leading up to Citus 6.2, we also rolled out two new Citus tutorials: The tutorial for multi-tenant applications builds a database for an ad-analytics app which you can use to view, change, analyze, and manage ads & campaigns. And the tutorial for real-time analytics inserts large volumes of events data and enables analytical queries with sub-second latencies.

Full list of Citus 6.2 changes online on github

This post covers the highlights of Citus 6.2. You can find the complete list of Citus 6.2 changes on github.

Citus 6.2 gives multi-tenant apps the power of SQL, at scale. Elegant, powerful, civilized.

Today, if you have a B2B application that’s growing, and you’re using a relational database to capture data from multiple tenants, you have a few different options to manage growth once you start hitting the ceiling on single-node Postgres.

One option is to migrate to a NoSQL database: doing this would require you to denormalize your data, move the denormalized data to NoSQL, rewrite your entire app to talk to this new NoSQL data model, and potentially give up some of the data guarantees you had before. This application migration is time consuming, and a bit painful.

A simpler approach is to adopt Citus to scale out your multi-tenant database. With Citus, you can scale to 100K+ tenants while keeping powerful SQL features such as transactions, joins, and foreign key constraints. And the real time-savings: with Citus, you don’t need to re-architect your SaaS application to scale-out, rather, all you need to do is distribute your tables on the tenant_id / customer_id dimension.

If you choose Citus for your multi-tenant app, you’ll get the best aspects of relational databases with the ability to scale out. And not just any relational database, but Postgres. In the words of Obi-Wan, you could say that with Citus, you’ll have “an elegant database for a more civilized age.” :)

Claire Giordano

Written by Claire Giordano

Head of open source community efforts for Postgres at Microsoft. Alum of Citus Data, Amazon, Sun Microsystems, and Brown University CS. Conference speaker at PGConfEU, FOSDEM, PGConf NYC, Nordic PGDay, pgDay Paris, PGDay Chicago, Swiss PGDay, & Citus Con. Talk Selection Team chair for POSETTE: An Event for Postgres. Loves sailing in Greece.

@clairegiordano @clairegiordano@hachyderm.io clairegiordano