Talking about Citus & Postgres at any scale

Written by Marco Slot
September 17, 2020

This post by the technical lead of our Citus open source team was originally published on the Azure Database for PostgreSQL Blog.

Update in October 2022: Citus has a new home on Azure! The Citus database is now available as a managed service in the cloud as Azure Cosmos DB for PostgreSQL. Azure documentation links have been updated throughout the post, to point to the new Azure docs.

I recently gave a talk about the Citus extension to Postgres at the Warsaw PostgreSQL Users Group. Unfortunately, I did not get to go in person to beautiful Warsaw, but it was still a nice way to interact with the global Postgres community and talk about what Citus is, how it works, and what it can do for you.

If you are already familiar with Postgres, then this talk should be a good introduction to all the powerful capabilities that Citus gives you. The tl;dr is this: Citus is an open source extension to Postgres that transforms Postgres into a distributed database. Citus uses sharding and replication to distribute your data and your Postgres queries across a distributed database cluster.
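To make the sharding idea a bit more concrete, here is a toy sketch in Python. It is not how Citus is implemented: the shard count, the CRC32 hash, the sample rows, and the function names `shard_for`, `distribute`, and `parallel_count` are all assumptions chosen for illustration. It only shows the general pattern of hashing a distribution column to pick a shard, then answering a query by combining per-shard partial results.

```python
from collections import defaultdict
from zlib import crc32

SHARD_COUNT = 4  # hypothetical value, chosen only for this illustration


def shard_for(key: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a distribution-column value to a shard using a stable hash."""
    return crc32(key.encode("utf-8")) % shard_count


def distribute(rows, column, shard_count=SHARD_COUNT):
    """Place each row in the shard chosen by hashing its distribution column."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_for(row[column], shard_count)].append(row)
    return shards


def parallel_count(shards, predicate):
    """Scatter-gather: count matches within each shard, then sum the partials."""
    partial_counts = [
        sum(1 for row in rows if predicate(row)) for rows in shards.values()
    ]
    return sum(partial_counts)


# Made-up sample rows, loosely shaped like GitHub event records.
events = [
    {"repo": "citusdata/citus", "type": "PushEvent"},
    {"repo": "postgres/postgres", "type": "PushEvent"},
    {"repo": "citusdata/citus", "type": "IssuesEvent"},
    {"repo": "torvalds/linux", "type": "PushEvent"},
]

shards = distribute(events, "repo")
push_events = parallel_count(shards, lambda row: row["type"] == "PushEvent")
print(push_events)  # 3
```

In this sketch, rows with the same `repo` value always land in the same shard, and each shard's partial count could in principle be computed independently before the results are summed.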

Shining a light on the performance speedups of Citus (via demo)

Every so often, I try to rethink how I talk about Citus, especially as Postgres evolves and the needs of applications change, too. One thing we have not done much is talk directly about the performance improvements in Citus. Sometimes Citus is actually slower than a single Postgres server, but at scale Citus can be *a lot* faster. So in this talk, I introduced every Citus feature with some benchmarks that compare its performance to a (large) Postgres server.

The talk is also worth watching for the demo (the demo starts at 46:52) where I compare the performance of Hyperscale (Citus) on Azure Database for PostgreSQL against a single Postgres server. For the demo, I use GitHub archive data in an analytics use case, and the demo shows >250x speedups for analytical queries with Citus!

Video of my talk at Warsaw PostgreSQL Users Group, on Citus: PostgreSQL at any Scale. Demo starts at 46:52, but the introductory discussion should be useful, too.

Props to the organizers of the Warsaw PostgreSQL Users Group, especially Alicja Kucharczyk, for the time they spend organizing Postgres talks for their community and for inviting me to give a talk to their Postgres users group. I really appreciated all the good questions, too.

If this demo is your first intro to Citus & you want to learn more

Here are a few of the getting-started next steps I usually recommend to developers:

  • Download Citus packages locally: Citus is open source, so it's easy to download and try out.
  • Try Citus on Azure: Since Microsoft acquired Citus Data in 2019, we have also integrated Citus into our managed Postgres service on Azure. Citus is now available as Hyperscale (Citus), a built-in deployment option in Azure Database for PostgreSQL.
  • Read the Citus open source docs: docs.citusdata.com has tutorials for multi-tenant SaaS applications and real-time analytics dashboards, a use case guide for time series data, details on pretty much every Citus feature, and installation instructions for setting Citus up locally on a single server as well as on multiple servers. In short, the Citus docs are quite useful.

Written by Marco Slot

Former lead engineer for the Citus database engine at Microsoft. Speaker at Postgres Conf EU, PostgresOpen, pgDay Paris, Hello World, SIGMOD, & lots of meetups. Talk selection team member for Citus Con: An Event for Postgres. PhD in distributed systems. Loves mountain hiking.

@marcoslot marcocitus