Citus Talk at CMU: Distributed PostgreSQL as an Extension

Written by Marco Slot
April 10, 2021

Last month we released Citus 10 and we've received an overwhelming amount of positive feedback on the new columnar compression and single node Citus features, as well as the news that we’ve open sourced the shard rebalancer.

The new and exciting Citus 10 features are bringing in lots of new users of Citus open source and the Citus database service on Azure. And many of you are asking:

“How exactly does Citus scale out Postgres?”

How does Citus work?

As it happens, I recently gave a talk as part of the Vaccination Database Tech Talks series at Carnegie Mellon University, organized by Andy Pavlo. If you missed my talk—titled “Citus: Distributed PostgreSQL as an Extension”—the good news is that the CMU team recorded it. The vaccination database tech talk on Citus is now available for you to watch online.

My goal in this talk was to explain how Citus turns PostgreSQL into a distributed database—without changing any Postgres code, by leveraging the Postgres extension APIs.

The talk may contain traces of C code because my primary audience was database students, but I used diagrams in my slides to keep the distributed database concepts accessible for everyone. My talk also covers the workloads and the broader context for which the Citus extension to Postgres was developed.

Key moments in my Vaccination Database Tech Talk at CMU

By watching this video about Citus and Postgres, you can learn about:

  • When to use Citus to distribute Postgres: Not every Postgres workload necessarily benefits from the ability to scale. And not every workload that needs to scale out has any business with Postgres. In this bit, I cover 4 of the common workloads where it makes sense to use Citus. [watch at 6:54]
  • Using the PostgreSQL extension APIs: What it means to be a PostgreSQL extension, and how you can use the extension APIs to create brand new capabilities without changing any Postgres code. [watch at 12:24]
  • Transparent sharding: How Citus enables you to create distributed tables and use the hooks that PostgreSQL provides to implement transparent sharding. [watch at 17:13]
  • Distributed Query Engine: How Citus plans and executes distributed queries, and the layers in the planner for handling different classes of queries with minimal overhead. [watch at 22:24]
  • Distributed Transactions: How Citus performs distributed transactions, including two-phase commit recovery and distributed deadlock detection. [watch at 30:21]
  • Demo: Quick demo on my laptop with a 3-node Citus database cluster to see how to create and interact with distributed tables. [watch at 41:31]
  • Lessons learned: What our engineering team at Citus has learned over the years about building a distributed database system, helping customers to onboard, ORMs—and Postgres. [watch at 45:03]

Big thanks to Andy Pavlo for organizing this series of Vaccination Database Tech Talks and inviting me to talk about Citus!

Video of my talk on distributing PostgreSQL with Citus

I hope you enjoy this deep dive into how the Citus extension works its magic. If you have questions after watching, the Citus engineers and I are always hanging out on the Citus Slack. Or, if you’re ready to start playing with Citus, check out the Citus repo on GitHub or take advantage of these tools for getting started with Citus.

Video: CMU Database Group video recording of my talk about distributing PostgreSQL with the Citus extension

And there's more

If databases are your thing, there are many good tech talks in the Vaccination Database Tech Talks series as well as the Quarantine Database Tech Talks that Andy Pavlo organized at CMU. Shout out to Nico Bruno and Cesar Galindo-Legaria for their talk on the Cascades Framework for query optimization. As database developers, we all care about query optimization, and the SQL Server optimizer is certainly an advanced one.

Also I recommend Manuel Rigger’s SQLancer talk on Finding Logic Bugs in Database Management Systems. Manuel’s talk inspired the summer intern project that Nazli Ugur Koyluoglu did for Citus last year, mining for logic bugs in the Citus extension. And we continue to use SQLancer to discover SQL planner bugs we would not have been able to find ourselves.

Later this semester, on June 7, Robert Haas will be giving a talk on PostgreSQL! I’m not quite sure which aspect of Postgres Robert will cover, but it’s bound to be interesting because, well, it’s Postgres. Plus, Robert is one of the world’s leading experts on Postgres.

Slides for this talk on Speakerdeck

Slides from my CMU database talk on Citus: Distributed PostgreSQL as an Extension.

Written by Marco Slot

Former lead engineer for the Citus database engine at Microsoft. Speaker at Postgres Conf EU, PostgresOpen, pgDay Paris, Hello World, SIGMOD, & lots of meetups. Talk selection team member for Citus Con: An Event for Postgres. PhD in distributed systems. Loves mountain hiking.

@marcoslot marcocitus