How the Citus distributed query executor adapts to your Postgres workload

Written by Marco Slot
April 27, 2020

In one of our recent releases of the open source Citus extension, we overhauled the way Citus executes distributed SQL queries—with the net effect being some huge improvements in terms of performance, user experience, Postgres compatibility, and resource management. The Citus executor is now able to dynamically adapt to the type of distributed SQL query, ensuring fast response times both for quick index lookups and big analytical queries.

We call this new Citus feature the “adaptive executor” and we thought it would be useful to walk through what the Citus adaptive executor means for Postgres and how it works.

Why we needed to improve upon the original Citus query executors

In a distributed database like Citus, there are two ways of handling a Postgres query. Either:

  • the query contains enough information to determine in which shard (i.e. in which Postgres table on the worker) the data is stored—and to route the query to the right node with minimal overhead, or
  • the query is parallelized across all shards to handle large data volumes

Citus supports both models for handling SQL queries in Postgres, which has allowed Citus to scale both multi-tenant SaaS applications as well as real-time analytics applications, including time series dashboards.

To handle the requirements of these different types of PostgreSQL workloads, Citus traditionally had multiple executor code paths. Previously:

  • the Citus router executor was optimized for routing queries directly to the right node, with minimal overhead, and
  • the Citus real-time executor was optimized for large parallel computations by opening a connection per shard, to query many shards using all available cores

Because the original Citus real-time executor was not designed with distributed transactions in mind, it did not have the ability to reuse connections for multiple shards on the same worker.

On the other hand, because the original Citus router executor was designed for minimal overhead, it explicitly reused connections for all shards.

Using different executors in the same transaction block used to complicate migrations from PostgreSQL to Citus. Moreover, there is a large class of queries that do span across shards, but do not benefit from multi-core parallelism yet still paid the overhead of establishing many connections per Citus worker node.

In one of our recent Citus open source releases, we delivered a single executor that can handle different workloads and solves all the shortcomings of the real-time and router executor in a single unified code path: The Citus adaptive executor.

Introducing the Citus Adaptive Executor

The Citus adaptive executor uses a dynamic pool of connections to each worker node to execute Postgres queries on the shards (SQL tasks).

What this means is: the Citus adaptive executor can execute multiple tasks over a single connection per Citus worker node, to minimize overhead or to parallelize queries across connections per worker node to use multiple cores—giving you the ability to use hundreds of cores and combine the memory of many servers.

The Citus executor parallelizes not only SELECT queries, but also DML (e.g. UPDATE), DDL (e.g. CREATE INDEX) and other utility statements (e.g. VACUUM) across multiple worker nodes and multiple cores. Moreover, these parallelized statements can be part of bigger transaction blocks or stored procedures that are executed as one distributed Postgres transaction.

To ensure queries can always see the result of preceding statements in a transaction block, the adaptive executor first checks whether SQL tasks need to be assigned to a particular connection. If there were preceding writes on the shard(s) that the task accesses, then the executor assigns the SQL task to the connection that did the write—and otherwise the executor assigns the task to the pool to be executed as soon as a connection becomes available.

By dynamically adjusting the pool size to the SQL query, the Citus adaptive executor can handle:

  • queries that are routed to a single worker node,
  • queries that are parallelized across all worker node over a single connection (=core) per node, as well as
  • queries that are parallelized across many cores per worker.

The adaptive executor also enables users to limit the number of connections per worker that a query can open, to handle distributed PostgreSQL queries with high concurrency.

The not-so-secret sauce behind the Adaptive Executor

Part of what makes the executor adaptive is a technique we call “slow start”, which is inspired by the famous TCP Slow Start algorithm that allows TCP to adapt to the available bandwidth and avoid congestion.

Every 10 milliseconds, the number of new connections that the adaptive executor can open to a worker node will increase by 1, if there are tasks remaining for that worker node. That way, if a distributed query does a quick ~1ms index lookup on each shard, the executor will never open more than a single connection per node. On the other hand, if we're dealing with an analytical query that takes several seconds, the executor will quickly parallelize it across all available cores by running tasks in parallel over multiple connections per node.

So the Citus adaptive executor does not need to guess how long a SQL query will take, but rather will adapt to each query’s runtime.

In the upcoming Citus 9.3 open source release, the adaptive executor will also factor in the total number of connections it makes to each worker node. When it approaches the connection limit on a worker (max_connections), the executor will avoid opening additional connections for parallelism, such that the total number of connections to each worker never exceeds the total number of connections that the client is making to the coordinator. That ensures you can run many parallel queries concurrently without any issues.

Finally, while this might not seem important to most users, as a developer I really like the Citus adaptive executor’s well-documented code and especially its elegant state machines. The extensive code comments and state machines help ensure we keep one of the core parts of Citus working smoothly. We were also able to eliminate many inefficiencies, so we’ve seen performance improvements across the board as well as better error messages.

The Citus adaptive executor gives you faster queries, higher concurrency, stronger SQL support, & more

All of us on our Citus open source engine team at Microsoft—spread between Amsterdam, Vancouver, and Istanbul—were excited to launch the Citus adaptive executor. Not just because this new feature represents exciting distributed systems work, but because of the benefits the adaptive executor brings to Citus open source users—as well as to Azure Database for PostgreSQL users who are using the Hyperscale (Citus) to scale out Postgres horizontally on Azure.

The list of Citus adaptive executor benefits includes:

  • Improved performance for distributed (multi-shard) SQL queries
  • Improved support for transactions in Postgres that mix single-shard and multi-shard queries (incl. SELECT, DML, DDL, COPY)
  • Improved error messages for multi-shard Postgres queries
  • Up to 20X faster response times for simple multi-shard queries (e.g. Postgres index lookups on non-distribution column)
  • Better resource management by configuring the maximum number of connections that Citus can make to a worker node, and the maximum number of connections Citus can keep across transactions
  • Better fault tolerance when running out of Postgres database connections on the worker node

One of the biggest benefits for Citus open source users is that the adaptive executor will simplify migrations from single-node PostgreSQL to a Citus distributed database cluster. That’s because almost all multi-statement transactions will now just work, and there is no risk of Citus operations getting a lot slower when queries do not have a distribution column filter.

Another benefit is that the Citus adaptive executor opens the door to enabling new use cases, for which the overhead of opening a database connection per shard would have been prohibitive before.

For instance, with the adaptive executor, search applications can now take advantage of parallelism when doing linear searching across all database records—while also being able to run fast index lookups with minimal overhead and high concurrency.

Our teammate begriffs has created some useful getting-started tutorials for Citus

If you want to learn more about how the Citus executors run queries, our Citus Docs has a good SQL reference section on query processing.

And if you want to roll up your sleeves and explore whether scaling out Postgres with Citus is a good fit for you and your application, the 2 best places to start are:

  • Download Citus packages and dive into one of the tutorials that our awesome documentation expert Joe Nelson aka begriffs created, such as the multi-tenant tutorial, the real-time analytics tutorial in our Citus open source documentation.

  • Try out Citus on Microsoft Azure, by following the quickstart docs to provision a Hyperscale (Citus) server group (often called a cluster) on Azure Database for PostgreSQL; there are also useful Postgres tutorials on docs.microsoft.com, including this one on designing a multi-tenant database and another tutorial on designing a real-time analytics dashboard with Hyperscale (Citus).

We can’t wait to see what you build with Citus

Once you get started, remember to join our Citus slack as there is lots of good community Q&A on the Citus Slack every single day. And just in case you don’t already know about it, a quick plug for the Citus newsletter: we curate our Citus technical newsletter each month with useful Postgres and Citus developer content, mostly links to blog posts or videos—with the goal of being useful, and never boring.

Marco Slot

Written by Marco Slot

Former lead engineer for the Citus database engine at Microsoft. Speaker at Postgres Conf EU, PostgresOpen, pgDay Paris, Hello World, SIGMOD, & lots of meetups. Talk selection team member for Citus Con: An Event for Postgres. PhD in distributed systems. Loves mountain hiking.

@marcoslot marcocitus