Announcing Citus MX: Scaling Postgres to over 500k writes per second

Update in July 2022: As of the Citus 11 open source release, a newer and much-improved version of the MX functionality is now available as part of the Citus "query from any node" capabilities. Hence, rather than reading this MX post, we recommend you read this section about querying distributed Postgres tables from any node in the newer Citus 11 blog post. If you have questions about querying from any node (or about other aspects of Citus), please join the Citus public slack to participate in community Q&A. And if you want to roll up your sleeves to give Citus a try, our getting started resources should be useful—or you can jump straight to downloading the Citus packages.

Today we’re excited to announce the private beta of Citus MX. Citus MX builds on the Citus extension for PostgreSQL, which allows you to scale out PostgreSQL tables across many servers. Citus MX gives you the ability to write to or query distributed tables from any node, which allows you to horizontally scale out your write-throughput using PostgreSQL. It also removes the need to interact with a primary node in a Citus cluster.

We’ve performed over 500k durable writes per second (using YCSB) on a 32 node Citus Cloud cluster with our regular PostgreSQL settings. We've also exceeded ingest rates of 7 million records per second using batch COPY. Watch the video to see it in action. If you're curious to learn more, read on or to get access, sign up below.

Scaling your writes

With current editions of Citus it's already simple to scale out PostgreSQL tables. Once your data is distributed across many nodes, you have more memory and parallel processing power available to help scale your read workloads. As you query through the leader, Citus routes your query to the appropriate node, pushing down the heavy lifting of lookups and aggregations to data nodes. Whether you’re performing real-time analytics across terabytes of data or looking to scale out your transactions this is straightforward today.

Now those that need even higher write volumes, can achieve it through MX. Under the covers Citus MX keeps all the metadata about which data lives where in sync across the cluster. We expose to you a single URL which pools all of your nodes together so each connection goes to a random node. This means that you can have a higher number of connections and each connection you make is helping you scale your writes.

Improved availability

One of the favorite features of Citus according to our customers is improving uptime through redundancy. To date though you still had to manage a failover setup for your leader node, and in the event that it failed, all writes and reads would fail until it was brought back online. While high availability reduces the length of downtime, should a node fail your entire cluster would still be unavailable for this minute while the failover was happening. With Citus MX as long as a node is available, the data on it can be read or written even if other nodes are down.

How it works

In Citus MX you can access your database in one of two ways: Either through the coordinator which allows you to create or make changes to distributed tables, or the data URL which routes you to one of the data nodes on which you can perform regular queries on the distributed tables. These are also the nodes that hold the shards, the regular PostgreSQL tables in which the actual data is stored. When you perform a query on a distributed table, the command is either forwarded to the right shard based on filter conditions, or parallelised across all the shards.

Streaming replication

One key part that enables Citus MX is our usage of Postgres streaming replication. Citus has a built-in replication mechanism in which the leader node sends writes to all replicas. However, this requires that the leader is involved in every write to ensure linearizability. Fortunately, streaming replication provides us with an alternative that we use in Citus MX: Let PostgreSQL handle the replication, removing the need for a single leader. We also make sure that MX automatically sets up streaming replicas and handles node fail-overs for you.

By having hot standbys for all nodes and putting the distributed tables on each of them, the impact of a single node failure is small and can be quickly recovered. Moreover, streaming replication is more efficient at handling certain types of writes like updates. Last, your application talks to a single endpoint and doesn’t require logic to handle load balancing and node fail-over.

Metadata makes it work

The major change we made in Citus MX is that the distributed tables and the metadata about where the data is located are propagated to all nodes. Since all the nodes have the Citus extension, they were already able to run distributed queries, but did not have the necessary metadata.

We’ve taken special care to ensure that metadata-changing commands like ALTER TABLE and moving shards to new nodes can be safely executed at any time. By using PostgreSQL’s built-in two-phase commit (2PC) mechanism for metadata changes, we avoid the risk of nodes ever having outdated metadata since it guarantees that all nodes apply the same changes. We’ve even covered very exceptional failure scenarios by adding a recovery mechanism for 2PCs that fail to complete at the last moment. These techniques ensure that the database remains reliable and consistent even when making changes to the cluster or when experiencing failures.

Written by Marco Slot

Former lead engineer for the Citus database engine at Microsoft. Speaker at Postgres Conf EU, PostgresOpen, pgDay Paris, Hello World, SIGMOD, & lots of meetups. Talk selection team member for Citus Con: An Event for Postgres. PhD in distributed systems. Loves mountain hiking.

Scaling your writes

Improved availability

How it works

Streaming replication

Metadata makes it work

Enjoy what you’re reading?

SHARE THIS POST

Related Content