Announcing Citus MX: Scaling Postgres to over 500k writes per second

Written by Marco Slot
September 22, 2016

Update in 2020: Citus MX is not currently available for production use for new Citus Enterprise customers, nor for Hyperscale (Citus) users on the managed Azure Database for PostgreSQL service. For questions about MX, please reach out to our Citus open source leadership team via the Contact Us page; you can also join the Citus public slack to participate in community Q&A; and if you want to roll up your sleeves and give Citus a try, it’s easy to download Citus open source packages.

Today we’re excited to announce the private beta of Citus MX. Citus MX builds on the Citus extension for PostgreSQL, which allows you to scale out PostgreSQL tables across many servers. Citus MX gives you the ability to write to or query distributed tables from any node, which allows you to horizontally scale out your write-throughput using PostgreSQL. It also removes the need to interact with a primary node in a Citus cluster.

We’ve performed over 500k durable writes per second (using YCSB) on a 32 node Citus Cloud cluster with our regular PostgreSQL settings. We’ve also exceeded ingest rates of 7 million records per second using batch COPY. Watch the video to see it in action. If you’re curious to learn more, read on or to get access, sign up below.

YouTube video still: Citus MX

Scaling your writes

With current editions of Citus it’s already simple to scale out PostgreSQL tables. Once your data is distributed across many nodes, you have more memory and parallel processing power available to help scale your read workloads. As you query through the leader, Citus routes your query to the appropriate node, pushing down the heavy lifting of lookups and aggregations to data nodes. Whether you’re performing real-time analytics across terabytes of data or looking to scale out your transactions this is straight-forward today.

Now those that need even higher write volumes, can achieve it through MX. Under the covers Citus MX keeps all the metadata about which data lives where in sync across the cluster. We expose to you a single URL which pools all of your nodes together so each connection goes to a random node. This means that you can have a higher number of connections and each connection you make is helping you scale your writes.

Improved availability

One of the favorite features of Citus according to our customers is improving uptime through redundancy. To date though you still had to manage a failover setup for your leader node, and in the event that it failed, all writes and reads would fail until it was brought back online. While high availability reduces the length of downtime, should a node fail your entire cluster would still be unavailable for this minute while the failover was happening. With Citus MX as long as a node is available, the data on it can be read or written even if other nodes are down.

How it works

In Citus MX you can access your database in one of two ways: Either through the coordinator which allows you to create or make changes to distributed tables, or the data URL which routes you to one of the data nodes on which you can perform regular queries on the distributed tables. These are also the nodes that hold the shards, the regular PostgreSQL tables in which the actual data is stored. When you perform a query on a distributed table, the command is either forwarded to the right shard based on filter conditions, or parallelised across all the shards.

Streaming replication

One key part that enables Citus MX is our usage of Postgres streaming replication. Citus has a built-in replication mechanism in which the leader node sends writes to all replicas. However, this requires that the leader is involved in every write to ensure linearizability. Fortunately, streaming replication provides us with an alternative that we use in Citus MX: Let PostgreSQL handle the replication, removing the need for a single leader. We also make sure that MX automatically sets up streaming replicas and handles node fail-overs for you.

By having hot standbys for all nodes and putting the distributed tables on each of them, the impact of a single node failure is small and can be quickly recovered. Moreover, streaming replication is more efficient at handling certain types of writes like updates. Last, your application talks to a single endpoint and doesn’t require logic to handle load balancing and node fail-over.

Metadata makes it work

The major change we made in Citus MX is that the distributed tables and the metadata about where the data is located are propagated to all nodes. Since all the nodes have the Citus extension, they were already able to run distributed queries, but did not have the necessary metadata.

We’ve taken special care to ensure that metadata-changing commands like ALTER TABLE and moving shards to new nodes can be safely executed at any time. By using PostgreSQL’s built-in two-phase commit (2PC) mechanism for metadata changes, we avoid the risk of nodes ever having outdated metadata since it guarantees that all nodes apply the same changes. We’ve even covered very exceptional failure scenarios by adding a recovery mechanism for 2PCs that fail to complete at the last moment. These techniques ensure that the database remains reliable and consistent even when making changes to the cluster or when experiencing failures.

Marco Slot

Written by Marco Slot

Lead engineer on the Citus engine team at Microsoft. Speaker at Postgres Conf EU, PostgresOpen, pgDay Paris, Hello World, SIGMOD, & lots of meetups. PhD in distributed systems. Loves mountain hiking.

@marcoslot marcocitus