pg_paxos, High Availability Data Replication for PostgreSQL

Marco Slot Nov 19, 2015


The Paxos algorithm is a powerful building block for highly available distributed systems. Paxos can be seen as a function paxos(k,v) that, for a given key (k), returns the same value on all servers in a group, where that value is one of the proposed inputs (v). Paxos is most commonly used to implement log replication through a technique called Multi-Paxos. In Multi-Paxos, nodes call paxos using the indexes of a log as keys and state changes (e.g. 'set primary to node2') as values. Using this technique, data can be kept consistent and available across multiple servers, even if some of them fail.
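To make the paxos(k,v) abstraction concrete, here is a toy, in-process sketch of a single Paxos decision in Python. The `Acceptor` class and `propose` function are illustrative stand-ins, not the pg_paxos API: a proposal only succeeds with a majority of acceptors, and if some acceptor already accepted a value, later proposers must adopt it, which is what guarantees all servers return the same value for a key.

```python
# Toy single-decree Paxos sketch (illustrative only, not the pg_paxos API).
class Acceptor:
    def __init__(self):
        self.promised = -1      # highest proposal number promised so far
        self.accepted = None    # (proposal_number, value) or None

    def prepare(self, n):
        # Phase 1b: promise to ignore proposals numbered below n
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n, value):
        # Phase 2b: accept unless a higher proposal number was promised
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False


def propose(acceptors, n, value):
    """Run one proposal round; return the chosen value, or None on failure."""
    majority = len(acceptors) // 2 + 1

    # Phase 1: gather promises from a majority
    promises = [a.prepare(n) for a in acceptors]
    granted = [prior for ok, prior in promises if ok]
    if len(granted) < majority:
        return None

    # If any acceptor already accepted a value, we must propose that value
    already_accepted = [prior for prior in granted if prior is not None]
    if already_accepted:
        value = max(already_accepted)[1]

    # Phase 2: ask acceptors to accept the value
    accepts = sum(a.accept(n, value) for a in acceptors)
    return value if accepts >= majority else None
```

Calling `propose` again with a different value and a higher proposal number returns the originally chosen value, which is the safety property Multi-Paxos builds on.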

We recently started exploring Paxos as a technique for automated fail-over of PostgreSQL servers and wrote a simple implementation of Paxos and Multi-Paxos, including basic support for membership changes, in about 1,000 lines of PL/pgSQL. We found PL/pgSQL to be an excellent tool for implementing Paxos, because the necessary transactional semantics are a natural part of the language. Other consensus algorithms would have been harder to implement this way, because they rely on timers and other background tasks.
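The Multi-Paxos idea described above can be sketched in a few lines. In this toy model, `paxos_decide` stands in for a full consensus round on one log index (in the PL/pgSQL implementation the durable acceptor state would live in tables), and `replay` shows why every node that applies the agreed log entries in index order reaches the same state. All names here are illustrative assumptions, not pg_paxos functions.

```python
# Toy Multi-Paxos sketch: the replicated log maps an index (the Paxos key k)
# to an agreed state change (the value v). Illustrative only.

def paxos_decide(log, index, proposed):
    """Stand-in for a consensus round on one log index: first proposal wins,
    and competing proposals for the same index get the agreed value back."""
    if index not in log:
        log[index] = proposed
    return log[index]

def replay(log):
    """Apply agreed entries in index order; every replica ends identically."""
    state = {}
    for index in sorted(log):
        key, value = log[index]
        state[key] = value
    return state

shared_log = {}
paxos_decide(shared_log, 0, ('primary', 'node1'))
paxos_decide(shared_log, 1, ('primary', 'node2'))
# A competing proposal for index 1 receives the already-agreed value:
assert paxos_decide(shared_log, 1, ('primary', 'node3')) == ('primary', 'node2')
assert replay(shared_log) == {'primary': 'node2'}
```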

The pg_paxos extension demonstrates how the Paxos functions can be used for fault-tolerant, consistent table replication in PostgreSQL. While a replicated table has high write and read latencies due to the network round-trips involved, it can be very useful for applications such as automated fail-over. We plan to add optimisations such as those applied in Google Megastore to reduce the overhead, and to develop the extension further so it can replace components like etcd or ZooKeeper.
