Citus Blog

Articles tagged: replication

Citus 11.0 is here! Citus is a PostgreSQL extension that adds distributed database superpowers to PostgreSQL. With Citus, you can create tables that are transparently distributed or replicated across a cluster of PostgreSQL nodes. Citus 11.0 is a new major release, which means that it comes with some very exciting new features that enable new levels of scalability.
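
For instance, distributing or replicating a table takes a single function call. Here is a minimal sketch using the standard Citus APIs (the table and column names are illustrative):

    -- distribute a table across the cluster, sharded by tenant_id
    CREATE TABLE events (tenant_id bigint, event_id bigserial, payload jsonb);
    SELECT create_distributed_table('events', 'tenant_id');

    -- replicate a small lookup table to every node
    CREATE TABLE countries (code text PRIMARY KEY, name text);
    SELECT create_reference_table('countries');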

The biggest enhancement in Citus 11.0 is that you can now always run distributed queries from any node in the cluster because the schema & metadata are automatically synchronized. We already shared some of the details in the Citus 11.0 beta blog post, but we also have a big surprise for those of you who use Citus open source, one that was not part of the initial beta.

When we do a new Citus release, we usually release two versions: the open source version and the enterprise release, which includes a few extra features. However, there will be only one version of Citus 11.0, because everything in the Citus extension is now fully open source!

That means that you can now rebalance shards without blocking writes, manage roles across the cluster, isolate tenants to their own shards, and more. All this comes on top of the already massive enhancement in Citus 11.0: You can query your Citus cluster from any node, creating a truly distributed PostgreSQL experience.
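As a sketch of what that unlocks, both calls below are Citus functions that were previously enterprise-only (the table name and tenant value are illustrative, and isolating a tenant assumes no colocated tables need cascading):

    -- move shards to even out the cluster without blocking writes
    SELECT rebalance_table_shards(shard_transfer_mode := 'force_logical');

    -- give one large tenant its own shard
    SELECT isolate_tenant_to_new_shard('events', 42);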

Keep reading
Marco Slot

Test drive the Citus 11.0 beta for Postgres

Written by Marco Slot | March 26, 2022

Today we released Citus 11.0 beta, which is our first ever beta release of the Citus open source extension to Postgres. The reason we are releasing a beta version of 11.0 is that we are introducing a few fundamentally new capabilities, and we would like to get feedback from those of you who use Citus before we release Citus 11.0 to the world.

The biggest change in Citus 11.0 beta is that the schema and Citus metadata are now automatically synchronized throughout the database cluster. That means you can always query distributed tables from any node in a Citus cluster!

The easiest way to use Citus is to connect to the coordinator node and use it for both schema changes and distributed queries. For very demanding applications, you now have the option to load balance distributed queries across the worker nodes in (parts of) your application by using a different connection string and factoring in a few limitations.
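In practice that might look like the following (the hostname is hypothetical; once metadata is synced, any node works):

    -- connect to any worker instead of the coordinator, e.g.:
    --   psql postgres://app_user@worker-1.example.com:5432/mydb
    -- then run distributed queries as usual:
    SELECT tenant_id, count(*)
    FROM events
    GROUP BY tenant_id
    ORDER BY count(*) DESC
    LIMIT 10;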

Keep reading
David Rowley

Speeding up recovery & VACUUM in Postgres 14

Written by David Rowley | March 25, 2021

One of the performance projects I’ve focused on in PostgreSQL 14 is speeding up PostgreSQL recovery and vacuum. In the PostgreSQL team at Microsoft, I spend most of my time working with other members of the community on the PostgreSQL open source project. And in Postgres 14 (due to release in Q3 of 2021), I committed a change to optimize the compactify_tuples function, to reduce CPU utilization in the PostgreSQL recovery process. This performance optimization in PostgreSQL 14 made our crash recovery test case about 2.4x faster.

The compactify_tuples function is used internally in PostgreSQL:

  • when PostgreSQL starts up after a non-clean shutdown—called crash recovery
  • by the recovery process that is used by physical standby servers to replay changes (as described in the write-ahead log) as they arrive from the primary server
  • by VACUUM

So the good news is that the improvements to compactify_tuples will: improve crash recovery performance; reduce the load on the standby server, allowing it to replay the write-ahead log from the primary server more quickly; and improve VACUUM performance.
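The optimization matters most for heap pages full of dead tuples. A minimal way to exercise that code path, assuming an illustrative table name, is to churn a table and then vacuum it:

    -- churn a table so pages fill up with dead tuples
    CREATE TABLE churn (id int PRIMARY KEY, val int DEFAULT 0);
    INSERT INTO churn (id) SELECT i FROM generate_series(1, 1000000) i;
    UPDATE churn SET val = val + 1;

    -- VACUUM defragments each heap page, which is where
    -- compactify_tuples does its work
    VACUUM VERBOSE churn;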

Keep reading

As part of the Citus team (Citus scales out Postgres horizontally, but that’s not all we work on), I’ve been working on pg_auto_failover for quite some time now, and I’m excited that we have now released pg_auto_failover as open source, to give you automated failover and high availability!

When designing pg_auto_failover, our goal was this: to provide an easy-to-set-up Business Continuity solution for Postgres that implements fault tolerance of any one node in the system. The documentation chapter about the pg_auto_failover architecture includes the following:

It is important to understand that pg_auto_failover is optimized for Business Continuity. In the event of losing a single node, then pg_auto_failover is capable of continuing the PostgreSQL service, and prevents any data loss when doing so, thanks to PostgreSQL Synchronous Replication.

Introduction to pg_auto_failover

pg_auto_failover is meant to be an easy-to-set-up, reliable automated failover solution for Postgres. It includes software-driven decision making for when to fail over in production.
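Since pg_auto_failover builds on standard Postgres synchronous streaming replication (as the documentation quote above notes), you can verify the replication state it maintains with ordinary SQL on the primary; a hedged sketch:

    -- on the primary: the standby should appear with sync_state = 'sync'
    SELECT application_name, state, sync_state
    FROM pg_stat_replication;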

Keep reading
Ozgun Erdogan

Three Approaches to PostgreSQL Replication and Backup

Written by Ozgun Erdogan | February 21, 2018

The Citus distributed database scales out PostgreSQL through sharding, replication, and query parallelization. For replication, our database as a service (by default) leverages the streaming replication logic built into Postgres.

When we talk to Citus users, we often hear questions about setting up Postgres high availability (HA) clusters and managing backups. How do you handle replication and machine failures? What challenges do you run into when setting up Postgres HA?

The PostgreSQL database follows a straightforward replication model. In this model, all writes go to a primary node. The primary node then locally applies those changes and propagates them to secondary nodes.
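Concretely, the primary streams its write-ahead log to secondaries over a replication connection. A minimal sketch of the SQL-visible pieces (the role name and password are illustrative):

    -- on the primary: create a role the secondaries replicate with
    CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secret';

    -- on the primary: watch connected secondaries and their replay progress
    SELECT client_addr, state, sent_lsn, replay_lsn
    FROM pg_stat_replication;

    -- on a secondary: returns true while the node follows a primary
    SELECT pg_is_in_recovery();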

In the context of Postgres, the built-in replication (known as "streaming replication") comes with several challenges:

  • Postgres replication doesn’t come with built-in monitoring and failover. When the primary node fails, you need to promote a secondary to be the new primary. This promotion needs to happen in a way where clients write to only one primary node, and they don’t observe data inconsistencies.
  • Many Postgres clients (written in different programming languages) talk to a single endpoint. When the primary node fails, these clients will keep retrying the same IP or DNS name. This makes failover visible to the application.
  • Postgres replicates its entire state. When you need to construct a new secondary node, the secondary needs to replay the entire history of state changes from the primary node. This process is resource intensive, and makes it expensive to kill nodes and bring up new ones.

The first two challenges are well understood. Since the last challenge isn’t as widely recognized, we’ll examine it in this blog post.

Keep reading

In both Citus Cloud 2 and in the enterprise edition of Citus 7.1 there was a pretty big update to one of our flagship features: the shard rebalancer. No, I’m not talking about our shard rebalancer visualization that reminds me of the Windows 95 disk defrag. (Side-note: at one point I tried to persuade my engineering team to play Tetris music in the background while the shard rebalancer UI in Citus Cloud was running. The good news for all of you is that I was overwhelmingly vetoed by my team. Whew.) The interesting new capability in the Citus database is the online nature of our shard rebalancer.
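If you want to watch the rebalancer at work, shard placements are visible in the Citus metadata; a sketch, with an illustrative table name:

    -- see which node each shard lives on, before and after rebalancing
    SELECT shardid, nodename, nodeport
    FROM pg_dist_shard_placement
    ORDER BY shardid;

    -- kick off an online rebalance for a distributed table
    SELECT rebalance_table_shards('companies');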

Keep reading
Marco Slot

Podyn: DynamoDB to PostgreSQL replication and migration tool

Written by Marco Slot | September 22, 2017

Wouldn’t it be great if you could run SQL queries on your data in DynamoDB? While this isn’t possible directly, there is an alternative: With Podyn, you can automatically replicate the schema, data, and changes in your DynamoDB tables to Postgres. Once your data is flowing into Postgres, you can start using a wide array of features including views, indexes, rollup tables, and advanced SQL queries. And if you chose Dynamo because you needed scale beyond what you thought a single Postgres instance could provide, you can replicate it into Citus.
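For example, once a DynamoDB table is flowing into Postgres, you can layer standard SQL on top of it. A sketch, where the table and columns are illustrative of what a replicated table might look like:

    -- index the replicated table for fast lookups
    CREATE INDEX ON clicks (user_id);

    -- the kind of ad-hoc analytics DynamoDB cannot express directly
    SELECT user_id, count(*) AS click_count
    FROM clicks
    GROUP BY user_id
    ORDER BY click_count DESC
    LIMIT 10;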

Keep reading

AWS is the leader when it comes to the cloud, and for good reason. AWS is well ahead in the quality and breadth of services they offer.

However, when a service runs at the scale of AWS, it is natural to expect some failures to occur. According to AWS, EBS volumes are designed for 99.999% availability.

The annual failure rate (AFR) is 0.1%–0.2%, where a failure means a complete or partial loss of the volume. For example, if you had 1,000 EBS volumes, you should expect 1 or 2 to fail per year. In our experience, partial failure is significantly more common than complete loss. Even so, a partial loss can take a lot of time to resolve and can still be debilitating to a business.

Over the years, there have been some AWS failures that made news headlines due to havoc caused for both companies and their users. These incidents put a spotlight on AWS’ imperfections.

Keep reading
Ozgun Erdogan

Citus' Replication Model: Today and Tomorrow

Written by Ozgun Erdogan | December 15, 2016

Citus is a distributed database that extends (not forks) PostgreSQL. Citus does this by transparently sharding database tables across the cluster and replicating those shards.

After open sourcing Citus, one question we frequently heard from users was how Citus replicates data and automates node failovers. In this blog post, we cover the two replication models available in Citus: statement-based and streaming replication. We also describe how these models evolved over time for different use cases.
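For context, statement-based replication is controlled at table-creation time by a Citus setting; a minimal sketch (the GUC and function are real Citus APIs, the table name is illustrative):

    -- statement-based replication: keep two copies of every shard
    SET citus.shard_replication_factor = 2;
    SELECT create_distributed_table('events', 'tenant_id');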

Keep reading
