POSETTE 2024 is a wrap! 💯 Thanks for joining the fun! Missed it? Watch all 42 talks online 🍿
Written by Craig Kerstiens
September 19, 2018
Citus extends Postgres to be a horizontally scalable database. By horizontally scalable, we mean the data is spread across multiple machines, so you can scale not only storage but also memory and compute, which gives you better performance. Without something like Citus to transform PostgreSQL into a distributed database, you can certainly add read replicas to scale, but your data still isn't distributed: every node holds the full data set. When you run into scaling issues with your Postgres database, adding a read replica and offloading some of your traffic to it is a common band-aid that slows the bleeding, but it is only a matter of time until even that stops working. With Citus, by contrast, scaling out your database is as simple as dragging a slider and rebalancing your data.
But that leaves a question: are read replicas still useful? Well, sure they are.
In Citus Cloud (our fully managed database as a service), we support read replicas, which in our case are known as followers. Follower clusters leverage much of the same underlying disaster recovery infrastructure as forks, but support a very different set of use cases.
Previously we talked about how forks can be used in Citus Cloud to get a production set of data over to staging, which can help with testing migrations, rebalancing, or SQL query optimization. Forks are often used as a one-off for a short period of time. In contrast, followers are often long-running Citus database clusters that can be a key mechanism for running your business, helping you get insights when you need them.
A follower cluster receives all updates from the primary Citus cluster asynchronously. Followers are often only a few seconds (if any) behind your primary database cluster, though they can lag at times by a few minutes. Followers have a full copy of your data, but reside on a separate cluster. This means you have a separate copy of the data with its own compute power, memory, and disk.
Fun fact about follower formations: you can create them cross-region as well. Want to have a full copy of your database in another region that you can fail over to in a disaster situation? Want to provide lower latency on particular reads for a different geography? A cross-region Citus follower can help.
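Because followers are asynchronous, it can be worth checking how stale a follower is before trusting it for a time-sensitive report. Below is a minimal Python sketch of that check. It assumes the follower behaves like a standard PostgreSQL standby, so that the built-in `pg_last_xact_replay_timestamp()` function returns the commit time of the last replayed transaction; whether a Citus Cloud follower exposes exactly this is an assumption here, not something stated above. The helper names (`replication_lag`, `is_fresh_enough`) and the 30-second threshold in the usage note are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Query to run against the follower. pg_last_xact_replay_timestamp() is a
# standard PostgreSQL function on standbys; assuming it is available on a
# follower cluster is an assumption of this sketch.
LAG_QUERY = "SELECT pg_last_xact_replay_timestamp();"


def replication_lag(last_replay, now=None):
    """Return how far behind the follower appears to be, as a timedelta.

    last_replay is the timezone-aware timestamp returned by LAG_QUERY,
    or None if the follower has not replayed any transaction yet.
    """
    if last_replay is None:
        return None
    if now is None:
        now = datetime.now(timezone.utc)
    # Clamp small negative values caused by clock skew between machines.
    return max(now - last_replay, timedelta(0))


def is_fresh_enough(last_replay, max_lag, now=None):
    """True if the follower's apparent lag is within max_lag."""
    lag = replication_lag(last_replay, now)
    return lag is not None and lag <= max_lag
```

For example, `is_fresh_enough(ts, timedelta(seconds=30))` would accept a follower that is a few seconds behind and reject one that is lagging by minutes. Note that this measure also grows when the primary is simply idle, since no new transactions arrive to be replayed, so treat it as an upper bound rather than an exact lag.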
So when are followers useful in Citus? Followers are most helpful for analytical workloads that may be longer running. Want to compute some complex report against your entire data set? Performing ETL from your primary database into some other data warehouse where data is cleansed and obfuscated before it goes in?
Each of these use cases may be important for your business, but long-running SQL queries compete for the same resources as your primary workload. That competition can degrade performance for the primary workload in your database. By splitting the analytical workload out to a Citus follower cluster, you can give your internal data analysts the access they need without subjecting them to the same code review processes as production, all while keeping your production database safe.
While using read replicas to scale your main application may introduce all sorts of unnecessary complexity and create more problems than it solves, read replicas do still have their place. When using the Citus extension to Postgres to solve the problem of scaling performance, you can create follower formations (aka read replicas) to provide access to your data internally without having to risk production availability. It’s a powerful combination.
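One way to picture this split is a small routing rule in the code that hands out database connections: application traffic and anything that writes goes to the primary, while reports and ETL reads go to the follower. The Python sketch below is illustrative only. The `Workload` categories, the `connection_url` helper, and both connection strings are hypothetical and not part of Citus or Citus Cloud.

```python
from enum import Enum


class Workload(Enum):
    APP = "app"        # production application traffic
    REPORT = "report"  # long-running analytical queries
    ETL = "etl"        # extracts feeding a data warehouse


# Hypothetical connection strings; substitute your own cluster URLs.
PRIMARY_URL = "postgres://app_user@primary.example.com:5432/citus"
FOLLOWER_URL = "postgres://analyst@follower.example.com:5432/citus"


def connection_url(workload, is_write=False):
    """Pick the cluster a query should run against.

    Followers are read replicas, so every write must go to the primary.
    Application traffic also stays on the primary; only analytical
    reads are offloaded to the follower.
    """
    if is_write or workload is Workload.APP:
        return PRIMARY_URL
    return FOLLOWER_URL
```

With a rule like this, an analyst's report calls `connection_url(Workload.REPORT)` and never touches the production cluster, while an ETL job that needs to write back still reaches the primary via `is_write=True`.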