If you want to learn more about Citus on Microsoft Azure, read this post about Hyperscale (Citus) on Azure Database for PostgreSQL.
There are a number of applications out there that have a high number of connections to Postgres. What’s high? That all depends on your application, but generally when you get to the few hundred connection area in Postgres you’re in the higher end. Anything in the thousands is definitely in the high territory, and even several hundred can put strain on your application. Generally a safe level for connections should be somewhere around 300-500 connections. This may seem low if you’re already running with thousands of connections, but it’s likely perfectly fine with pgBouncer taking care of the heavy lifting for you. Let’s drill into why a bit further.
Connection basics in Postgres
Each new connection to Postgres is a forked process. This process comes with its own memory allocation of roughly 10 MB with at least some load for the query. In this sense every new connection is already consuming some of those precious resources you want processing queries. For 300 database connections this is 3 GB of memory going just to managing those connections—memory which could be better used for caching your data.
When a connection isn’t just a connection
Many application frameworks such as Rails like to grab a pool of connections when they start up. This reduces the time needed to get a connection when they run a query, instead they pull one from the pool. This is good for your app’s performance as the time to get a connection isn’t always trivial. However, this does result in a bunch of connections to your database sitting idle. Recently I saw an application with 300 open connections, but only 17 active queries. How do you check active queries? You can run this to give you some insight to currently active queries:
SELECT COUNT(*) FROM pg_stat_activity WHERE state <> 'idle'
The issue here is that by default every new request is going to grab a new connection, so if you have 300 requests happening at a time you have 300 connections to your database. For something like sidekiq, which processes background jobs for Ruby, you can easily use as many as 25 connections per process–in reality most of the time the job is just waiting on some other process like an upload or sending an email. Sure you can manually release the connection when you’re not using it, but that can become quite error prone, and further why do that work when you don’t have to.
The short of it is, even if you see “300” open connections to your database, you may not actively have 300 concurrent transactions going on.
Enter connection pooling for Postgres
One could easily blame Rails or Django, but in reality transactional connection pooling is a lot of extra logic to build into a framework. Instead, a simpler option exists in the form of Postgres-specific connection pooling. A connection pooler will do the hard work of maintaining a pool of connections and then give them out as your application needs them, which is when a transaction or query happens. A connection pooler can have a number of settings two very key ones are:
- A max amount of active connections
- A max on idle connections
For Citus Cloud the limit is 300 active connections and 2000 idle connections.
There are a few options when it comes to your connection pooler, including PgBouncer and PgPool. At Citus we leverage PgBouncer.
Setting up PgBouncer (our preferred connection pooler)
If you’re running Postgres and haven’t setup your connection pooler yet, it’s worth giving a watch to some of the recent talks at PGConf SV on PgBouncer here and here. For PgBouncer there are a few modes it can be run in: session pooling and transaction pooling. What you want to more efficiently manage your connections is transaction pooling. Transaction pooling will grant a connection when you run
BEGIN; and return the transaction when you
COMMIT;. Note: with transaction pooling you do need to make sure you turn off prepared statements which Rails turns on by default.
If you’re running on something like Heroku they have a custom buildpack for you to setup PgBouncer in front of your app. If you’re running on Amazon RDS and looking to setup PgBouncer this guide from @andrewkane gives you the steps.
PgBouncer connection pooling and Citus Cloud
On Citus Cloud, PgBouncer is already running and configured for you. To connect to PgBouncer, you can simply swap your port from
6432 and you’ll be connected to PgBouncer. This should make it easier for you when managing connections on Citus to get back to building your app and give you one less thing about your database to have to think or worry about.