When Postgres blocks: 7 tips for dealing with locks

Last week I wrote about locking behaviour in Postgres, which commands block each other, and how you can diagnose blocked commands. Of course, after the diagnosis you may also want a cure. With Postgres it is possible to shoot yourself in the foot, but Postgres also offers you a way to stay on target. These are some of the important do’s and don’ts that we’ve seen as helpful when working with users to migrate from their single node Postgres database to Citus or when building new real-time analytics apps on Citus.

1: Never add a column with a default value

Update in Feb 2022: As of PostgreSQL 11, adding a column with a default no longer rewrites the table. The rest of the tips in this blog post are still valid!

A golden rule of PostgreSQL is: When you add a column to a table in production, never specify a default.

Adding a column takes a very aggressive lock on the table, which blocks read and write. If you add a column with a default, PostgreSQL will rewrite the whole table to fill in the default for every row, which can take hours on large tables. In the meantime, all queries will block, so your database will be unavailable.

Don’t do this:

-- reads and writes block until it is fully rewritten (hours?)
ALTER TABLE items ADD COLUMN last_update timestamptz DEFAULT now();

Do this instead:

-- select, update, insert, and delete block until the catalog is update (milliseconds)
ALTER TABLE items ADD COLUMN last_update timestamptz;
-- select and insert go through, some updates and deletes block while the table is rewritten
UPDATE items SET last_update = now();

Or better yet, avoid blocking updates and delete for a long time by updating in small batches, e.g.:

do {
  numRowsUpdated = executeUpdate(
    "UPDATE items SET last_update = ? " +
    "WHERE ctid IN (SELECT ctid FROM items WHERE last_update IS NULL LIMIT 5000)",
    now);
} while (numRowsUpdate > 0);

This way, you can add and populate a new column with minimal interruption to your users.

2: Beware of lock queues, use lock timeouts

Every lock in PostgreSQL has a queue. If a transaction B tries to acquire a lock that is already held by transaction A with a conflicting lock level, then transaction B will wait in the lock queue. Now something interesting happens: if another transaction C comes in, then it will not only have to check for conflict with A, but also with transaction B, and any other transaction in the lock queue.

This means that even if your DDL command can run very quickly, it might be in a queue for a long time waiting for queries to finish, and queries that start after it will be blocked behind it.

When you can have long-running SELECT queries on a table, don’t do this:

ALTER TABLE items ADD COLUMN last_update timestamptz;

Instead, do this:

SET lock_timeout TO '2s'
ALTER TABLE items ADD COLUMN last_update timestamptz;

By setting lock_timeout, the DDL command will fail if it ends up waiting for a lock, and thus blocking queries for more than 2 seconds. The downside is that your ALTER TABLE might not succeed, but you can try again later. You may want to query pg_stat_activity to see if there are long-running queries before starting the DDL command.

3: Create indexes CONCURRENTLY

Another golden rule of PostgreSQL is: Always create your indexes concurrently.

Creating an index on a large dataset can take hours or even days, and the regular CREATE INDEX command blocks all writes for the duration of the command. While it doesn’t block SELECTs, this is still pretty bad and there’s a better way: CREATE INDEX CONCURRENTLY.

Don’t do this:

-- blocks all writes
CREATE INDEX items_value_idx ON items USING GIN (value jsonb_path_ops);

Instead, do this:

-- only blocks other DDL
CREATE INDEX CONCURRENTLY items_value_idx ON items USING GIN (value jsonb_path_ops);

Creating an index concurrently does have a downside. If something goes wrong it does not roll back and leaves an unfinished ("invalid") index behind. If that happens, don’t worry, simply run DROP INDEX CONCURRENTLY items_value_idx and try to create it again.

4: Take aggressive locks as late as possible

When you need to run a command that acquires aggressive locks on a table, try to do it as late in the transaction as possible to allow queries to continue for as long as possible.

For example, if you want to completely replace the contents of a table. Don’t do this:

BEGIN;
-- reads and writes blocked from here:
TRUNCATE items;
-- long-running operation:
\COPY items FROM 'newdata.csv' WITH CSV 
COMMIT;

Instead, load the data into a new table and then replace the old table:

BEGIN;
CREATE TABLE items_new (LIKE items INCLUDING ALL);
-- long-running operation:
\COPY items_new FROM 'newdata.csv' WITH CSV
-- reads and writes blocked from here:
DROP TABLE items;
ALTER TABLE items_new RENAME TO items;
COMMIT;

There is one problem, we didn’t block writes from the start, and the old items table might have changed by the time we drop it. To prevent that, we can explicitly take a lock the table that blocks writes, but not reads:

BEGIN;
LOCK items IN EXCLUSIVE MODE;
...

Sometimes it’s better to take locking into your own hands.

5: Adding a primary key with minimal locking

It’s often a good idea to add a primary key to your tables. For example, when you want to use logical replication or migrate your database using Citus Warp.

Postgres makes it very easy to create a primary key using ALTER TABLE, but while the index for the primary key is being built, which can take a long time if the table is large, all queries will be blocked.

ALTER TABLE items ADD PRIMARY KEY (id); -- blocks queries for a long time

Fortunately, you can first do all the heavy lifting using CREATE UNIQUE INDEX CONCURRENTLY, and then use the unique index as a primary key, which is a fast operation.

CREATE UNIQUE INDEX CONCURRENTLY items_pk ON items (id); -- takes a long time, but doesn’t block queries
ALTER TABLE items ADD CONSTRAINT items_pk PRIMARY KEY USING INDEX items_pk;  -- blocks queries, but only very briefly

By breaking down primary key creation into two steps, it has almost not impact on the user.

6: Never VACUUM FULL

The postgres user experience can be a little surprising sometimes. While VACUUM FULL sounds like something you want to do clear the dust of your database, a more appropriate command would have been:

PLEASE FREEZE MY DATABASE FOR HOURS;

VACUUM FULL rewrites the entire table to disk, which can take hours or days, and blocks all queries while doing it. While there some valid use cases for VACUUM FULL, such as a table that used to be big, but is now small and still takes up a lot of space, it is probably not your use case.

While you should aim to tune your autovacuum settings and use indexes to make your queries fast, you may occasionally want to run VACUUM, but NOT VACUUM FULL.

7: Avoid deadlocks by ordering commands

If you’ve been using PostgreSQL for a while, chances are you’ve seen errors like:

ERROR:  deadlock detected
DETAIL:  Process 13661 waits for ShareLock on transaction 45942; blocked by process 13483.
Process 13483 waits for ShareLock on transaction 45937; blocked by process 13661.

This happens when concurrent transactions take the same locks in a different order. For example, one transaction issues the following commands.

BEGIN;
UPDATE items SET counter = counter + 1 WHERE key = 'hello'; -- grabs lock on hello
UPDATE items SET counter = counter + 1 WHERE key = 'world'; -- blocks waiting for world
END;

Simultaneously, another transaction might be issuing the same commands, but in a different order.

BEGIN
UPDATE items SET counter = counter + 1 WHERE key = 'world'; -- grabs lock on world
UPDATE items SET counter = counter + 1 WHERE key = 'hello';  -- blocks waiting for hello
END;

If these transaction blocks run simultaneously, chances are that they get stuck waiting for each other and would never finish. Postgres will recognise this situation after a second or so and will cancel one of the transactions to let the other one finish. When this happen, you should take a look at your application to see if you can make your transactions always follow the same order. If both transactions first modify hello, then world, then the first transaction will block the second one on the hello lock before it can grab any other locks.

Share your tips!

We hope you found these tips helpful. If you have some other tips, feel free to tweet them @citusdata or on our active community of Citus users on Slack.

Written by Marco Slot

Former lead engineer for the Citus database engine at Microsoft. Speaker at Postgres Conf EU, PostgresOpen, pgDay Paris, Hello World, SIGMOD, & lots of meetups. Talk selection team member for Citus Con: An Event for Postgres. PhD in distributed systems. Loves mountain hiking.