Evolving pg_cron together: Postgres 13, audit log, background workers, & job names

Written by Marco Slot
October 31, 2020

This post by Marco Slot about the open source pg_cron extension to Postgres was originally published on the Azure Database for PostgreSQL Blog on Microsoft TechCommunity.

Update in October 2022: Citus has a new home on Azure! The Citus database is now available as a managed service in the cloud as Azure Cosmos DB for PostgreSQL. Azure documentation links have been updated throughout the post, to point to the new Azure docs.

One of the unique things about Postgres is that it is highly programmable via PL/pgSQL and extensions. Postgres is so programmable that I often think of Postgres as a computing platform rather than just a database (or a distributed computing platform—with Citus). As a computing platform, I always felt that Postgres should be able to take actions in an automated way. That is why I created the open source pg_cron extension back in 2016 to run periodic jobs in Postgres—and why I continue to maintain pg_cron now that I work on the Postgres team at Microsoft.

Using pg_cron, you can schedule Postgres queries to run periodically, according to the familiar cron syntax. Some typical examples:

-- vacuum my table every night at 3am (GMT)
SELECT cron.schedule('0 3 * * *', 'VACUUM my_table');

-- call a procedure every minute
SELECT cron.schedule('process-new-events', '* * * * *', 'CALL process_new_events()');

-- upgrade an extension when PostgreSQL restarts
SELECT cron.schedule('upgrade-pgcron', '@reboot', 'ALTER EXTENSION pg_cron UPDATE');

-- Delete old data on Saturday at 1:30am (GMT)
SELECT cron.schedule('delete-old-events','30 1 * * 6', $$DELETE FROM events WHERE event_time < now() - interval '1 week'$$);

Since its initial release, pg_cron has become a standard tool in the arsenal for many of you who use PostgreSQL. You can find the pg_cron extension on GitHub—it's open source, and it's easy to install and set up. And pg_cron is also included in managed Postgres database services such as the Hyperscale (Citus) and Flexible Server options in Azure Database for PostgreSQL.
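
If you run your own PostgreSQL server, the basic setup is small. A rough sketch, assuming the pg_cron packages are already installed (the role name in the GRANT is illustrative; see the pg_cron README for details):

-- first, add pg_cron to shared_preload_libraries in postgresql.conf and set
-- cron.database_name to the database that stores the cron metadata (it
-- defaults to postgres), then restart PostgreSQL and run:
CREATE EXTENSION pg_cron;

-- optionally, allow a regular user to schedule jobs (role name is illustrative)
GRANT USAGE ON SCHEMA cron TO marco;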

The popularity of pg_cron has also created demand for more advanced features, such as an audit log, the ability to update cron job schedules, and other ways to run pg_cron in cloud database services.

Recently, the RDS team at Amazon reached out to us to see if we would accept contributions that would enable pg_cron in RDS. And our answer was: Of course! Hence the pg_cron 1.3 release is the result of a collaboration between Microsoft (job names, reviews, bug fixes) and Amazon (audit log, background workers), with a little help from Zalando (PostgreSQL 13 support).

Audit log lets you see your running cron jobs, as well as past job runs

Pg_cron logs the outcome of jobs in the PostgreSQL log, but the log is not always easy to access for database users. Thanks to an awesome contribution by Bertrand Drouvot and Nathan Bossart from the Amazon RDS team, you can now also see a log of your pg_cron jobs in the database in a table called cron.job_run_details.

The cron.job_run_details table shows:

  • when a command started and finished,
  • whether the pg_cron command succeeded, and
  • the number of rows returned, or the error message, so you can quickly detect when something went wrong

For example:

-- update my rollup tables every minute
SELECT cron.schedule('update rollups', '* * * * *', 'SELECT update_rollup_tables()');

-- after a while
SELECT * FROM cron.job_run_details;
┌───────┬───────┬─────────┬──────────┬──────────┬───────────────────────────────┬───────────┬────────────────────────────────────────────┬───────────────────────────────┬───────────────────────────────┐
│ jobid │ runid │ job_pid │ database │ username │            command            │  status   │               return_message               │          start_time           │           end_time            │
├───────┼───────┼─────────┼──────────┼──────────┼───────────────────────────────┼───────────┼────────────────────────────────────────────┼───────────────────────────────┼───────────────────────────────┤
│     2 │    13 │    1022 │ postgres │ marco    │ SELECT update_rollup_tables() │ succeeded │ 1 row                                      │ 2020-10-28 12:18:00.059566+02 │ 2020-10-28 12:18:01.734262+02 │
│     2 │     1 │     986 │ postgres │ marco    │ SELECT update_rollup_tables() │ succeeded │ 1 row                                      │ 2020-10-28 12:19:00.029769+02 │ 2020-10-28 12:19:01.345637+02 │
│     1 │    14 │    1023 │ postgres │ marco    │ VACUUM my_table               │ failed    │ ERROR: relation "my_table" does not exist  │ 2020-10-28 12:19:00.079291+02 │ 2020-10-28 12:19:00.080133+02 │
└───────┴───────┴─────────┴──────────┴──────────┴───────────────────────────────┴───────────┴────────────────────────────────────────────┴───────────────────────────────┴───────────────────────────────┘
(3 rows)

The audit log is not automatically cleaned up, but … we have pg_cron! That means you can easily decide on your own retention policy and use pg_cron to schedule a job that cleans up the audit log:

-- at midnight, delete all audit log entries older than 14 days
SELECT cron.schedule('clean audit log', '0 0 * * *', $$DELETE FROM cron.job_run_details WHERE end_time < now() - interval '14 days'$$);

In implementing the audit log in pg_cron, we used the row-level security feature in Postgres to make sure that database users can only see and delete their own audit log entries.
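
As a sketch of the idea, a row-level security policy that restricts rows to their owner looks roughly like this (the policy name here is hypothetical, not pg_cron's actual definition):

-- hypothetical policy illustrating the row-level security approach:
-- each user only sees job_run_details rows whose username matches them
ALTER TABLE cron.job_run_details ENABLE ROW LEVEL SECURITY;
CREATE POLICY user_own_rows ON cron.job_run_details
  USING (username = current_user);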

Using background workers avoids connection configuration

Pg_cron normally executes commands by connecting to either localhost or a unix domain socket (configurable using the cron.host setting). Both connection options are secure, but they do require some changes to pg_hba.conf to enable, and which change you need to make depends on which Linux distribution you are using.
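
For instance, typical pg_hba.conf entries that allow pg_cron's loopback connections might look like the following (a sketch; which lines you need depends on your cron.host setting and your distribution's defaults):

# allow unix domain socket connections (when cron.host points at the socket)
local   all  all                 trust
# allow TCP connections to localhost (when cron.host is localhost)
host    all  all  127.0.0.1/32   trust
host    all  all  ::1/128        trust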

Thanks to another contribution by Bertrand and Nathan, you can choose to use dynamic background workers instead of connections by setting cron.use_background_workers = on in postgresql.conf. That way, you do not need any pg_hba.conf changes.

A slight downside of background workers is that the number of concurrent jobs is limited by max_worker_processes (8 by default), whereas the standard localhost connections are only limited by max_connections (100 by default, and often set higher). We recommend increasing max_worker_processes when using background workers.
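
A minimal postgresql.conf sketch for the background worker mode (the max_worker_processes value of 20 is just an illustrative choice):

# postgresql.conf
shared_preload_libraries = 'pg_cron'
cron.use_background_workers = on
max_worker_processes = 20    # default is 8; raise it to allow more concurrent jobs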

Job names simplify pg_cron job management

If you are already familiar with pg_cron, you may have noticed that the examples above now include a job name as the first argument to the cron.schedule(..) function. The ability to give your pg_cron jobs a name is also a new feature in pg_cron 1.3, by yours truly.

When you use a job name, cron.schedule does an upsert rather than an insert, which means you can idempotently set your schedule and your command.

For instance, if I name my pg_cron job nightly vacuum then I can easily change the schedule:

-- set up a nightly vacuum at 1am
select cron.schedule('nightly vacuum', '0 1 * * *', 'vacuum my_table');
┌──────────┐
│ schedule │
├──────────┤
│        4 │
└──────────┘
(1 row)

-- I can now see the nightly vacuum job in the cron.job table
table cron.job;
┌───────┬───────────┬─────────────────┬───────────┬──────────┬──────────┬──────────┬────────┬────────────────┐
│ jobid │ schedule  │     command     │ nodename  │ nodeport │ database │ username │ active │    jobname     │
├───────┼───────────┼─────────────────┼───────────┼──────────┼──────────┼──────────┼────────┼────────────────┤
│     4 │ 0 1 * * * │ vacuum my_table │ localhost │     5432 │ postgres │ marco    │ t      │ nightly vacuum │
└───────┴───────────┴─────────────────┴───────────┴──────────┴──────────┴──────────┴────────┴────────────────┘
(1 row)

-- change the schedule to 3am by using the same job name
select cron.schedule('nightly vacuum', '0 3 * * *', 'vacuum my_table');
┌──────────┐
│ schedule │
├──────────┤
│        4 │
└──────────┘
(1 row)

-- I can see the new schedule in the cron.job table
table cron.job;
┌───────┬───────────┬─────────────────┬───────────┬──────────┬──────────┬──────────┬────────┬────────────────┐
│ jobid │ schedule  │     command     │ nodename  │ nodeport │ database │ username │ active │    jobname     │
├───────┼───────────┼─────────────────┼───────────┼──────────┼──────────┼──────────┼────────┼────────────────┤
│     4 │ 0 3 * * * │ vacuum my_table │ localhost │     5432 │ postgres │ marco    │ t      │ nightly vacuum │
└───────┴───────────┴─────────────────┴───────────┴──────────┴──────────┴──────────┴────────┴────────────────┘
(1 row)

Using a job name also makes it easy to ensure a specific set of jobs is running, without having to check whether the job already exists.
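
For example, a deployment script can simply re-run the same cron.schedule calls on every deploy and always converge to the same set of jobs, and a named job can be removed again with cron.unschedule (job names here are illustrative):

-- re-running these is safe: existing jobs with the same name are updated in place
SELECT cron.schedule('nightly-vacuum', '0 3 * * *', 'VACUUM my_table');
SELECT cron.schedule('process-new-events', '* * * * *', 'CALL process_new_events()');

-- remove a job by its name
SELECT cron.unschedule('nightly-vacuum');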

PostgreSQL 13 support is available for pg_cron

Finally, PostgreSQL 13 is out! Of course, the new pg_cron 1.3 release supports PostgreSQL 13 (and all PostgreSQL versions since 9.5). Big thanks to Alexander Kukushkin from Zalando for updating pg_cron to support Postgres 13.

Packages for pg_cron 1.3 are available via PGDG for Red Hat/CentOS and Debian/Ubuntu.
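
For example, on a Postgres 13 server (package names follow the usual PGDG conventions; adjust the version suffix for your PostgreSQL major version):

# Red Hat / CentOS
sudo yum install pg_cron_13

# Debian / Ubuntu
sudo apt-get install postgresql-13-pg-cron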

Better together: Improving PostgreSQL across competitive boundaries

One of the exciting parts of being on the Postgres team at Microsoft is that Microsoft has embraced open source. What that means in practice is that we are making significant contributions to PostgreSQL via our team of PostgreSQL committers and contributors (including engineers Andres Freund, David Rowley, Dimitri Fontaine, Jeff Davis, and Thomas Munro). In addition, we have created and maintain some pretty useful open source extensions to Postgres such as Citus, pg_auto_failover, and of course pg_cron. When we open source our software, competitors might benefit from our work, but the more important point is all PostgreSQL users (including our customers) benefit!

I find pg_cron to be super useful. And some have gone so far as to say that it's "really cool" to have the scheduler within the database, tied to the data itself. We're happy to see that pg_cron has received significant contributions from Amazon, and we look forward to seeing one great PostgreSQL job scheduler across Azure Database for PostgreSQL, RDS, and other Postgres cloud database services.

Written by Marco Slot

Former lead engineer for the Citus database engine at Microsoft. Speaker at Postgres Conf EU, PostgresOpen, pgDay Paris, Hello World, SIGMOD, & lots of meetups. Talk selection team member for Citus Con: An Event for Postgres. PhD in distributed systems. Loves mountain hiking.

@marcoslot marcocitus