Written by Marco Slot
October 31, 2020
This post by Marco Slot about the open source pg_cron extension to Postgres was originally published on the Azure Database for PostgreSQL Blog on Microsoft TechCommunity.
Update in October 2022: Citus has a new home on Azure! The Citus database is now available as a managed service in the cloud as Azure Cosmos DB for PostgreSQL. Azure documentation links have been updated throughout the post, to point to the new Azure docs.
One of the unique things about Postgres is that it is highly programmable via PL/pgSQL and extensions. Postgres is so programmable that I often think of Postgres as a computing platform rather than just a database (or a distributed computing platform—with Citus). As a computing platform, I always felt that Postgres should be able to take actions in an automated way. That is why I created the open source pg_cron extension back in 2016 to run periodic jobs in Postgres—and why I continue to maintain pg_cron now that I work on the Postgres team at Microsoft.
Using pg_cron, you can schedule Postgres queries to run periodically, according to the familiar cron syntax. Some typical examples:
-- vacuum my table every night at 3am (GMT)
SELECT cron.schedule('0 3 * * *', 'VACUUM my_table');
-- call a procedure every minute
SELECT cron.schedule('process-new-events', '* * * * *', 'CALL process_new_events()');
-- upgrade an extension when PostgreSQL restarts
SELECT cron.schedule('upgrade-pgcron', '@reboot', 'ALTER EXTENSION pg_cron UPDATE');
-- Delete old data on Saturday at 1:30am (GMT)
SELECT cron.schedule('delete-old-events','30 1 * * 6', $$DELETE FROM events WHERE event_time < now() - interval '1 week'$$);
Since its initial release, pg_cron has become a standard tool in the arsenal for many of you who use PostgreSQL. You can find the pg_cron extension on GitHub—it's open source, and it's easy to install and set up. And pg_cron is also included in managed Postgres database services such as the Hyperscale (Citus) and Flexible Server options in Azure Database for PostgreSQL.
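If you run your own Postgres server, the setup is only a couple of steps. Here is a minimal sketch, assuming the pg_cron package is already installed and you want the scheduler to run in the default postgres database:

# in postgresql.conf: load pg_cron's background worker at server start
shared_preload_libraries = 'pg_cron'
# database in which the scheduler runs and stores its metadata ('postgres' is the default)
cron.database_name = 'postgres'

-- after restarting PostgreSQL, create the extension in that database
CREATE EXTENSION pg_cron;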
The popularity of pg_cron has also created demand for more advanced features, such as an audit log, the ability to update cron job schedules, and for other ways to run pg_cron in cloud database services.
Recently, the RDS team at Amazon reached out to us to see if we would accept contributions to pg_cron to enable pg_cron in RDS. And our answer was: Of course! Hence the pg_cron 1.3 release is the result of a collaboration between Microsoft (job names, reviews, bug fixes) and Amazon (audit log, background workers), with a little help from Zalando (PostgreSQL 13 support).
pg_cron logs the outcome of jobs in the PostgreSQL log, but that log is not always easy to access for database users. Thanks to an awesome contribution by Bertrand Drouvot and Nathan Bossart from the Amazon RDS team, you can now also see a log of your pg_cron jobs in the database, in a table called cron.job_run_details. The example below shows what the cron.job_run_details table contains:
-- update my rollup tables every minute
SELECT cron.schedule('update rollups', '* * * * *', 'SELECT update_rollup_tables()');
-- after a while
SELECT * FROM cron.job_run_details;
┌───────┬───────┬─────────┬──────────┬──────────┬───────────────────────────────┬───────────┬────────────────────────────────────────────┬───────────────────────────────┬───────────────────────────────┐
│ jobid │ runid │ job_pid │ database │ username │            command            │  status   │               return_message               │          start_time           │           end_time            │
├───────┼───────┼─────────┼──────────┼──────────┼───────────────────────────────┼───────────┼────────────────────────────────────────────┼───────────────────────────────┼───────────────────────────────┤
│     2 │    13 │    1022 │ postgres │ marco    │ SELECT update_rollup_tables() │ succeeded │ 1 row                                      │ 2020-10-28 12:18:00.059566+02 │ 2020-10-28 12:18:01.734262+02 │
│     2 │     1 │     986 │ postgres │ marco    │ SELECT update_rollup_tables() │ succeeded │ 1 row                                      │ 2020-10-28 12:19:00.029769+02 │ 2020-10-28 12:19:01.345637+02 │
│     1 │    14 │    1023 │ postgres │ marco    │ VACUUM my_table               │ failed    │ ERROR:  relation "my_table" does not exist │ 2020-10-28 12:19:00.079291+02 │ 2020-10-28 12:19:00.080133+02 │
└───────┴───────┴─────────┴──────────┴──────────┴───────────────────────────────┴───────────┴────────────────────────────────────────────┴───────────────────────────────┴───────────────────────────────┘
(3 rows)
The audit log is not automatically cleaned up, but … we have pg_cron! That means you can easily decide on your own retention policy and use pg_cron to schedule a job that cleans up the audit log:
-- at midnight, delete all audit log entries older than 14 days
SELECT cron.schedule('clean audit log', '0 0 * * *', $$DELETE FROM cron.job_run_details WHERE end_time < now() - interval '14 days'$$);
In implementing the audit log in pg_cron, we used the row-level security feature in Postgres to make sure that database users can only see and delete their own audit log entries.
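For readers who have not used row-level security before, the general idea looks roughly like this. This is a simplified, hypothetical sketch for illustration only, not the actual table or policy definitions used by pg_cron:

-- hypothetical log table, not the real cron.job_run_details definition
CREATE TABLE my_run_log (
    username text NOT NULL DEFAULT current_user,
    message  text
);
ALTER TABLE my_run_log ENABLE ROW LEVEL SECURITY;

-- each user can only see and delete rows that carry their own login name
CREATE POLICY my_run_log_per_user ON my_run_log
    USING (username = current_user);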
pg_cron normally executes commands by connecting either to localhost or to the unix domain socket (configurable using the cron.host setting). Both connection options are secure, but they do require some changes to pg_hba.conf, and which change you need to make depends on which Linux distribution you are using.
Thanks to another contribution by Bertrand and Nathan, you can now choose to use dynamic background workers instead of connections by setting cron.use_background_workers = on in postgresql.conf. That way, you do not need any pg_hba.conf changes.
A slight downside of background workers is that the number of concurrent jobs is limited by max_worker_processes (8 by default), whereas the standard approach of connecting over localhost is limited only by max_connections (100 by default, and usually set higher). We therefore recommend increasing max_worker_processes when using background workers.
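For reference, a postgresql.conf sketch for switching to background workers might look like this (the max_worker_processes value below is just an illustration; pick one based on how many jobs you expect to run concurrently):

# run pg_cron jobs via dynamic background workers instead of localhost connections
cron.use_background_workers = on
# leave headroom for concurrent jobs and other background workers (default is 8)
max_worker_processes = 20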
If you are already familiar with pg_cron, you may have noticed that the examples above now include a job name as the first argument to the cron.schedule(..) function. The ability to give your pg_cron jobs a name is also a new feature in pg_cron 1.3, by yours truly.
When you use a job name, cron.schedule does an upsert rather than an insert, which means you can idempotently set your schedule and your command. For instance, if I name my pg_cron job nightly vacuum, then I can easily change its schedule:
-- set up a nightly vacuum at 1am
select cron.schedule('nightly vacuum', '0 1 * * *', 'vacuum my_table');
┌──────────┐
│ schedule │
├──────────┤
│        4 │
└──────────┘
(1 row)
-- I can now see the nightly vacuum job in the cron.job table
table cron.job;
┌───────┬───────────┬─────────────────┬───────────┬──────────┬──────────┬──────────┬────────┬────────────────┐
│ jobid │ schedule  │     command     │ nodename  │ nodeport │ database │ username │ active │    jobname     │
├───────┼───────────┼─────────────────┼───────────┼──────────┼──────────┼──────────┼────────┼────────────────┤
│     4 │ 0 1 * * * │ vacuum my_table │ localhost │     5432 │ postgres │ marco    │ t      │ nightly vacuum │
└───────┴───────────┴─────────────────┴───────────┴──────────┴──────────┴──────────┴────────┴────────────────┘
(1 row)
-- change the schedule to 3am by using the same job name
select cron.schedule('nightly vacuum', '0 3 * * *', 'vacuum my_table');
┌──────────┐
│ schedule │
├──────────┤
│        4 │
└──────────┘
(1 row)
-- I can see the new schedule in the cron.job table
table cron.job;
┌───────┬───────────┬─────────────────┬───────────┬──────────┬──────────┬──────────┬────────┬────────────────┐
│ jobid │ schedule  │     command     │ nodename  │ nodeport │ database │ username │ active │    jobname     │
├───────┼───────────┼─────────────────┼───────────┼──────────┼──────────┼──────────┼────────┼────────────────┤
│     4 │ 0 3 * * * │ vacuum my_table │ localhost │     5432 │ postgres │ marco    │ t      │ nightly vacuum │
└───────┴───────────┴─────────────────┴───────────┴──────────┴──────────┴──────────┴────────┴────────────────┘
(1 row)
Using a job name also makes it easy to ensure a specific set of jobs is running, without having to check whether the job already exists.
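Job names are also handy when removing jobs: as far as I know, cron.unschedule accepts a job name as well in pg_cron 1.3 (check the docs for your version), so cleaning up the example job looks like this:

-- remove the job by its name rather than its job id
SELECT cron.unschedule('nightly vacuum');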
Finally, PostgreSQL 13 is out! Of course, the new pg_cron 1.3 release supports PostgreSQL 13 (and all PostgreSQL versions since 9.5). Big thanks to Alexander Kukushkin from Zalando for updating pg_cron to support Postgres 13.
Packages for pg_cron 1.3 are available via PGDG for Red Hat/CentOS and Debian/Ubuntu.
One of the exciting parts of being on the Postgres team at Microsoft is that Microsoft has embraced open source. What that means in practice is that we are making significant contributions to PostgreSQL via our team of PostgreSQL committers and contributors (including engineers Andres Freund, David Rowley, Dimitri Fontaine, Jeff Davis, and Thomas Munro). In addition, we have created and maintain some pretty useful open source extensions to Postgres such as Citus, pg_auto_failover, and of course pg_cron. When we open source our software, competitors might benefit from our work, but the more important point is all PostgreSQL users (including our customers) benefit!
I find pg_cron to be super useful. And some have gone so far as to say that it's "really cool" to have the scheduler within the database, and tied to the data itself. We're happy to see that pg_cron has gotten significant contributions from Amazon and look forward to seeing one great PostgreSQL job scheduler across Azure Database for PostgreSQL, RDS, and other Postgres cloud database services.