Running periodic jobs such as vacuuming or removing old data is a common requirement in PostgreSQL. A simple way to achieve this is to configure cron or another external daemon to periodically connect to the database and run a command. However, with databases increasingly running as managed services or stand-alone containers, running and configuring a separate deamon is often impractical. It is also hard to make your cron jobs aware of fail-over or to schedule jobs across all nodes in a Citus cluster.
pg_cron is a simple, cron-based job scheduler for PostgreSQL that runs inside the database as an extension. A background worker initiates commands according to their schedule by connecting to the local database as the user that scheduled the job. Because we reused parts of the cron source code by Paul Vixie, you can use the same syntax to express the schedule:
-- Delete old data on Saturday at 3:30am (GMT) SELECT cron.schedule('30 3 * * 6', $$DELETE FROM events WHERE event_time < now() - interval '1 week'$$); -- Vacuum every day at 10:00am (GMT) SELECT cron.schedule('0 10 * * *', 'VACUUM'); -- Unschedule all tasks SELECT cron.unschedule(jobid) FROM cron.job;
pg_cron can run multiple jobs in parallel, but it runs at most one instance of a job at a time. If a second run is supposed to start before the first one finishes, then the second run is queued and started as soon as the first run completes. This ensures that jobs run exactly as many times as scheduled and don’t run concurrently with themselves.
If you set up pg_cron on a hot standby, then it will start running the cron jobs, which are stored in a table and thus replicated to the hot standby, as soon as the server is promoted. This means your periodic jobs automatically fail over with your PostgreSQL server.
We expect that pg_cron will make periodic jobs on PostgreSQL a lot easier to manage. If you find any issues please let us know via the pg_cron GitHub page.