pg_cron: Run periodic jobs in PostgreSQL

Marco Slot Sep 9, 2016

About Citus

Citus is a distributed database that scales out PostgreSQL. An extension to Postgres, Citus is available as open source, as on-prem software, and as a fully-managed database as a service.

Sign up for our newsletter

Enjoy what you're reading? Sign-up to our newsletter to stay informed:

Other Recent Posts

Introducing WAL-G: Faster Disaster Recovery for Postgres Principles of Sharding for Relational Databases Fork your distributed Postgres database with Citus More Articles

Like our blog, or have a question about Citus? Join us on Slack for a chat :)

Running periodic jobs such as vacuuming or removing old data is a common requirement in PostgreSQL. A simple way to achieve this is to configure cron or another external daemon to periodically connect to the database and run a command. However, with databases increasingly running as managed services or stand-alone containers, running and configuring a separate deamon is often impractical. It is also hard to make your cron jobs aware of fail-over or to schedule jobs across all nodes in a Citus cluster.

pg_cron is a simple, cron-based job scheduler for PostgreSQL that runs inside the database as an extension. A background worker initiates commands according to their schedule by connecting to the local database as the user that scheduled the job. Because we reused parts of the cron source code by Paul Vixie, you can use the same syntax to express the schedule:

-- Delete old data on Saturday at 3:30am (GMT)
SELECT cron.schedule('30 3 * * 6', $$DELETE FROM events WHERE event_time < now() - interval '1 week'$$);

-- Vacuum every day at 10:00am (GMT)
SELECT cron.schedule('0 10 * * *', 'VACUUM');

-- Unschedule all tasks
SELECT cron.unschedule(jobid) FROM cron.job;

pg_cron can run multiple jobs in parallel, but it runs at most one instance of a job at a time. If a second run is supposed to start before the first one finishes, then the second run is queued and started as soon as the first run completes. This ensures that jobs run exactly as many times as scheduled and don’t run concurrently with themselves.

If you set up pg_cron on a hot standby, then it will start running the cron jobs, which are stored in a table and thus replicated to the hot standby, as soon as the server is promoted. This means your periodic jobs automatically fail over with your PostgreSQL server.

We expect that pg_cron will make periodic jobs on PostgreSQL a lot easier to manage. If you find any issues please let us know via the pg_cron GitHub page.

← Next article Previous article →