Marco Slot

Making PostgreSQL tick: New features in pg_cron

Written by Marco Slot | October 26, 2023

pg_cron is an open source PostgreSQL extension that provides a cron-based scheduler to periodically run SQL commands. Almost every managed PostgreSQL service supports pg_cron and it has become a standard tool for many PostgreSQL users. Since Citus has been my full-time job, pg_cron has always been a side project for me, and so I tried to architect it for simplicity, reliability, and low maintenance. Of course, with many users there is a long list of feature requests, and with the help of the Postgres community pg_cron keeps becoming more and more capable over time.

We recently added PostgreSQL 16 support (in version 1.6), but perhaps the most exciting feature added to pg_cron in the past year (in version 1.5) is the ability to schedule a job every few seconds. I shunned this feature idea for a while, because (a) it is not something regular cron can do; and (b) any issue in pg_cron would get much more severe if it were to happen every few seconds. However, by now pg_cron is reasonably battle-tested, and second-granularity jobs have become the most popular pg_cron feature request by far.
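As a rough illustration of what a second-granularity job looks like, here is a minimal sketch. It assumes pg_cron 1.5 or later is installed in the current database; the job names and the process_updates() procedure are hypothetical placeholders, not part of pg_cron.

```sql
-- Assumes CREATE EXTENSION pg_cron; has already been run (pg_cron 1.5+).
-- process_updates() is a hypothetical procedure used only for illustration.

-- Run a command every 5 seconds using the interval syntax.
SELECT cron.schedule('process-updates', '5 seconds', 'CALL process_updates()');

-- Regular cron syntax still works; this job runs every day at 03:00.
SELECT cron.schedule('nightly-vacuum', '0 3 * * *', 'VACUUM');

-- Remove the second-granularity job again by name.
SELECT cron.unschedule('process-updates');
```

The interval form accepts values from 1 to 59 seconds; anything coarser is expressed with the usual five-field cron syntax.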

The latest episode of Path To Citus Con—the monthly podcast for developers who love Postgres—is now out. This episode featured guests Paul Ramsey and Regina Obe on the topic “Why people care about PostGIS and Postgres”.

The conversation was all about PostGIS, a geospatial extension to Postgres which just happens to be one of the most popular Postgres extensions. This episode was fairly technical, but still fascinating. The discussion ranged all the way from Cartesian math at one point to how it’s very difficult to construct a database these days without a location component. This episode of Path To Citus Con focuses on the geospatial world of Postgres and shows how “where” is one of the fundamental things we all want to know about.

In this post, you’ll get a bit of backstory on the topic and on the guests of this episode of Path To Citus Con, both of whom have a long history with PostGIS. You’ll also get a peek at key moments from the show, including the extensibility of Postgres demonstrated by PostGIS, “where” as the universal foreign key, and more. At the end of the post, you’ll find links to where you can listen to this and every episode of the podcast. We hope you love these “human side of Postgres” podcast episodes.

Claire Giordano

What’s new with Postgres at Microsoft (August 2023)

Written by Claire Giordano | August 31, 2023

On one of the Postgres community chat forums, a friend asked me: "Is there a blog post that outlines all the work that is being done on Postgres at Microsoft? It's hard to keep track these days."

And my friend is right: it is hard to keep track. Probably because there are multiple Postgres workstreams at Microsoft, spread across a few different teams.

In this post, you'll get a bird's eye view of all the Postgres work the Microsoft team has done over the last year. Our work includes some pretty significant improvements to the Postgres managed services on Azure, as well as contributions across the entire open source ecosystem—including commits to the Postgres core; new releases to Postgres open source extensions like Citus and pg_cron; plus ecosystem work on Patroni, PgBouncer, pgcopydb. And more.

The topic of this month's PGSQL Phriday #011 community blogging event is partitioning vs. sharding in PostgreSQL. It seemed right to share a perspective on the question of "partitioning vs. sharding" from someone in the Citus open source team, since we eat, sleep, and breathe sharding for Postgres.

Postgres built-in "native" partitioning—and sharding via PG extensions like Citus—are both tools to grow your Postgres database, scale your application, and improve your application's performance.

What is partitioning and what is sharding? In Postgres, database partitioning and sharding are techniques for splitting collections of data into smaller sets, so the database only needs to process smaller chunks of data at a time. And as you might imagine, work gets done faster when you're processing less data.
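To make the distinction concrete, here is a minimal sketch of both techniques. The events table, its columns, and the choice of tenant_id as distribution column are hypothetical and chosen only for illustration; the sharding step assumes the Citus extension is installed.

```sql
-- Partitioning: split one table into smaller tables on the same server.
-- The events table and its columns are hypothetical.
CREATE TABLE events (
    event_id   bigint      NOT NULL,
    tenant_id  bigint      NOT NULL,
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

-- Each partition holds one month of data; queries that filter on
-- created_at only need to scan the matching partitions.
CREATE TABLE events_2023_08 PARTITION OF events
    FOR VALUES FROM ('2023-08-01') TO ('2023-09-01');
CREATE TABLE events_2023_09 PARTITION OF events
    FOR VALUES FROM ('2023-09-01') TO ('2023-10-01');

-- Sharding: spread the table across nodes by a distribution column.
-- Assumes CREATE EXTENSION citus; has already been run.
SELECT create_distributed_table('events', 'tenant_id');
```

The two techniques are not mutually exclusive: in this sketch the time-partitioned table is also sharded by tenant_id, so each shard is itself partitioned by month.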

In this post, you'll learn what partitioning and sharding are, why they matter, and when to use them. The table of contents:

In Postgres 15, the Postgres community released a new command, MERGE, that modifies rows in a target table using data from a source. MERGE provides a single SQL statement that can conditionally INSERT, UPDATE, or DELETE rows, a task that would otherwise require multiple procedural language statements or workarounds such as INSERT with an ON CONFLICT clause.

In this blog post, you will get a high-level overview of how Postgres MERGE works. The post then walks through some practical use cases and explains the different strategies Citus uses to handle MERGE in a distributed environment.
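For a sense of what a single conditional statement looks like, here is a minimal sketch that runs on plain PostgreSQL 15 or later. The inventory and shipments tables are hypothetical and exist only for this example.

```sql
-- Hypothetical tables: current stock levels and a batch of stock changes.
CREATE TABLE inventory (item_id int PRIMARY KEY, quantity int NOT NULL);
-- The primary key on the source guarantees at most one source row per item,
-- because MERGE raises an error if it would modify a target row twice.
CREATE TABLE shipments (item_id int PRIMARY KEY, delta int NOT NULL);

MERGE INTO inventory AS t
USING shipments AS s
    ON t.item_id = s.item_id
-- Stock would drop to zero or below: remove the item.
WHEN MATCHED AND t.quantity + s.delta <= 0 THEN
    DELETE
-- Item already exists: adjust its quantity.
WHEN MATCHED THEN
    UPDATE SET quantity = t.quantity + s.delta
-- Item is new: add it.
WHEN NOT MATCHED THEN
    INSERT (item_id, quantity) VALUES (s.item_id, s.delta);
```

The WHEN clauses are evaluated in order for each row, and only the first one whose condition holds is applied, which is why the DELETE branch comes before the unconditional UPDATE.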

Marco Slot

Citus 12: Schema-based sharding for PostgreSQL

Written by Marco Slot | July 18, 2023

What if you could automatically shard your PostgreSQL database across any number of servers and get industry-leading performance at scale without any special data modelling steps?

Our latest Citus open source release, Citus 12, adds a new and easy way to transparently scale your Postgres database: Schema-based sharding, where the database is transparently sharded by schema name.

Schema-based sharding gives an easy path for scaling out several important classes of applications that can divide their data across schemas:

  • Multi-tenant SaaS applications
  • Microservices that use the same database
  • Vertical partitioning by groups of tables

Each of these scenarios can now be enabled on Citus using regular CREATE SCHEMA commands. That way, many existing applications and libraries (e.g. django-tenants) can scale out without any changes, and developing new applications can be much easier. Moreover, you keep all the other benefits of Citus, including distributed transactions, reference tables, rebalancing, and more.
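As a minimal sketch of what this looks like in practice, assuming Citus 12 or later is installed: the schema and table names below are hypothetical, chosen to resemble a multi-tenant application.

```sql
-- Assumes Citus 12+; tenant_a, tenant_b, and orders are hypothetical names.

-- Make every schema created from now on a distributed schema.
SET citus.enable_schema_based_sharding TO on;

-- A regular CREATE SCHEMA now creates a schema that Citus can place
-- on any node; its tables are kept together on the same node.
CREATE SCHEMA tenant_a;
CREATE TABLE tenant_a.orders (
    order_id   bigint PRIMARY KEY,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- A schema that already existed before the setting was enabled
-- can be converted explicitly.
SELECT citus_schema_distribute('tenant_b');

-- List the distributed schemas known to Citus.
SELECT * FROM citus_schemas;
```

Because the application only issues ordinary CREATE SCHEMA and CREATE TABLE commands, a library that already creates one schema per tenant would not need to know about Citus at all.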

Distributed PostgreSQL has become a hot topic. Several distributed database vendors have added support for the PostgreSQL protocol as a convenient way to gain access to the PostgreSQL ecosystem. Others (like us) have built a distributed database on top of PostgreSQL itself.

For the Citus database team, distributed PostgreSQL is primarily about achieving high performance at scale. The unique thing about Citus, the technology powering Azure Cosmos DB for PostgreSQL, is that it is fully implemented as an open-source extension to PostgreSQL. It also leans entirely on PostgreSQL for storage, indexing, low-level query planning and execution, and various performance features. As such, Citus inherits the performance characteristics of a single PostgreSQL server but applies them at scale.

That all sounds good in theory, but to see whether this holds up in practice, you need benchmark numbers. We therefore asked GigaOM to run performance benchmarks comparing Azure Cosmos DB for PostgreSQL to other distributed implementations. GigaOM compared the transaction performance and price-performance of these popular managed services of distributed PostgreSQL, using the HammerDB benchmark software:

As you may have heard, we recently made PostgreSQL 15 generally available in Azure Cosmos DB for PostgreSQL within just 1 week of the PostgreSQL 15 release. The Postgres 15 version is available for you whether you need to create a new cluster in Azure Cosmos DB for PostgreSQL, or upgrade your existing cluster. (Note: you can do in-place major version upgrades in Azure Cosmos DB for PostgreSQL.) And the PostgreSQL 15 support is available in all Azure regions that support Azure Cosmos DB for PostgreSQL.

You may be surprised, since it's not the norm for a managed database service to support a new major PostgreSQL version that early. This post will walk you through what's going on behind the scenes that enables us to pull off such a feat. Some background before diving in:

Azure Cosmos DB for PostgreSQL is powered by native Postgres and Citus open source, and enables you to run PostgreSQL at any scale, from a single node to a large, distributed cluster. Customers can also scale out as much as they want depending on their needs, with many additional features. The Hyperscale (Citus) managed service recently moved into the Azure Cosmos DB family (more info on the launch of Azure Cosmos DB for PostgreSQL in this blog post) and with that introduced a way to try Azure Cosmos DB for PostgreSQL for free, where you can try out PostgreSQL 15 with Citus 11.1.

Gurkan Indibay

Tips for installing Citus and Postgres packages

Written by Gürkan İndibay | January 22, 2022

Citus is a great extension for scaling out Postgres databases horizontally. You can use Citus either on the cloud on Azure or you can download Citus open source and install it wherever. In this blog post, we will focus on Citus open source packaging and installation.

When you go to the Citus download page to download the Citus packages—or you visit the Citus open source docs—many of you jump straight to the install instructions and the particular OS you’re looking for. That way, you can get straight to sharding Postgres with Citus.

But what if you want to see which operating systems the Citus packages support? Or what if you want to install Citus with an older version of Postgres?

This post will answer these types of nitty-gritty questions about Citus packages and their usage. Specifically, this post will cover these questions:

