Blog posts by Marco Slot on the Citus Blog

Making PostgreSQL tick: New features in pg_cron

Written by Marco Slot | October 26, 2023

pg_cron is an open source PostgreSQL extension that provides a cron-based scheduler to periodically run SQL commands. Almost every managed PostgreSQL service supports pg_cron and it has become a standard tool for many PostgreSQL users. Since Citus has been my full-time job, pg_cron has always been a side project for me, and so I tried to architect it for simplicity, reliability, and low maintenance. Of course, with many users there is a long list of feature requests, and with the help of the Postgres community pg_cron keeps becoming more and more capable over time.

We recently added PostgreSQL 16 support (in version 1.6), but perhaps the most exciting feature added to pg_cron in the past year (in version 1.5) is the ability to schedule a job every few seconds. I shunned this feature idea for a while, because (a) it is not something regular cron can do; and (b) any issue in pg_cron would get much more severe if it were to happen every few seconds. However, by now pg_cron is reasonably battle-tested and second-granularity jobs have become the most popular pg_cron feature request by far.

Keep reading

Citus 12: Schema-based sharding for PostgreSQL

Written by Marco Slot | July 18, 2023

What if you could automatically shard your PostgreSQL database across any number of servers and get industry-leading performance at scale without any special data modelling steps?

Our latest Citus open source release, Citus 12, adds a new and easy way to transparently scale your Postgres database: Schema-based sharding, where the database is transparently sharded by schema name.

Schema-based sharding gives an easy path for scaling out several important classes of applications that can divide their data across schemas:

Multi-tenant SaaS applications
Microservices that use the same database
Vertical partitioning by groups of tables

Each of these scenarios can now be enabled on Citus using regular CREATE SCHEMA commands. That way, many existing applications and libraries (e.g. django-tenants) can scale out without any changes, and developing new applications can be much easier. Moreover, you keep all the other benefits of Citus, including distributed transactions, reference tables, rebalancing, and more.
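To make the idea of sharding by schema name concrete, here is a minimal Python sketch. It is a toy model, not how Citus implements placement: the `SchemaPlacement` class, the worker names, and the "fewest schemas wins" rule are all assumptions made for illustration. The only point it shows is that every table in a schema ends up on the same node, so queries within one schema stay local.

```python
from collections import Counter


class SchemaPlacement:
    """Toy model of schema-based sharding (hypothetical, not the Citus implementation)."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.schema_to_node = {}

    def create_schema(self, schema):
        # Assumption: a new schema goes to the node that currently holds the
        # fewest schemas. Ties go to the node listed first.
        if schema not in self.schema_to_node:
            load = Counter(self.schema_to_node.values())
            self.schema_to_node[schema] = min(self.nodes, key=lambda n: load[n])
        return self.schema_to_node[schema]

    def node_for_table(self, qualified_name):
        # A table lives wherever its schema lives, so the table name itself
        # plays no role in placement.
        schema, _, _table = qualified_name.partition(".")
        return self.schema_to_node[schema]


placement = SchemaPlacement(["worker-1", "worker-2"])
for tenant_schema in ["tenant_a", "tenant_b", "tenant_c"]:
    placement.create_schema(tenant_schema)

# Both tables belong to tenant_a, so they resolve to the same node.
print(placement.node_for_table("tenant_a.orders"))
print(placement.node_for_table("tenant_a.customers"))
```

In this model the application only ever creates schemas and tables; which node holds the data is decided by the schema name alone.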

Keep reading

Distributed PostgreSQL benchmarks using HammerDB, by GigaOM

Written by Marco Slot | June 21, 2023

Distributed PostgreSQL has become a hot topic. Several distributed database vendors have added support for the PostgreSQL protocol as a convenient way to gain access to the PostgreSQL ecosystem. Others (like us) have built a distributed database on top of PostgreSQL itself.

For the Citus database team, distributed PostgreSQL is primarily about achieving high performance at scale. The unique thing about Citus, the technology powering Azure Cosmos DB for PostgreSQL, is that it is fully implemented as an open-source extension to PostgreSQL. It also leans entirely on PostgreSQL for storage, indexing, low-level query planning and execution, and various performance features. As such, Citus inherits the performance characteristics of a single PostgreSQL server but applies them at scale.

That all sounds good in theory, but to see whether this holds up in practice, you need benchmark numbers. We therefore asked GigaOM to run performance benchmarks comparing Azure Cosmos DB for PostgreSQL to other distributed implementations. GigaOM compared the transaction performance and price-performance of these popular managed services of distributed PostgreSQL, using the HammerDB benchmark software:

Keep reading

What’s new in Citus 11.3 & Postgres for multi-tenant SaaS workloads

Written by Marco Slot | May 5, 2023

Citus enables several different PostgreSQL use cases, but one of the most popular ones is to build scalable multi-tenant software as a service (SaaS) applications. The most common way to build a multi-tenant application on Citus is to distribute all your Postgres tables by a “tenant ID” column. That way rows are (hash-)distributed across nodes, while rows with the same tenant ID value are co-located on the same node for fast local joins, transactions, and foreign keys.
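The co-location property can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Citus implementation: `zlib.crc32` stands in for the hash function, and `shard_for`, `node_for`, the shard count, and the worker names are hypothetical. What it shows is that placement depends only on the tenant ID, so rows from different tables that share a tenant ID land on the same node.

```python
import zlib


def shard_for(tenant_id, shard_count=32):
    # Assumption: crc32 is a stand-in hash and shard_count is illustrative.
    # The result depends only on the tenant ID, never on the table.
    return zlib.crc32(str(tenant_id).encode("utf-8")) % shard_count


def node_for(shard_id, nodes):
    # Assumption: shards are spread over the nodes round-robin.
    return nodes[shard_id % len(nodes)]


nodes = ["worker-1", "worker-2", "worker-3"]

# Rows from two different tables that belong to the same tenant.
order_row = {"tenant_id": 42, "order_id": 1001}
customer_row = {"tenant_id": 42, "customer_id": 7}

order_node = node_for(shard_for(order_row["tenant_id"]), nodes)
customer_node = node_for(shard_for(customer_row["tenant_id"]), nodes)

# Same tenant ID, same node: a join between these rows never crosses nodes.
print(order_node == customer_node)
```

Because every table uses the same function of the tenant ID, a join, transaction, or foreign key that stays within one tenant can be handled by a single node.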

For those of you who build SaaS apps, one question many of you have is how active your tenants are. More specifically: What are your busiest tenants? How many queries is your application doing on behalf of your tenants, and how much CPU do those queries use?

The new 11.3 release to the open source Citus database extension gives you tenant monitoring—with instant visibility into your top tenants using the new citus_stat_tenants feature, which shows query counts and CPU usage over a configurable time period.

Keep reading

What’s new in Citus 11.2 for Postgres, plus Patroni HA support for Citus

Written by Marco Slot | February 8, 2023

Our goal for the Citus extension is for you to be able to use all PostgreSQL features at any scale, with a seamless scaling experience. Distributed tables (or more generally “Citus tables”) are a powerful tool to get high performance at any scale. There are only a few remaining limitations when distributing a PostgreSQL table, but we are determined to solve them all. The Citus 11.2 release checks off another five SQL & DDL features that now work seamlessly on Citus tables. We also improved progress tracking for the shard rebalancer, so you know exactly what’s going on when rebalancing your cluster.

We also want PostgreSQL tools to work out-of-the-box even if you have a distributed PostgreSQL cluster. One of the most frequent questions we get on the Citus Slack from our open source users is how to set up high availability. Alexander Kukushkin, who is the primary maintainer of Patroni and recently joined the Citus database engine team, therefore developed a new version of Patroni which includes support for Citus!

Before we dive in, you can find detailed release notes for Citus 11.2 by the engineering team on our Updates page.

Keep reading

Citus 11.1 shards your Postgres tables without interruption

Written by Marco Slot | September 19, 2022

Citus is a distributed database that is built entirely as an open source PostgreSQL extension. In fact, you can install it in your PostgreSQL server without changing any PostgreSQL functionality. Citus simply gives PostgreSQL additional superpowers.

Being an extension also means we can keep adding new Postgres superpowers at a high pace. In the last release (11.0), we focused on giving you the ability to query from any node, opening up Citus for many new use cases, and we also made Citus fully open source. That means you can see everything we do on the Citus GitHub page (and star the repo if you’re a fan 😊). It also means that everyone can take advantage of shard rebalancing without write-downtime.

In the latest release (11.1), our Citus database team at Microsoft improved the application’s experience and avoided blocking writes during important operations like distributing tables and tenant isolation. These new capabilities built on the experience gained from developing the shard rebalancer, which uses logical replication to avoid blocking writes. In addition, we made the shard rebalancer faster and more user-friendly; also, we prepared for the upcoming PostgreSQL 15 release. This post gives you a quick tour of the major changes in Citus 11.1, including:

Keep reading

Citus 11 for Postgres goes fully open source, with query from any node

Written by Marco Slot | June 17, 2022

Citus 11.0 is here! Citus is a PostgreSQL extension that adds distributed database superpowers to PostgreSQL. With Citus, you can create tables that are transparently distributed or replicated across a cluster of PostgreSQL nodes. Citus 11.0 is a new major release, which means that it comes with some very exciting new features that enable new levels of scalability.

The biggest enhancement in Citus 11.0 is that you can now always run distributed queries from any node in the cluster because the schema & metadata are automatically synchronized. We already shared some of the details in the Citus 11.0 beta blog post, but we also have a big surprise for those of you who use Citus open source that was not part of the initial beta.

When we do a new Citus release, we usually release 2 versions: The open source version and the enterprise release which includes a few extra features. However, there will be only one version of Citus 11.0, because everything in the Citus extension is now fully open source!

That means that you can now rebalance shards without blocking writes, manage roles across the cluster, isolate tenants to their own shards, and more. All this comes on top of the already massive enhancement in Citus 11.0: You can query your Citus cluster from any node, creating a truly distributed PostgreSQL experience.

Keep reading

Test drive the Citus 11.0 beta for Postgres

Written by Marco Slot | March 26, 2022

Today we released Citus 11.0 beta, which is our first ever beta release of the Citus open source extension to Postgres. The reason we are releasing a beta version of 11.0 is that we are introducing a few fundamentally new capabilities, and we would like to get feedback from those of you who use Citus before we release Citus 11.0 to the world.

The biggest change in Citus 11.0 beta is that the schema and Citus metadata are now automatically synchronized throughout the database cluster. That means you can always query distributed tables from any node in a Citus cluster!

The easiest way to use Citus is to connect to the coordinator node and use it for both schema changes and distributed queries, but for very demanding applications, you now have the option to load balance distributed queries across the worker nodes in (parts of) your application by using a different connection string and factoring in a few limitations.

Keep reading

Citus Talk at CMU: Distributed PostgreSQL as an Extension

Written by Marco Slot | April 10, 2021

Last month we released Citus 10 and we've received an overwhelming amount of positive feedback on the new columnar compression and single node Citus features, as well as the news that we’ve open sourced the shard rebalancer.

The new and exciting Citus 10 features are bringing in lots of new users of Citus open source and the Citus database service on Azure. And many of you are asking:

Keep reading

Citus 10: Columnar for Postgres, rebalancer, single-node, & more

Written by Marco Slot | March 5, 2021

Development on Citus first started around a decade ago and once a year we release a major new Citus open source version. We wanted to make number 10 something special, but I could not have imagined how truly spectacular this release would become. Citus 10 extends Postgres (12 and 13) with many new superpowers:

Columnar storage for Postgres: Compress your PostgreSQL and Citus tables to reduce storage cost and speed up your analytical queries.
Sharding on a single Citus node: Make your single-node Postgres server ready to scale out by sharding tables locally using Citus.
Shard rebalancer in Citus open source: We have open sourced the shard rebalancer so you can easily add Citus nodes and rebalance your cluster.
Joins and foreign keys between local PostgreSQL tables and Citus tables: Mix and match PostgreSQL and Citus tables with foreign keys and joins.
Functions to change the way your tables are distributed: Redistribute your tables in a single step using new alter table functions.
Much more: Better naming, improved SQL & DDL support, simplified operations.

These new capabilities represent a fundamental shift in what Citus is and what Citus can do for you.

Keep reading