Citus Blog

Articles tagged: popular


Making PostgreSQL tick: New features in pg_cron

Written by Marco Slot | October 26, 2023

pg_cron is an open source PostgreSQL extension that provides a cron-based scheduler to periodically run SQL commands. Almost every managed PostgreSQL service supports pg_cron and it has become a standard tool for many PostgreSQL users. Since Citus has been my full-time job, pg_cron has always been a side project for me, and so I tried to architect it for simplicity, reliability, and low maintenance. Of course, with many users there is a long list of feature requests, and with the help of the Postgres community pg_cron keeps becoming more and more capable over time.

We recently added PostgreSQL 16 support (in version 1.6), but perhaps the most exciting feature added to pg_cron in the past year (in version 1.5) is the ability to schedule a job every few seconds. I shunned this feature idea for a while, because (a) it is not something regular cron can do; and (b) any issue in pg_cron would get much more severe if it were to happen every few seconds. However, by now pg_cron is reasonably battle-tested and second-granularity jobs had become the most popular pg_cron feature request by far.

Keep reading

Big news in the Postgres world: PostgreSQL 16 was released just over 2 weeks ago. And today we're announcing that Postgres 16 is generally available for production workloads on Azure Cosmos DB for PostgreSQL. That's right, in production: this announcement is not just a preview of Postgres 16 support.

Whether you need to provision a new distributed Postgres cluster in Azure Cosmos DB for PostgreSQL—or upgrade your existing database clusters—Postgres 16 is now an option for you.

And you can use Azure Portal, Bicep or ARM templates, REST APIs, Azure SDKs, or Azure CLI to spin up a new Postgres 16 cluster in Azure Cosmos DB for PostgreSQL, or to upgrade an existing cluster to Postgres 16.

Keep reading

In Postgres 15, the Postgres community released a new command, MERGE, that modifies rows in a target table using data from a source. MERGE provides a single SQL statement that can conditionally INSERT, UPDATE, or DELETE rows, a task that would otherwise require multiple procedural language statements or workarounds such as INSERT with an ON CONFLICT clause.

In this blog post, you will get a high-level overview of how Postgres MERGE works. The post then delves into some practical use cases and describes the different strategies Citus employs for handling MERGE in a distributed environment.
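To make the "conditionally INSERT, UPDATE or DELETE" idea concrete, here is a minimal Python sketch of the matching logic. It is only a toy model, not how Postgres or Citus implement MERGE: the `merge` function, the `qty` column, and the rule that a quantity of zero means "delete" are all assumptions made up for illustration.

```python
from typing import Dict

Row = Dict[str, int]


def merge(target: Dict[int, Row], source: Dict[int, Row]) -> Dict[int, Row]:
    """Toy model of MERGE: match rows by key, then update, delete, or insert."""
    # Work on a copy so the caller's target table is left untouched.
    result = {key: dict(row) for key, row in target.items()}
    for key, src in source.items():
        if key in result:
            if src["qty"] == 0:
                # Roughly: WHEN MATCHED AND source.qty = 0 THEN DELETE
                del result[key]
            else:
                # Roughly: WHEN MATCHED THEN UPDATE SET qty = source.qty
                result[key]["qty"] = src["qty"]
        else:
            # Roughly: WHEN NOT MATCHED THEN INSERT
            result[key] = dict(src)
    return result


# Hypothetical data: an inventory table and a batch of changes.
inventory = {1: {"qty": 10}, 2: {"qty": 5}}
changes = {1: {"qty": 12}, 2: {"qty": 0}, 3: {"qty": 7}}
merged = merge(inventory, changes)
```

In this sketch, row 1 is updated, row 2 is deleted, and row 3 is inserted in one pass over the source, which is the kind of work a single MERGE statement expresses in SQL.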

Keep reading

Citus 12: Schema-based sharding for PostgreSQL

Written by Marco Slot | July 18, 2023

What if you could automatically shard your PostgreSQL database across any number of servers and get industry-leading performance at scale without any special data modelling steps?

Our latest Citus open source release, Citus 12, adds a new and easy way to transparently scale your Postgres database: Schema-based sharding, where the database is transparently sharded by schema name.

Schema-based sharding gives an easy path for scaling out several important classes of applications that can divide their data across schemas:

  • Multi-tenant SaaS applications
  • Microservices that use the same database
  • Vertical partitioning by groups of tables

Each of these scenarios can now be enabled on Citus using regular CREATE SCHEMA commands. That way, many existing applications and libraries (e.g. django-tenants) can scale out without any changes, and developing new applications can be much easier. Moreover, you keep all the other benefits of Citus, including distributed transactions, reference tables, rebalancing, and more.

Keep reading

Distributed PostgreSQL has become a hot topic. Several distributed database vendors have added support for the PostgreSQL protocol as a convenient way to gain access to the PostgreSQL ecosystem. Others (like us) have built a distributed database on top of PostgreSQL itself.

For the Citus database team, distributed PostgreSQL is primarily about achieving high performance at scale. The unique thing about Citus, the technology powering Azure Cosmos DB for PostgreSQL, is that it is fully implemented as an open-source extension to PostgreSQL. It also leans entirely on PostgreSQL for storage, indexing, low-level query planning and execution, and various performance features. As such, Citus inherits the performance characteristics of a single PostgreSQL server but applies them at scale.

That all sounds good in theory, but to see whether this holds up in practice, you need benchmark numbers. We therefore asked GigaOM to run performance benchmarks comparing Azure Cosmos DB for PostgreSQL to other distributed implementations. GigaOM compared the transaction performance and price-performance of these popular managed services of distributed PostgreSQL, using the HammerDB benchmark software:

Keep reading

News: Postgres 15 available in Azure Cosmos DB for PostgreSQL

Written by Nik Larin | October 21, 2022

Big news from the Postgres and Citus team here at Microsoft! Just 1 week after PostgreSQL 15 was released, PostgreSQL 15 is generally available (GA) in the portal for the Azure Cosmos DB for PostgreSQL managed service, in all Azure regions. Whether you need to provision new clusters in Azure Cosmos DB for PostgreSQL or upgrade your existing database clusters, Postgres 15 is now a choice for you. Oh, and you can upgrade your existing cluster to Postgres 15 from any of the other supported major Postgres versions, using the in-place major version upgrade feature.

Keep reading

Citus 11.0 is here! Citus is a PostgreSQL extension that adds distributed database superpowers to PostgreSQL. With Citus, you can create tables that are transparently distributed or replicated across a cluster of PostgreSQL nodes. Citus 11.0 is a new major release, which means that it comes with some very exciting new features that enable new levels of scalability.

The biggest enhancement in Citus 11.0 is that you can now always run distributed queries from any node in the cluster because the schema & metadata are automatically synchronized. We already shared some of the details in the Citus 11.0 beta blog post, but we also have a big surprise for those of you who use Citus open source that was not part of the initial beta.

When we do a new Citus release, we usually release 2 versions: The open source version and the enterprise release which includes a few extra features. However, there will be only one version of Citus 11.0, because everything in the Citus extension is now fully open source!

That means that you can now rebalance shards without blocking writes, manage roles across the cluster, isolate tenants to their own shards, and more. All this comes on top of the already massive enhancement in Citus 11.0: You can query your Citus cluster from any node, creating a truly distributed PostgreSQL experience.

Keep reading

Today, we are excited to announce PostgreSQL 14's General Availability (GA) on Azure's Hyperscale (Citus) option. To our knowledge, this is the first time a major cloud provider has announced GA for a new Postgres major version on their platform one day after the official release.

Starting today, you can deploy Postgres 14 in many Hyperscale (Citus) regions. In upcoming months, we will roll out Postgres 14 across more Azure regions and also release it with our new Flexible Server option in Azure Database for PostgreSQL.

This announcement helps us bring the latest in Postgres to Azure customers as new features become available. Further, it shows our commitment to open source PostgreSQL and its ecosystem. We choose to extend Postgres and share our contributions, instead of creating and managing a proprietary fork on the cloud.

In this blog post, you'll first get a glimpse into some of our favorite features in Postgres 14. These include connection scaling, faster VACUUM, and improvements to crash recovery times.

We'll then describe the work involved in making Postgres extensions compatible with new major Postgres versions, including our distributed database Citus as well as other extensions such as HyperLogLog (HLL), pg_cron, and TopN. Finally, you'll learn how packaging, testing, and deployments work on Hyperscale (Citus). This last part ties everything together and enables us to release new versions on Azure, with speed.

Keep reading

With the 10.1 release of the Citus extension to Postgres, you can now monitor the progress of an ongoing shard rebalance. You also get performance optimizations and some user experience improvements to the rebalancer.

Whether you use Citus open source to scale out Postgres, or you use Citus in the cloud, this post is your guide to what’s new with the shard rebalancer in Citus 10.1.

And if you’re wondering when you might need to use the shard rebalancer: the rebalancer is used when you add a new Postgres node to your existing Citus database cluster and you want to move some of the old data to this new node, to “balance” the cluster. There are also times you might want to balance shards across nodes in a Citus cluster in order to optimize performance. A common example of this is when you have a SaaS application and one of your customers/tenants has significantly more activity than the rest.

Keep reading

One of the main reasons people use the Citus extension for Postgres is to distribute the data in Postgres tables across multiple nodes. Citus does this by splitting the original Postgres table into multiple smaller tables and putting these smaller tables on different nodes. The process of splitting bigger tables into smaller ones is called sharding—and these smaller Postgres tables are called “shards”. Citus then allows you to query the shards as if they were still a single Postgres table.
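As a rough illustration of the sharding idea described above, here is a small Python sketch that splits rows into shards by hashing a distribution column and then reads them back as if they were one table. It is a toy model under stated assumptions: the `shard_for`, `distribute`, and `scan_all` helpers are hypothetical names, and the CRC32-modulo scheme stands in for Citus's own hashing and shard placement logic, which this does not attempt to reproduce.

```python
import zlib
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (distribution key, payload); hypothetical layout


def shard_for(key: int, shard_count: int) -> int:
    """Pick a shard for a key with a stable hash (an assumption, not Citus's scheme)."""
    return zlib.crc32(str(key).encode()) % shard_count


def distribute(rows: List[Row], shard_count: int) -> Dict[int, List[Row]]:
    """Split one logical table into shard_count smaller tables."""
    shards: Dict[int, List[Row]] = {i: [] for i in range(shard_count)}
    for row in rows:
        shards[shard_for(row[0], shard_count)].append(row)
    return shards


def scan_all(shards: Dict[int, List[Row]]) -> List[Row]:
    """Query every shard and present the result as a single table."""
    return sorted(row for shard in shards.values() for row in shard)


# Hypothetical rows keyed by tenant id.
rows = [(1, "a"), (2, "b"), (3, "c"), (1, "d")]
shards = distribute(rows, shard_count=4)
all_rows = scan_all(shards)
```

Rows with the same key always land in the same shard, so a query that filters on the key only needs to look at one of the smaller tables, while `scan_all` shows the "still a single table" view.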

One of the big changes in Citus 10—in addition to adding columnar storage, and the new ability to shard Postgres on a single Citus node—is that we open sourced the shard rebalancer.

Yes, that’s right, we have open sourced the shard rebalancer! The Citus 10 shard rebalancer gives you an easy way to rebalance shards across your cluster and helps you avoid data hotspots over time. Let’s dig into the what and the how.

Keep reading
