Citus Blog

Articles tagged: performance

David Rowley

What’s new in the Postgres 16 query planner / optimizer

Written byBy David Rowley | February 8, 2024Feb 8, 2024

PostgreSQL 16 introduces quite a few improvements to the query planner and makes many SQL queries run faster than they did on previous versions of PostgreSQL.

If you look at the PG16 release notes, you’ll see some of these planner improvements. But with the volume of changes made in each PostgreSQL release, it’s not possible to provide enough detail about each and every change. So maybe you might need a bit more detail to know what the change is about—before you understand if it’s relevant to you.

In this blog post, assuming you’ve already got a handle on the basics of EXPLAIN, you’ll get a deep dive into the 10 improvements made in the PostgreSQL 16 query planner. For each of the improvements to the PG16 planner (the planner is often called an optimizer in other relational databases), you’ll also get comparisons between PG15 and PG16 planner output—plus examples of what changed, in the form of a self-contained test you can try for yourself.

Keep reading

Episode 11 of Path To Citus Con—the monthly podcast for developers who love Postgres—is now out. This episode featured guests Jelte Fennema-Nio and Marco Slot who joined us (along with co-hosts Claire Giordano and Pino de Candia) to talk about performance benchmarking, specifically benchmarking databases and Postgres.

The official title of Episode 11 with Jelte and Marco is “My Journey into Performance Benchmarking”.

I love hearing how people find themselves doing what they do, so it was fascinating to hear how they got started. One of the most interesting insights was the impact something like benchmarking can have on your career—even if you don’t particularly enjoy it!

Keep reading

Distributed PostgreSQL has become a hot topic. Several distributed database vendors have added support for the PostgreSQL protocol as a convenient way to gain access to the PostgreSQL ecosystem. Others (like us) have built a distributed database on top of PostgreSQL itself.

For the Citus database team, distributed PostgreSQL is primarily about achieving high performance at scale. The unique thing about Citus, the technology powering Azure Cosmos DB for PostgreSQL, is that it is fully implemented as an open-source extension to PostgreSQL. It also leans entirely on PostgreSQL for storage, indexing, low-level query planning and execution, and various performance features. As such, Citus inherits the performance characteristics of a single PostgreSQL server but applies them at scale.

That all sounds good in theory, but to see whether this holds up in practice, you need benchmark numbers. We therefore asked GigaOM to run performance benchmarks comparing Azure Cosmos DB for PostgreSQL to other distributed implementations. GigaOM compared the transaction performance and price-performance of these popular managed services of distributed PostgreSQL, using the HammerDB benchmark software:

Keep reading

One of the most important improvements in Citus 11.3 is that Citus offers more reliable metadata sync. Before 11.3, when a Citus cluster had thousands of distributed objects (such as distributed tables), Citus occasionally experienced memory problems while running metadata sync. Due to these memory errors, some users with very large numbers of tables were sometimes unable to add new nodes or upgrade beyond Citus 11.0.

To address the memory issues, we added an alternative "non-transactional" mode to the current metadata sync in Citus 11.3.

The default mode for metadata sync is still the original single transaction mode that we introduced in Citus 11.0. But now in 11.3 or later, if you have a very large number of tables and you run into the memory error, you can choose to optionally switch to the non-transactional mode, which syncs the metadata via many transactions. While most of you who use Citus will not need to enable this alternative metadata sync mode, this is how to do it:

Keep reading
Samay Sharma

Debugging Postgres autovacuum problems: 13 tips

Written byBy Samay Sharma | July 28, 2022Jul 28, 2022

If you've been running PostgreSQL for a while, you've heard about autovacuum. Yes, autovacuum, the thing which everybody asks you not to turn off, which is supposed to keep your database clean and reduce bloat automatically.

And yet—imagine this: one fine day, you see that your database size is larger than you expect, the I/O load on your database has increased, and things have slowed down without much change in workload. You begin looking into what might have happened. You run the excellent Postgres bloat query and you notice you have a lot of bloat. So you run the VACUUM command manually to clear the bloat in your Postgres database. Good!

But then you have to address the elephant in the room: why didn't Postgres autovacuum clean up the bloat in the first place...? Does the above story sound familiar? Well, you are not alone. 😊

Keep reading

Citus 11.0 is here! Citus is a PostgreSQL extension that adds distributed database superpowers to PostgreSQL. With Citus, you can create tables that are transparently distributed or replicated across a cluster of PostgreSQL nodes. Citus 11.0 is a new major release, which means that it comes with some very exciting new features that enable new levels of scalability.

The biggest enhancement in Citus 11.0 is that you can now always run distributed queries from any node in the cluster because the schema & metadata are automatically synchronized. We already shared some of the details in the Citus 11.0 beta blog post, but we also have big surprise for those of you who use Citus open source that was not part of the initial beta.

When we do a new Citus release, we usually release 2 versions: The open source version and the enterprise release which includes a few extra features. However, there will be only one version of Citus 11.0, because everything in the Citus extension is now fully open source!

That means that you can now rebalance shards without blocking writes, manage roles across the cluster, isolate tenants to their own shards, and more. All this comes on top of the already massive enhancement in Citus 11.0: You can query your Citus cluster from any node, creating a truly distributed PostgreSQL experience.

Keep reading
David Rowley

Speeding up sort performance in Postgres 15

Written byBy David Rowley | May 19, 2022May 19, 2022

In recent years, PostgreSQL has seen several improvements which make sorting faster. In the PostgreSQL 15 development cycle—which ended in April 2022—Ronan Dunklau, Thomas Munro, Heikki Linnakangas, and I contributed some changes to PostgreSQL to make sorts go even faster.

Each of the improvements to sort should be available when PostgreSQL 15 is out in late 2022.

Why care about sort performance? When you run your application on PostgreSQL, there are several scenarios where PostgreSQL needs to sort records (aka rows) on your behalf. The main one is for ORDER BY queries. Sorting can also be used in:

  • Aggregate functions with an ORDER BY clause
  • GROUP BY queries
  • Queries with a plan containing a Merge Join
  • UNION queries
  • DISTINCT queries
  • Queries with window functions with a PARTITION BY and/or ORDER BY clause

If PostgreSQL is able to sort records faster, then queries using sort will run more quickly.

Keep reading
Marco Slot

Test drive the Citus 11.0 beta for Postgres

Written byBy Marco Slot | March 26, 2022Mar 26, 2022

Today we released Citus 11.0 beta, which is our first ever beta release of the Citus open source extension to Postgres. The reason we are releasing a beta version of 11.0 is that we are introducing a few fundamentally new capabilities, and we would like to get feedback from those of you who use Citus before we release Citus 11.0 to the world.

The biggest change in Citus 11.0 beta is that the schema and Citus metadata are now automatically synchronized throughout the database cluster. That means you can always query distributed tables from any node in a Citus cluster!

The easiest way to use Citus is to connect to the coordinator node and use it for both schema changes and distributed queries, but for very demanding applications, you now have the option to load balance distributed queries across the worker nodes in (parts of) your application by using a different connection string and factoring a few limitations.

Keep reading

My main advice when running performance benchmarks for Postgres is: "Automate it!"

If you're measuring database performance, you are likely going to have to run the same benchmark over and over again. Either because you want a slightly different configuration, or because you realized you used some wrong settings, or maybe some other reason. By automating the way you're running performance benchmarks, you won't be too annoyed when this happens, because re-running the benchmarks will cost very little effort (it will only cost some time).

However, building this automation for the database benchmarks can be very time-consuming, too. So, in this post I'll share the tools I built to make it easy to run benchmarks against Postgres—specifically against the Citus extension to Postgres running in a managed database service on Azure called Hyperscale (Citus) in Azure Database for PostgreSQL.

Here's your map for reading this post: each anchor link takes you to a different section. The first sections explore the different types of application workloads and their characteristics, plus the off-the-shelf benchmarks that are commonly used for each. After that you can dive into the "how to" aspects of using HammerDB with Citus and Postgres on Azure. And yes, you'll see some sample benchmarking results, too.

Keep reading

Today, we are excited to announce PostgreSQL 14's General Availability (GA) on Azure's Hyperscale (Citus) option. To our knowledge, this is the first time a major cloud provider has announced GA for a new Postgres major version on their platform one day after the official release.

Starting today, you can deploy Postgres 14 in many Hyperscale (Citus) regions. In upcoming months, we will roll out Postgres 14 across more Azure regions and also release it with our new Flexible Server option in Azure Database for PostgreSQL.

This announcement helps us bring the latest in Postgres to Azure customers as new features become available. Further, it shows our commitment to open source PostgreSQL and its ecosystem. We choose to extend Postgres and share our contributions, instead of creating and managing a proprietary fork on the cloud.

In this blog post, you'll first get a glimpse into some of our favorite features in Postgres 14. These include connection scaling, faster VACUUM, and improvements to crash recovery times.

We'll then describe the work involved in making Postgres extensions compatible with new major Postgres versions, including our distributed database Citus as well as other extensions such as HyperLogLog (HLL), pg_cron, and TopN. Finally, you'll learn how packaging, testing, and deployments work on Hyperscale (Citus). This last part ties everything together and enables us to release new versions on Azure, with speed.

Keep reading

Page 1 of 4