Blog posts tagged with 'performance' on the Citus Blog

PgBouncer Connection Pooler for Postgres Now Supports More Session Vars

Written byBy Emel Simsek | April 4, 2024Apr 4, 2024

PgBouncer is probably the most popular connection pooler for Postgres. It is essentially a transparant middleware between clients and the server. However, it is not %100 transparent in practice. There are a few intricacies that should be taken into account when using PgBouncer. One such consideration is that PgBouncer does not support the use of all session variables in transaction pooling mode. This lack of support is one of the reasons that the most commonly used transaction pooling mode is not fully compatible with Postgres. PgBouncer 1.20.0 started supporting two of the most requested session variables and laid ground work to be able to support all session variables in the future. Let’s break this down further.

The Impact of Pooling Mode on Postgres Compatibility

A connection pooler between the client and the server should ideally be completely transparent such that your application doesn’t have to be aware of the presence of a connection pooler. This is not the case with PgBouncer for all the pooling modes.

There are three different connection pooling modes:

Session
Transaction
Statement

The pooling mode determines how the connections in the pool are assigned to the clients. The way this is done may impose some limitations impacting Postgres compatibility.

In session pooling mode, there is a one-to-one mapping between the client and the pooled server connection throughout the lifetime of the client connection.

Hence session pooling mode is the most compatible mode with Postgres. The clients can use session variables which are Postgres parameters whose values can be set per session.

Keep reading

What’s new in the Postgres 16 query planner / optimizer

Written byBy David Rowley | February 8, 2024Feb 8, 2024

PostgreSQL 16 introduces quite a few improvements to the query planner and makes many SQL queries run faster than they did on previous versions of PostgreSQL.

If you look at the PG16 release notes, you’ll see some of these planner improvements. But with the volume of changes made in each PostgreSQL release, it’s not possible to provide enough detail about each and every change. So maybe you might need a bit more detail to know what the change is about—before you understand if it’s relevant to you.

In this blog post, assuming you’ve already got a handle on the basics of EXPLAIN, you’ll get a deep dive into the 10 improvements made in the PostgreSQL 16 query planner. For each of the improvements to the PG16 planner (the planner is often called an optimizer in other relational databases), you’ll also get comparisons between PG15 and PG16 planner output—plus examples of what changed, in the form of a self-contained test you can try for yourself.

Keep reading

Podcast highlights on benchmarking Postgres performance, from Path To Citus Con Episode 11

Written byBy Aaron Wislang | February 2, 2024Feb 2, 2024

Episode 11 of Path To Citus Con—the monthly podcast for developers who love Postgres—is now out. This episode featured guests Jelte Fennema-Nio and Marco Slot who joined us (along with co-hosts Claire Giordano and Pino de Candia) to talk about performance benchmarking, specifically benchmarking databases and Postgres.

The official title of Episode 11 with Jelte and Marco is “My Journey into Performance Benchmarking”.

I love hearing how people find themselves doing what they do, so it was fascinating to hear how they got started. One of the most interesting insights was the impact something like benchmarking can have on your career—even if you don’t particularly enjoy it!

Keep reading

Distributed PostgreSQL benchmarks using HammerDB, by GigaOM

Written byBy Marco Slot | June 21, 2023Jun 21, 2023

Distributed PostgreSQL has become a hot topic. Several distributed database vendors have added support for the PostgreSQL protocol as a convenient way to gain access to the PostgreSQL ecosystem. Others (like us) have built a distributed database on top of PostgreSQL itself.

For the Citus database team, distributed PostgreSQL is primarily about achieving high performance at scale. The unique thing about Citus, the technology powering Azure Cosmos DB for PostgreSQL, is that it is fully implemented as an open-source extension to PostgreSQL. It also leans entirely on PostgreSQL for storage, indexing, low-level query planning and execution, and various performance features. As such, Citus inherits the performance characteristics of a single PostgreSQL server but applies them at scale.

That all sounds good in theory, but to see whether this holds up in practice, you need benchmark numbers. We therefore asked GigaOM to run performance benchmarks comparing Azure Cosmos DB for PostgreSQL to other distributed implementations. GigaOM compared the transaction performance and price-performance of these popular managed services of distributed PostgreSQL, using the HammerDB benchmark software:

Keep reading

Diary of an Engineer: Improved Citus metadata sync for Postgres in 11.3

Written byBy Aykut Bozkurt | June 16, 2023Jun 16, 2023

One of the most important improvements in Citus 11.3 is that Citus offers more reliable metadata sync. Before 11.3, when a Citus cluster had thousands of distributed objects (such as distributed tables), Citus occasionally experienced memory problems while running metadata sync. Due to these memory errors, some users with very large numbers of tables were sometimes unable to add new nodes or upgrade beyond Citus 11.0.

To address the memory issues, we added an alternative "non-transactional" mode to the current metadata sync in Citus 11.3.

The default mode for metadata sync is still the original single transaction mode that we introduced in Citus 11.0. But now in 11.3 or later, if you have a very large number of tables and you run into the memory error, you can choose to optionally switch to the non-transactional mode, which syncs the metadata via many transactions. While most of you who use Citus will not need to enable this alternative metadata sync mode, this is how to do it:

Keep reading

Debugging Postgres autovacuum problems: 13 tips

Written byBy Samay Sharma | July 28, 2022Jul 28, 2022

If you've been running PostgreSQL for a while, you've heard about autovacuum. Yes, autovacuum, the thing which everybody asks you not to turn off, which is supposed to keep your database clean and reduce bloat automatically.

And yet—imagine this: one fine day, you see that your database size is larger than you expect, the I/O load on your database has increased, and things have slowed down without much change in workload. You begin looking into what might have happened. You run the excellent Postgres bloat query and you notice you have a lot of bloat. So you run the VACUUM command manually to clear the bloat in your Postgres database. Good!

But then you have to address the elephant in the room: why didn't Postgres autovacuum clean up the bloat in the first place...? Does the above story sound familiar? Well, you are not alone. 😊

Keep reading

Citus 11 for Postgres goes fully open source, with query from any node

Written byBy Marco Slot | June 17, 2022Jun 17, 2022

Citus 11.0 is here! Citus is a PostgreSQL extension that adds distributed database superpowers to PostgreSQL. With Citus, you can create tables that are transparently distributed or replicated across a cluster of PostgreSQL nodes. Citus 11.0 is a new major release, which means that it comes with some very exciting new features that enable new levels of scalability.

The biggest enhancement in Citus 11.0 is that you can now always run distributed queries from any node in the cluster because the schema & metadata are automatically synchronized. We already shared some of the details in the Citus 11.0 beta blog post, but we also have big surprise for those of you who use Citus open source that was not part of the initial beta.

When we do a new Citus release, we usually release 2 versions: The open source version and the enterprise release which includes a few extra features. However, there will be only one version of Citus 11.0, because everything in the Citus extension is now fully open source!

That means that you can now rebalance shards without blocking writes, manage roles across the cluster, isolate tenants to their own shards, and more. All this comes on top of the already massive enhancement in Citus 11.0: You can query your Citus cluster from any node, creating a truly distributed PostgreSQL experience.

Keep reading

Speeding up sort performance in Postgres 15

Written byBy David Rowley | May 19, 2022May 19, 2022

In recent years, PostgreSQL has seen several improvements which make sorting faster. In the PostgreSQL 15 development cycle—which ended in April 2022—Ronan Dunklau, Thomas Munro, Heikki Linnakangas, and I contributed some changes to PostgreSQL to make sorts go even faster.

Each of the improvements to sort should be available when PostgreSQL 15 is out in late 2022.

Why care about sort performance? When you run your application on PostgreSQL, there are several scenarios where PostgreSQL needs to sort records (aka rows) on your behalf. The main one is for ORDER BY queries. Sorting can also be used in:

Aggregate functions with an ORDER BY clause
GROUP BY queries
Queries with a plan containing a Merge Join
UNION queries
DISTINCT queries
Queries with window functions with a PARTITION BY and/or ORDER BY clause

If PostgreSQL is able to sort records faster, then queries using sort will run more quickly.

Keep reading

Test drive the Citus 11.0 beta for Postgres

Written byBy Marco Slot | March 26, 2022Mar 26, 2022

Today we released Citus 11.0 beta, which is our first ever beta release of the Citus open source extension to Postgres. The reason we are releasing a beta version of 11.0 is that we are introducing a few fundamentally new capabilities, and we would like to get feedback from those of you who use Citus before we release Citus 11.0 to the world.

The biggest change in Citus 11.0 beta is that the schema and Citus metadata are now automatically synchronized throughout the database cluster. That means you can always query distributed tables from any node in a Citus cluster!

The easiest way to use Citus is to connect to the coordinator node and use it for both schema changes and distributed queries, but for very demanding applications, you now have the option to load balance distributed queries across the worker nodes in (parts of) your application by using a different connection string and factoring a few limitations.

Keep reading

How to benchmark performance of Citus and Postgres with HammerDB on Azure

Written byBy Jelte Fennema-Nio | March 12, 2022Mar 12, 2022

My main advice when running performance benchmarks for Postgres is: "Automate it!"

If you're measuring database performance, you are likely going to have to run the same benchmark over and over again. Either because you want a slightly different configuration, or because you realized you used some wrong settings, or maybe some other reason. By automating the way you're running performance benchmarks, you won't be too annoyed when this happens, because re-running the benchmarks will cost very little effort (it will only cost some time).

However, building this automation for the database benchmarks can be very time-consuming, too. So, in this post I'll share the tools I built to make it easy to run benchmarks against Postgres—specifically against the Citus extension to Postgres running in a managed database service on Azure called Hyperscale (Citus) in Azure Database for PostgreSQL.

Here's your map for reading this post: each anchor link takes you to a different section. The first sections explore the different types of application workloads and their characteristics, plus the off-the-shelf benchmarks that are commonly used for each. After that you can dive into the "how to" aspects of using HammerDB with Citus and Postgres on Azure. And yes, you'll see some sample benchmarking results, too.

Keep reading