Citus Blog

Articles tagged: performance

David Rowley

Speeding up sort performance in Postgres 15

Written byBy David Rowley | May 19, 2022May 19, 2022

In recent years, PostgreSQL has seen several improvements which make sorting faster. In the PostgreSQL 15 development cycle—which ended in April 2022—Ronan Dunklau, Thomas Munro, Heikki Linnakangas, and I contributed some changes to PostgreSQL to make sorts go even faster.

Each of the improvements to sort should be available when PostgreSQL 15 is out in late 2022.

Why care about sort performance? When you run your application on PostgreSQL, there are several scenarios where PostgreSQL needs to sort records (aka rows) on your behalf. The main one is for ORDER BY queries. Sorting can also be used in:

  • Aggregate functions with an ORDER BY clause
  • GROUP BY queries
  • Queries with a plan containing a Merge Join
  • UNION queries
  • DISTINCT queries
  • Queries with window functions with a PARTITION BY and/or ORDER BY clause

If PostgreSQL is able to sort records faster, then queries using sort will run more quickly.

Keep reading
Marco Slot

Test drive the Citus 11.0 beta for Postgres

Written byBy Marco Slot | March 26, 2022Mar 26, 2022

Today we released Citus 11.0 beta, which is our first ever beta release of the Citus open source extension to Postgres. The reason we are releasing a beta version of 11.0 is that we are introducing a few fundamentally new capabilities, and we would like to get feedback from those of you who use Citus before we release Citus 11.0 to the world.

The biggest change in Citus 11.0 beta is that the schema and Citus metadata are now automatically synchronized throughout the database cluster. That means you can always query distributed tables from any node in a Citus cluster!

The easiest way to use Citus is to connect to the coordinator node and use it for both schema changes and distributed queries, but for very demanding applications, you now have the option to load balance distributed queries across the worker nodes in (parts of) your application by using a different connection string and factoring a few limitations.

Keep reading

My main advice when running performance benchmarks for Postgres is: “Automate it!”

If you’re measuring database performance, you are likely going to have to run the same benchmark over and over again. Either because you want a slightly different configuration, or because you realized you used some wrong settings, or maybe some other reason. By automating the way you’re running performance benchmarks, you won’t be too annoyed when this happens, because re-running the benchmarks will cost very little effort (it will only cost some time).

However, building this automation for the database benchmarks can be very time-consuming, too. So, in this post I’ll share the tools I built to make it easy to run benchmarks against Postgres—specifically against the Citus extension to Postgres running in a managed database service on Azure called Hyperscale (Citus) in Azure Database for PostgreSQL.

Here’s your map for reading this post: each anchor link takes you to a different section. The first sections explore the different types of application workloads and their characteristics, plus the off-the-shelf benchmarks that are commonly used for each. After that you can dive into the “how to” aspects of using HammerDB with Citus and Postgres on Azure. And yes, you’ll see some sample benchmarking results, too.

Keep reading

Today, we are excited to announce PostgreSQL 14’s General Availability (GA) on Azure’s Hyperscale (Citus) option. To our knowledge, this is the first time a major cloud provider has announced GA for a new Postgres major version on their platform one day after the official release.

Starting today, you can deploy Postgres 14 in many Hyperscale (Citus) regions. In upcoming months, we will roll out Postgres 14 across more Azure regions and also release it with our new Flexible Server option in Azure Database for PostgreSQL.

This announcement helps us bring the latest in Postgres to Azure customers as new features become available. Further, it shows our commitment to open source PostgreSQL and its ecosystem. We choose to extend Postgres and share our contributions, instead of creating and managing a proprietary fork on the cloud.

In this blog post, you’ll first get a glimpse into some of our favorite features in Postgres 14. These include connection scaling, faster VACUUM, and improvements to crash recovery times.

We’ll then describe the work involved in making Postgres extensions compatible with new major Postgres versions, including our distributed database Citus as well as other extensions such as HyperLogLog (HLL), pg_cron, and TopN. Finally, you’ll learn how packaging, testing, and deployments work on Hyperscale (Citus). This last part ties everything together and enables us to release new versions on Azure, with speed.

Keep reading
Onder Kalaci

What’s new in the Citus 10.2 extension to Postgres

Written byBy Onder Kalaci | September 17, 2021Sep 17, 2021

Citus 10.2 is out! If you are not yet familiar with Citus, it is an open source extension to Postgres that transforms Postgres into a distributed database—so you can achieve high performance at any scale. The Citus open source packages are available for download. And Citus is also available in the cloud as a managed service, too.

You can see a bulleted list of all the changes in the CHANGELOG on GitHub. This post is your guide to what’s new in Citus 10.2, including some of these headline features.

Keep reading
Jelte Fennema

Shard rebalancing in the Citus 10.1 extension to Postgres

Written byBy Jelte Fennema | September 3, 2021Sep 3, 2021

With the 10.1 release to the Citus extension to Postgres, you can now monitor the progress of an ongoing shard rebalance—plus you get performance optimizations, as well as some user experience improvements to the rebalancer, too.

Whether you use Citus open source to scale out Postgres, or you use Citus in the cloud, this post is your guide to what’s new with the shard rebalancer in Citus 10.1.

And if you’re wondering when you might need to use the shard rebalancer: the rebalancer is used when you add a new Postgres node to your existing Citus database cluster and you want to move some of the old data to this new node, to “balance” the cluster. There are also times you might want to balance shards across nodes in a Citus cluster in order to optimize performance. A common example of this is when you have a SaaS application and one of your customers/tenants has significant more activity than the rest.

Keep reading

Citus 10.1 is out! In this latest release to the Citus extension to Postgres, our team focused on improving your user experience. Some of the 10.1 fixes are operational improvements—such as with the shard rebalancer, or with citus_update_node. Some are performance improvements—such as for multi-row INSERTs or with citus_shards. And some are fixes you’ll appreciate if you use Citus with lots of Postgres partitions.

Given that the previous Citus 10 release included a bevy of new features—including things like columnar storage, Citus on a single node, open sourcing the shard rebalancer, new UDFs so you can alter distributed table properties, and the ability to combine Postgres and Citus tables via support for JOINs between local and distributed tables, and foreign keys between local and reference tables—well, we felt that Citus 10.1 needed to prioritize some of our backlog items, the kinds of things that can make your life easier.

This post is your guide to the what’s new in Citus 10.1. And if you want to catch up on all the new things in past releases to Citus, check out the release notes posts about Citus 10, Citus 9.5, Citus 9.4, Citus 9.3, and Citus 9.2.

Keep reading
David Rowley

Speeding up recovery & VACUUM in Postgres 14

Written byBy David Rowley | March 25, 2021Mar 25, 2021

One of the performance projects I’ve focused on in PostgreSQL 14 is speeding up PostgreSQL recovery and vacuum. In the PostgreSQL team at Microsoft, I spend most of my time working with other members of the community on the PostgreSQL open source project. And in Postgres 14 (due to release in Q3 of 2021), I committed a change to optimize the compactify_tuples function, to reduce CPU utilization in the PostgreSQL recovery process. This performance optimization in PostgreSQL 14 made our crash recovery test case about 2.4x faster.

The compactify_tuples function is used internally in PostgreSQL:

  • when PostgreSQL starts up after a non-clean shutdown—called crash recovery
  • by the recovery process that is used by physical standby servers to replay changes (as described in the write-ahead log) as they arrive from the primary server
  • by VACUUM

So the good news is that the improvements to compactify_tuples will: improve crash recovery performance; reduce the load on the standby server, allowing it to replay the write-ahead log from the primary server more quickly; and improve VACUUM performance.

Keep reading

One of the main reasons people use the Citus extension for Postgres is to distribute the data in Postgres tables across multiple nodes. Citus does this by splitting the original Postgres table into multiple smaller tables and putting these smaller tables on different nodes. The process of splitting bigger tables into smaller ones is called sharding—and these smaller Postgres tables are called “shards”. Citus then allows you to query the shards as if they were still a single Postgres table.

One of the big changes in Citus 10—in addition to adding columnar storage, and the new ability to shard Postgres on a single Citus node—is that we open sourced the shard rebalancer.

Yes, that’s right, we have open sourced the shard rebalancer! The Citus 10 shard rebalancer gives you an easy way to rebalance shards across your cluster and helps you avoid data hotspots over time. Let’s dig into the what and the how.

Keep reading
Jeff Davis

Citus 10 brings columnar compression to Postgres

Written byBy Jeff Davis | March 6, 2021Mar 6, 2021

Citus 10 is out! Check out the Citus 10 blog post for all the details. Citus is an open source extension to Postgres (not a fork) that enables scale-out, but offers other great features, too. See the Citus docs and the Citus github repo and README.

This post will highlight Citus Columnar, one of the big new features in Citus 10. You can also take a look at the columnar documentation. Citus Columnar can be used with or without the scale-out features of Citus.

Postgres typically stores data using the heap access method, which is row-based storage. Row-based tables are good for transactional workloads, but can cause excessive IO for some analytic queries.

Columnar storage is a new way to store data in a Postgres table. Columnar groups data together by column instead of by row; and compresses the data, too. Arranging data by column tends to compress well, and it also means that queries can skip over columns they don’t need. Columnar dramatically reduces the IO needed to answer a typical analytic query—often by 10X!

Keep reading

Page 1 of 4