POSETTE: An Event for Postgres 2025 will happen virtually Jun 10-12, 2025. Call for Speakers is open! 💥
Written by Jelte Fennema-Nio
March 12, 2022
This post by Jelte Fennema was originally published on the Microsoft TechCommunity Blog.
Update in October 2022: Citus has a new home on Azure! The Citus database is now available as a managed service in the cloud as Azure Cosmos DB for PostgreSQL. Azure documentation links have been updated throughout the post, to point to the new Azure docs.
My main advice when running performance benchmarks for Postgres is: "Automate it!"
If you're measuring database performance, you are likely going to have to run the same benchmark over and over again. Either because you want a slightly different configuration, or because you realized you used some wrong settings, or maybe some other reason. By automating the way you're running performance benchmarks, you won't be too annoyed when this happens, because re-running the benchmarks will cost very little effort (it will only cost some time).
However, building this automation for the database benchmarks can be very time-consuming, too. So, in this post I'll share the tools I built to make it easy to run benchmarks against Postgres—specifically against the Citus extension to Postgres running in a managed database service on Azure called Hyperscale (Citus) in Azure Database for PostgreSQL.
Here's your map for reading this post: each anchor link takes you to a different section. The first sections explore the different types of application workloads and their characteristics, plus the off-the-shelf benchmarks that are commonly used for each. After that you can dive into the "how to" aspects of using HammerDB with Citus and Postgres on Azure. And yes, you'll see some sample benchmarking results, too.
Why dive into the background on different workloads and database benchmarks first? Because there's something that's even more important than automating the way you run performance benchmarks: Choosing the right benchmark for you!
Everyone who uses a database uses it for a different workload, because everyone has a different dataset and runs different queries. So, when comparing database performance, you will get the most accurate results by running a benchmark based on your own workload. However, preparing a completely custom benchmark can be quite a bit of work.
So instead, you will likely want to run an off-the-shelf benchmark with a workload that is very similar to your own.
There are two different ways in which an off-the-shelf benchmark can be provided to you:

- As a full benchmark suite: a ready-to-run application that loads the data, runs the queries, and reports the results.
- As a benchmark specification: a document that describes the data, the queries, and how results should be measured, leaving the implementation up to you.
It's obvious that a full benchmark suite is usually what you want since you can simply start the benchmarking application and get results. If you only have a benchmark specification, then you will first need to write tooling to run this specification against a database.
A common workload category for databases is called OLTP (Online Transaction Processing). Workloads that fall in the OLTP category send lots of small, short-running queries (or transactions) to the database.
Some characteristics of OLTP workloads are:

- Lots of small, short-running insert, update, and delete queries
- Many concurrent users
- A high number of concurrent queries
- Fast response time requirements
The type of applications that create such a workload often have many concurrent users, who in aggregate send many requests per second. So for OLTP workloads, it's important that the database can handle quite a lot of these queries at the same time. Response time of the application is usually also important, so the database queries should not take long to run: queries should always complete in less than ~5 seconds, and most queries should complete in less than 100ms, probably even faster.
Well-known database benchmarks that fall in the OLTP category are YCSB (full suite), TPC-C (specification), and HammerDB TPROC-C (full suite). There are two types of numbers that come out of these OLTP benchmarks that people are usually interested in:

- Throughput: the number of transactions the database processes per unit of time, e.g. transactions per second (TPS) or new orders per minute (NOPM)
- Latency: how long individual queries take, often reported as percentiles such as p95 or p99
Another common database workload is called OLAP (Online Analytical Processing). This is the type of workload that is often run on data warehouses.
Some characteristics of OLAP workloads are:

- Queries that aggregate over large amounts of data
- Large batch inserts or updates
- Bulk reads
- TopN queries
- Few concurrent users and queries
As opposed to OLTP, there are usually not a lot of concurrent users in an OLAP system. Usually only a single query—or only a few queries—are running at a time. Response times of these queries are also a lot higher than for OLTP workloads. OLAP queries usually take multiple seconds, or even minutes to complete. But of course database response times are still important in OLAP workloads, and waiting more than 20 minutes for a query result is often unacceptable.
Well-known benchmarks that fall in the OLAP category are TPC-H (specification), TPC-DS (specification), and HammerDB TPROC-H (full suite). These benchmarks have a set of queries that use various SQL features and have different levels of complexity and numbers of JOINs.
There are two different results that an OLAP benchmark can give you:

- The runtime of each individual query in the set
- The total time it takes to run all queries in the set
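Those results are typically the per-query runtimes and the total runtime of the whole query set. As a tiny illustration (the timings below are made up), both can be derived from the same measurements:

```python
# Made-up per-query runtimes for an OLAP benchmark run, in seconds.
# Suites like HammerDB TPROC-H report similar per-query and total numbers.
query_runtimes_s = {
    "Q1": 12.4,
    "Q5": 48.0,
    "Q9": 131.7,
    "Q18": 95.2,
}

# Result type 1: the runtime of each individual query
# (here: find the slowest one).
slowest_query = max(query_runtimes_s, key=query_runtimes_s.get)

# Result type 2: the total time to run the whole query set.
total_runtime_s = sum(query_runtimes_s.values())

# Sanity check against the "waiting more than 20 minutes is often
# unacceptable" rule of thumb from the text.
too_slow = [q for q, t in query_runtimes_s.items() if t > 20 * 60]

print(slowest_query)    # "Q9"
print(total_runtime_s)  # ~287.3 seconds
print(too_slow)         # [] -> all queries within the acceptable range
```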
Another database workload category is called HTAP (Hybrid transactional/analytical processing). This category contains workloads that combine aspects from both OLTP and OLAP workloads. So, there will be lots of active users doing small transactions, while at the same time running a few heavy long-running queries.
There's only one well-known database benchmark that falls in the HTAP category and this benchmark is CH-benCHmark (specification). The CH-benCHmark benchmark specification describes running an OLTP benchmark (TPC-C), while also executing some queries from an OLAP benchmark on the same database. Creating a custom HTAP benchmark yourself is also possible, by running both an OLTP benchmark and an OLAP benchmark of your choice at the same time.
It can be quite hard to compare the numbers that come out of an HTAP benchmark across runs. This stems from the fact that you get two numbers per run of the benchmark, and these numbers often show an inverse correlation:

- The number of OLTP transactions per second (TPS)
- The time it takes the OLAP queries to run
The problem is that as the number of transactions per second rises, the analytical queries will take longer to run. In other words, when TPS increases (good), the OLAP queries take longer (bad). There are two reasons for this:

1. Executing more transactions uses more of the available resources, leaving less CPU and I/O capacity for the analytical queries.
2. The OLTP transactions insert new rows, so a higher TPS makes the dataset grow faster, and the analytical queries have more data to scan.
The inverse correlation between these numbers makes it hard to conclusively say whether one HTAP benchmark run has better results than another. You can only conclude that one run is better if both numbers are better. If one of the numbers is better while the other is worse, then it becomes a matter of tradeoffs: it's up to you to decide what you consider most important for your workload, the number of OLTP transactions per second or the time it takes to run the OLAP queries.
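That comparison rule can be sketched as a small Python helper (the names are my own, not from any benchmark suite): one run only wins outright if it improves both numbers.

```python
from typing import NamedTuple

class HTAPResult(NamedTuple):
    tps: float           # OLTP transactions per second (higher is better)
    olap_seconds: float  # OLAP query runtime (lower is better)

def conclusively_better(a: HTAPResult, b: HTAPResult) -> bool:
    """True only when run `a` beats run `b` on BOTH numbers."""
    return a.tps > b.tps and a.olap_seconds < b.olap_seconds

run1 = HTAPResult(tps=12_000, olap_seconds=340)
run2 = HTAPResult(tps=15_000, olap_seconds=410)  # more TPS, but slower OLAP

print(conclusively_better(run2, run1))  # False: it's a tradeoff
print(conclusively_better(HTAPResult(16_000, 300), run1))  # True: wins on both
```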
| | OLTP | OLAP | HTAP |
|---|---|---|---|
| Small short queries (insert/update/delete) | yes | no | yes |
| Aggregations | no | yes | yes |
| Concurrent users | yes | no | yes |
| High number of concurrent queries | yes | no | yes |
| Fast response times | yes | no | yes |
| Large batch inserts/updates | no | yes | yes |
| Bulk reads | no | yes | yes |
| TopN queries | no | yes | yes |
| Typical benchmark specifications | TPC-C | TPC-H, TPC-DS | CH-benCHmark |
| Typical benchmark suites | YCSB, HammerDB TPROC-C | HammerDB TPROC-H | |
Instead of running benchmarks yourself, it can be tempting to compare numbers published online by others. One thing to be careful of when comparing benchmarks run by others: there are many different ways to configure the benchmarks, so comparing them is usually apples to oranges. Differences in dataset size, hardware, database configuration, and the number of concurrent connections all matter a lot.
So, while it is easiest to compare database benchmark numbers you find online, you probably want to run your own benchmarks with your own data, too.
HammerDB is an easy to use open-source benchmarking suite for databases. HammerDB can be used to run an OLTP or an OLAP benchmark. The OLTP one is called TPROC-C1 and is based on the TPC-C specification. The OLAP benchmark is called TPROC-H and is based on the TPC-H specification. HammerDB has implementations of these benchmarks for a lot of different databases, which makes it easy to compare results across different database types.
I have submitted several pull requests to HammerDB to improve the benchmark suite. One of these pull requests makes HammerDB TPROC-C work with the Citus extension to Postgres (therefore with distributed PostgreSQL). Two others greatly improved the speed at which the benchmark data is loaded into Postgres. All my pull requests have been accepted and were released in HammerDB 4.4. So, starting with HammerDB 4.4 you can run the HammerDB TPROC-C benchmark against Citus.
The main number that HammerDB gives you to compare across benchmark runs is called NOPM (new orders per minute). HammerDB uses NOPM instead of TPS (transactions per second) to make the number comparable between the different databases that HammerDB supports. The way that NOPM is measured is based on the tpmC metric from the official TPC-C specification (although in HammerDB it is called NOPM instead of tpmC, because tpmC is technically reserved for official, fully audited benchmark results).
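The idea behind NOPM can be sketched roughly as follows (my own toy code, not HammerDB's implementation): only the new-order transactions are counted, and the count is divided by the run's duration in minutes.

```python
# Toy NOPM calculation: only "new order" transactions count toward the
# metric, which is part of what makes it comparable across databases.
# The transaction log below is made up for illustration.

def nopm(transaction_types, run_minutes):
    """New orders per minute over the whole run."""
    new_orders = sum(1 for t in transaction_types if t == "new_order")
    return new_orders / run_minutes

log = ["new_order", "payment", "new_order", "delivery",
       "new_order", "order_status", "new_order", "payment"]

print(nopm(log, run_minutes=0.5))  # 4 new orders in 30 seconds -> 8.0 NOPM
```

Note that the other transaction types (payments, deliveries, and so on) still run and still load the database; they just don't count toward the headline number.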
Like I mentioned at the start, the most important thing when running benchmarks is to automate running them. In my experience you're going to be re-running (almost) the same benchmark a lot!
That's why I wanted to make running performance benchmarks with HammerDB against Postgres and Citus even easier than HammerDB already does on its own.
So, I created open source benchmark tooling (repo on GitHub) around HammerDB to make running benchmarks even easier—especially for the Citus extension to Postgres, running on Azure. When you use Postgres extensions, there are two layers of database software involved: you are running on both the Postgres database and also on the Postgres extension. So, the open source benchmarking automation I created for Citus runs benchmarks on the Hyperscale (Citus) option in the Azure Database for PostgreSQL managed service.
The benchmark tooling I created uses various things to make running benchmarks as easy as possible: among them, scripts that provision the database clusters and driver VMs on Azure, and a cloud-init script that installs and starts the benchmark on the driver VM.
At the time of writing, the open source benchmark tooling I created supports running HammerDB TPROC-C (OLTP) and a custom implementation of the CH-benCHmark specification (HTAP). However, even if you want to run a different benchmark, the tooling I created will likely still be very useful to you. The only thing that you'd have to change to run another benchmark should be the section of the cloud-init script that installs and starts the benchmark. Feel free to send a PR to the repository to add support for another benchmark.
Apart from automating your benchmarks there are a couple of Citus and Postgres related things that you should keep in mind when running benchmarks:
- Run `VACUUM ANALYZE` on all your tables. Otherwise, Postgres statistics can be completely wrong, and you might get very slow query plans.
- Make sure that your `shard_count` is a multiple of the number of workers that you have. Otherwise, the shards cannot be divided evenly across your workers, and some workers would get more load than others. A good default `shard_count` is 48, as the number 48 is divisible by a lot of numbers.

Like I said, I tried to make running benchmarks as easy as possible. So, all you need to do is run this simple command (for detailed instructions check out the README in the "azure" directory):
```bash
# IMPORTANT NOTE: Running this command will provision 4 new Citus clusters
# and 4 times a 64-vCore driver VM in your Azure subscription. So, running
# the following command will cost you (or your employer) money!
azure/bulk-run.sh azure/how-to-benchmark-blog.runs | tee -a results.csv
```
The command above will start running HammerDB TPROC-C on a few different cluster sizes on production infrastructure for Hyperscale (Citus), a deployment option in the Azure Database for PostgreSQL managed service.2 The results of these benchmark runs are all gathered in the `results.csv` file.
When you look at the newly created results.csv file, you'll see strings that look like, for example, "c4+2w8". This describes the cluster size: "c4" means the coordinator node has 4 vCores, and "2w8" means there are 2 worker nodes with 8 vCores each. The total number of cores present in the cluster is also shown in parentheses.
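Assuming that naming scheme (this is my reading of it), the total core count can be derived from the string, sketched here in Python:

```python
import re

def total_cores(cluster_label):
    """Parse a label like "c4+2w8": coordinator vCores plus
    (worker count x vCores per worker). Assumes my reading of the
    naming scheme used in the results.csv file."""
    match = re.fullmatch(r"c(\d+)\+(\d+)w(\d+)", cluster_label)
    if match is None:
        raise ValueError(f"unrecognized cluster label: {cluster_label}")
    coordinator, workers, per_worker = map(int, match.groups())
    return coordinator + workers * per_worker

print(total_cores("c4+2w8"))     # 4 + 2*8 = 20 cores
print(total_cores("c64+20w32"))  # the 2M NOPM cluster below: 64 + 20*32 = 704
```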
Now that you have a csv file, you can use Excel (or whatever you prefer) to create a graph that looks roughly like this:
As you can see, NOPM keeps increasing when you add more workers to the Citus cluster. This shows that Citus delivers on the promise of scaling out: By simply adding more Citus nodes to the cluster in Azure Database for PostgreSQL, our performance goes up.
The numbers in the graph above were gathered using relatively small Citus clusters. The main purpose of the chart is to show you how easy it is to get these numbers using HammerDB and the open source benchmarking tooling I created.
It's possible to observe much higher benchmark results for Citus on Azure if you increase the number of vCores on each database node, and/or if you increase the total number of worker nodes in the Citus cluster. Higher performance with more vCores can be seen in our paper that was accepted at SIGMOD '21. We used a coordinator and 8 workers with 16 cores and the NOPM in that paper was a lot higher.
Recently we also ran HammerDB TPROC-C on a very big Citus database cluster and got a whopping 2.0 million NOPM, using our regular managed service infrastructure on Azure.3
Some more details about this 2M NOPM HammerDB result:
The Azure Database for PostgreSQL - Hyperscale (Citus) database cluster used for this benchmark had a 64 core coordinator and 20 worker nodes with 32 cores each (so 704 cores in total.)
Apart from using more worker nodes and more vCores per node than the sample run discussed earlier in this post (details in Figure 2 above), there was one other thing that needed to change to achieve the 2M NOPM: HammerDB needed to be configured to use many more concurrent connections. The earlier sample benchmark runs shown in Figure 2 above used 250 connections, but to keep this big cluster constantly busy I configured HammerDB to use 5000 connections.
The number of connections provided by default for Hyperscale (Citus) server groups in Azure Database for PostgreSQL depends on the coordinator size, and the maximum number of user connections is capped by the system at 1000. To increase it, reach out to Azure support and request a maximum of at least 5000 user connections on Postgres 14 (a bit more is better, to be on the safe side) for your Hyperscale (Citus) server group. So, creating a Hyperscale (Citus) cluster that can reproduce the 2M NOPM results is just a single support ticket away. After that you can simply use my benchmark tooling to run a benchmark against this cluster.
Comparing performance of databases or cloud providers can seem daunting. But with the knowledge and tools provided in this blog, benchmarking the database performance of Hyperscale (Citus) in Azure Database for PostgreSQL should be much easier. When running any performance benchmarks yourself, make sure that you:

- Automate running the benchmarks, because you will be re-running them a lot
- Choose a benchmark that matches your own workload (OLTP, OLAP, or HTAP)
- Run `VACUUM ANALYZE` after loading your data, and pick a `shard_count` that is a multiple of the number of workers
Have fun benchmarking!
And regardless of whether you're looking to run your app on Citus open source in a self-managed way—or you're looking to run your application on a managed database service on Azure—it's quite easy to get started with Citus to scale out Postgres.
Footnotes
| Setting | Value |
|---|---|
| work_mem | 16MB |
| max_connections | 12000 |
| superuser_reserved_connections | 3 |
| citus.max_shared_pool_size | 12000 |
| max_prepared_transactions | 12100 |
| pg_stat_statements.track | none |