Citus 12.1 is out! Now with PG16 Support. Read all about it in Naisila’s 12.1 blog post. 💥
work_mem is perhaps the most confusing setting within Postgres.
work_mem is a configuration within Postgres that determines how much memory can be used during certain operations. At its surface, the
work_mem setting seems simple: after all,
work_mem just specifies the amount of memory available to be used by internal sort operations and hash tables before writing data to disk. And yet, leaving
work_mem unconfigured can bring on a host of issues. What perhaps is more troubling, though, is when you receive an out of memory error on your database and you jump in to tune
work_mem, only for it to behave in an un-intuitive manner.
work_mem value defaults to 4MB in Postgres, and that's likely a bit low. This means that per Postgres activity (each join, some sorts, etc.) can consume 4MB before it starts spilling to disk. When Postgres starts writing temp files to disk, obviously things will be much slower than in memory. You can find out if you're spilling to disk by searching for
temporary file within your PostgreSQL logs when you have
log_temp_files enabled. If you see
temporary file, it can be worth increasing your
Updated October 2022: In the Azure Cosmos DB for PostgreSQL managed service (powered by Citus open source), we automatically tune
work_mem based on the overall memory available to the box. We do this for both the Citus coordinator node and the Citus worker nodes. Our tuning is based on our team's years of experience working with a large variety of production Postgres workloads, coupled with statistics to compute variations based on cluster sizing.
It's tough to get the right value for
work_mem perfect, but often a sane default can be something like 64 MB, if you're looking for a one size fits all answer.
Updated August 2021: Postgres 13 offers a new setting, hash_mem_multiplier, which is an additional setting to help tune memory usage. Often, hash operations benefit more from memory than other operations, so consider increasing this setting beyond the default of 1.0 to allow hash operations to complete quickly without other operations using too much memory.
Let's use an example to explore how to think about optimizing your
Say you have a certain amount of memory, say 10 GB. If you have 100 running Postgres queries, and each of those queries has a 10 MB connection overhead, then 100*10 MB (1 GB) of memory is taken up by the 100 connections—which leaves you with 9GB of memory.
With 9 GB of memory remaining, say you give 90 MB to
work_mem for the 100 running queries. But wait, it's not that simple. Why? Well,
work_mem isn't set on a per-query basis, rather, it's set based on the number of sort/hash operations. But how many shorts/hashes and joins happen per query? Now that is a complicated question. A complicated question made more complicated if you have other processes that also consume memory, such as autovacuum.
Let's reserve a little for maintenance tasks then and for vacuum and we'll be okay then as long as we limit our connections right? Not so fast my friend.
Postgres now has parallel queries. If you're using Citus for parallelism you've had this for a while, but now you have it on single node Postgres as well. What this means is on a single query you can have multiple processes running and performing work. This can result in some significant improvements in speed of queries, but each of those running processes can consume the specified amount of
work_mem. In the example above, with a
64 MB default and 100 connections, we could now have each of those running a query per each core consuming far more memory than we anticipated.
So we can see that getting it perfect is a little more work than ideal. Let's go back a little and try this more simply... we can start
work_mem small at say 16 MB and gradually increase
work_mem when we see
temporary file. But why not give each query as much memory as it would like? If we were to just say each process could consume up to 1 GB of memory what's the harm? Well the other extreme out there is that queries begin consuming too much memory, more than you have available on your box. When that happens you get 100 queries that have 5 different sort operations and a few hash joins in them it's in fact very possible to exhaust all the memory available to your database.
When you consume more memory than is available on your machine you can start to see out of
out of memory errors within your Postgres logs, or in worse cases the OOM killer can start to randomly kill running processes to free up memory. An out of memory error in Postgres simply errors on the query you're running, where as the the OOM killer in linux begins killing running processes which in some cases might even include Postgres itself.
When you see an
out of memory error you either want to increase the overall RAM on the machine itself by upgrading to a larger instance OR you want to decrease the amount of memory that
work_mem uses. Yes, you read that right: out-of-memory it's better to decrease
work_mem instead of increase since that is the amount of memory that can be consumed by each process and too many operations are leveraging up to that much memory.
While you can continual tune and tweak
work_mem a couple of broad guidelines for pairing to your workload can generally get you into a good spot:
If you have a number of short running queries that run very frequently and perform simple lookups and joins then maintaining a lower work_mem is ideal, in this case you get diminishing returns by allowing it to be significantly higher because it's simply unused.
If your workload is relatively few active queries at a time that are doing very complex sorts and joins then granting more memory to prevent things from spilling can give you great returns.
Postgres powerful feature set and flexibility means you have a lot of knobs you can turn and levers you can pull in tuning it. Postgres is often used for embedded systems, for time series data, for OLTP and OLAP as well.
Updated October 2022: This flexibility in Postgres can often mean an overwhelming set of options when tuning. On Azure Cosmos DB for PostgreSQL, we configure
work_mem to be suitable for most workloads we see, think of it as one size fits most. And then if you ever need to, you're able to customize the
work_mem setting for either/both of the Citus coordinator node and worker nodes. If you're not running Citus on Azure, consider leveraging pgtune to help you get to a good starting point.