If you want to learn more about Citus on Microsoft Azure, read this post about Hyperscale (Citus) on Azure Database for PostgreSQL.
This week at RailsConf, we found ourselves sharing a lot of tips for using PostgreSQL with Rails. We thought it might be worthwhile to write up many of these and share more broadly. Here you’ll find some tips that will help you in debugging and improving performance of your database from your Rails app.
And now, on to the code.
Manage long running queries with statement time-outs
Long running queries can have all sorts of negative impact on your database. Whether running for hours or even several seconds they can hold locks, queue up WAL, or just consume a lot of system resources. Postgres makes it easier to have a bit more safety around this with a default statement timeout. What’s nice is you can set a default such as 5 seconds shown here, any query that then runs longer than 5 seconds will be killed:
production: url: <%= DATABASE_URL %> variables: statement_timeout: 5000
Should you need a query to run longer within the database session you can set a custom statement timeout just valid for the current connection:
class MyAnalyticsJob < ActiveJob::Base queue_as :analytics def perform ActiveRecord::Base.connection.execute “SET statement_timeout = 600000” # 10 minutes # ... ensure ActiveRecord::Base.connection.execute “SET statement_timeout = 5000” # 5 seconds end end
Finding misbehaving queries
Rails abstracts away a lot of things when interacting with your database. That can be both a good and a bad. Postgres can show you long running queries, but as your Rails app grows that doesn’t by itself give you everything you need to solve the problem. To know where the query originated from there’s one particularly handy gem marginalia which will log exactly where your query came from. Now when you see some query that is wrong, or running too slow, or could just be removed you know exactly where to go in your code to fix it:
Account Load (0.3ms) SELECT `accounts`.* FROM `accounts` WHERE `accounts`.`queenbee_id` = 1234567890 LIMIT 1 /*application:BCX,controller:project_imports,action:show*/
Getting a high-level overview of database queries
Often you need a high-level picture of what is currently going on in your database.
pg_stat_statements is a Postgres extension that comes pre-installed in cloud environments like Citus Cloud, and allows you to see which queries have been running since the last statistics reset, and how they were performing.
For example, to see the 10 most time consuming queries, and their average time, do the following:
SELECT query, total_time / calls AS avg_time FROM pg_stat_statements ORDER BY total_time DESC LIMIT 10;
If you enable “track_io_timing” on your Postgres database, this also allows you to see whether you are bottlenecked on CPU or I/O performance. You can read more on pg_stat_statements here
Taking advantage of advanced features
By default Rails uses a file called
schema.rb to store a copy of the database schema, typically used to initialize the database before running tests. Unfortunately, many advanced Postgres features, like functional and partial indices, as well as multi-column primary keys, can’t be represented in the DSL used for that file.
Instead, it makes sense to switch to having Rails generate and use the “db/structure.sql” file, which you can do like this:
# Use SQL instead of Active Record's schema dumper when creating the database. # This is necessary if your schema can't be completely dumped by the schema dumper, # like if you have constraints or database-specific column types config.active_record.schema_format = :sql
Under the hood this uses Postgres’ pg_dump format, which can be a bit verbose at times, but ensures you get the complete database structure restored. If you find yourself running into issues with overly long diffs, you might want to take a look at activerecord-clean-db-structure.
Watch out for complex transactions locking against each other
Rails likes to put everything in a transaction, especially when using before_save hooks and multi-level relationships between models. There is one important caveat to watch out for with transactions, which can bite you as you scale out. In a transaction like the following:
BEGIN; UPDATE organizations SET updated_at = ‘2017-04-27 11:31:03 -0700’ WHERE id = 123; SELECT * FROM products WHERE store_id = 456; --- More statements here COMMIT;
The first UPDATE statement will start holding a row-level lock on the
organizations row with id
123 as soon as you issue it, until the COMMIT happens.
Imagine another request for the same organization coming in (e.g. from a different user), making a similar transaction. Typically that other request will have to wait for the transaction to commit, in order to be able to proceed, slowing down the response time. To fix this, it often helps to rearrange the transaction so UPDATEs happen towards the end, as well as consider pulling out changes to timestamp fields to happen outside of the transaction, after the main work has been done.
In order to find problems like this earlier, you can set “log_lock_waits = on” in PostgreSQL.
Keep a handle on your database connections
Rails by default maintains a pool of connections to your database. When a new request comes in it will take one of the connections from the pool and give it to your application. As you scale your Rails app this can result in hundreds of open connections to your database, though in reality only a portion of them are doing work. The key here is to use a connection pooler like pgBouncer to reduce the active connections to your database. A connection pooler will open up connections when transactions are active as opposed to passing on the idle ones that aren’t doing any work.