12 Factor: Dev/prod parity for your database

The twelve-factor app changed the way we build SaaS applications. Explicit dependency management, separating config from code, scaling out your app concurrently—these design principles took us from giant J2EE apps to apps that scale predictably on the web. One of these 12 factors has long stood out as a challenge when it comes to databases: dev/prod parity. Sure, you can run the exact same version of your database, and have a sandbox copy, but testing and staging with production data… that’s a different story.

Dev/Prod parity is easy until it’s not

Running the same version of your database in development as in production should be a no brainer. Just do it. If you use Postgres 10 in production, make sure to use the same version of Postgres in dev. For Postgres at least, you’re usually pretty safe on point releases so between 10.1 and say 10.3 you may not have to worry too much, but at least keep the same major version of the database the same between dev and prod.

Now that we have the easy part out of the way, the complexity starts to come with the data. The version of your database is one piece, but how your data interacts is equally as key. If your perfectly pristine data set for development doesn’t attempt to create hotspots in your data, violate constraints, or in general push the boundaries of what’s allowed then there is a good chance you run into issues when you deploy to production.

Remember that time that you added an index to speed things up… Everything tested fine on your local dev box. Even in staging against the 10 GB, the sample DB worked fine. You deployed, and the index ran as part of your deploy script, and 30 minutes in you were madly figuring out how to cancel things as so many people showed up at your desk asking why the system is down?

Let’s start with database migrations, and indexes

We’re going to come back to dev/prod parity in a minute and how you can safely test operations against a production dataset. But first, let’s tackle two practices that you should put in place immediately to save you some heartache later on:

  1. Safe migrations for large datasets
  2. Concurrent index creation

Safer migrations for large datasets

Postgres is pretty good and fast when you add a new column. The caveat comes if you set a default value on a non-nullable column for that table. The result is that your database has to read all the records and rewrite a new copy of the table with the default value set. If you were following along earlier, in staging on a 1 GB table this might take a few seconds, maybe a minute. In production against a 100 GB table, you could be waiting up to an hour. Even worse, while the table is being read and rewritten an exclusive lock is taken which means new writes will have to wait until the operation is completed. There is a safer approach to your migrations though:

  1. Add your column allowing nulls, but with a default value
  2. Backfill for all nulls in a background job
  3. Add your not null constraint

Following this process you can reliable have migrations whether Rails, Django, or any framework that won’t risk downtime to your application.

Note: In Postgres 11 this becomes a non-issue.

Indexes for performance, but proceed with care

Indexes make your reads faster, we’ve talked about it before. A combination of pgstatstatements and landlord can be extremely powerful for getting insights on what to optimize. And we’ve talked about how Postgres has a robust set of index types.

One best practice is to add your indexes concurrently—regardless of whether you’re optimizing for tenant performance (say, with a multi-tenant app) or whether you’re trying to improve things across the board. When you add concurrently to the create index command, the index is built in the background and doesn’t hold the same exclusive lock that is typically taken when you add an index. The downside is that CREATE INDEX CONCURRENTLY can’t be run within a transaction, the positive though is you don’t have to wait hours for an index to be added on 1 TB table.

Back to dev/prod parity

Running your index creations, your migrations, everything against staging prior to running it against production is key to safer deploys. But as we’ve seen highlighted above you need your staging dataset to also imitate production. When production is small at 10 GB of data this isn’t too bad to do a dump/restore. As your production grows a dump/restore across 100 GB or say 1 TB can take hours to maybe even days, not to mention introduce heavy load on your database. With an ever changing dataset how then do you test things? Enter the ability to Fork your database.

You can think of a forking your database just like forking a git repo. A fork is a copy of the database as it exists as of some point in time. If you fork production to staging, you as of that moment you fork you’ll get all the data in that state. Any changes that happen from them on to production are not replicated to staging. Database forks work by leveraging underlying Postgres base backups and then replaying the underlying WAL (write-ahead-log) to that point in time. By leveraging already existing disaster recovery tooling it means that forking your database doesn’t introduce load onto production.

Have you been burned by any of the issues discussed earlier? If so you likely didn’t have dev/prod parity for your database. Forking your database gives you a safe way to test risky operations in staging, consider leveraging database forks if you’re not already today.