Citus 11.3 is out! Now with tenant statistics. Read all about it in Marco's 11.3 blog post. 💥
Written by Will Leinweber
March 19, 2018
Data has a certain gravity and inertia. Once it’s stored it’s not likely to be actively moved or frequently modified. At least not for your one source of truth. Protecting that data and ensuring it’s both safely stored but also correct is worth the time investment because of the value it has.
Going further, your database schema and models are going to change far less than your application code. Because it changes less frequently the case can easily be made that spending some time to ensure correctness at the database level is a great return on time.
This post was the result of a recent talk I recently gave at PgDay Paris. The conference itself was a great local event in Paris, and while there we had a chance to meet with a few of our customers based in Paris as well. As it’s always great to get out in person and chat with people about Postgres and their experience in scaling their database, many remarked that the talk could be useful to others that weren’t there. So as I thought it would be worthwhile to write-up, and here you go:
Constraints are an extremely powerful construct for your database. And when it comes to your database, constraints are the last line of defense. Constraints are essentially rules that your data has to follow.
Let’s take a look at a few of the various types of database constraints we can create to ensure data correctness:
Unique indexes or constraints are common. Usually you would have a unique constraint on your
deleted_at set to a timestamp then that user cannot come back in and recreate an account. A better option would be having the unique index that filters all the deleted records:
CREATE UNIQUE INDEX ON users (email) WHERE deleted_at is null;
Check constraints allow you to easily create custom rules on your tables. A simple and straightforward example would be if we have an ecommerce store and want to ensure all prices are greater than 0:
CREATE TABLE products ( product_no integer, name text, price numeric CHECK (price > 0) );
Beyond very basic checks such as for positive prices, you could get more clever. The Postgres docs actually give us a great example (you want to make sure sale prices are cheaper than the original price):
CREATE TABLE products ( product_no integer, name text, price numeric CHECK (price > 0), sale_price numeric CHECK (sale_price > 0), CHECK (price > sale_price) );
And of course you can get quite clever and fun here. Let’s say you’re creating a dating site for math fans and one of the profile questions is your favorite fibonacci number. Here we can first create a function that checks if a function is a fibonacci number:
CREATE OR REPLACE FUNCTION is_fib(i int) RETURNS boolean AS $$ DECLARE a integer := 5*i*i+4; b integer := 5*i*i-4; asq integer := sqrt(a)::int; bsq integer := sqrt(b)::int; BEGIN RETURN asq*asq=a OR bsq*bsq=b; end $$ LANGUAGE plpgsql IMMUTABLE STRICT;
Then we can create our table that uses our fibonacci check and test that it works by inserting some data:
# CREATE TABLE onlyfib( i int CHECK (is_fib(i)) ); CREATE TABLE # insert into onlyfib values (5), (8); INSERT 0 2 # insert into onlyfib values (6); ERROR: new row for relation "onlyfib" violates check constraint "onlyfib_i_check" DETAIL: Failing row contains (6).
Exclusion constraints are not the most common type of constraint you will use, but they are extremely useful when you’re dealing ranges in Postgres. Let’s say you’re building a class scheduling system and you don’t want to allow a student to be in two classes at the same time. Or you have a billing system that bills based on usage and a customer can only be billed one price at a time for each resource. In other words you don’t want to double bill your customer. Here the overlap (
&&) operator allows us to ensure that billing records don’t overlap.
First let’s create our billings table:
CREATE TABLE billings ( id uuid NOT NULL, period tstzrange NOT NULL, price_per_month integer NOT NULL );
Now with the overlap operator we can create a constraint that excludes any values from existing that would overlap:
ALTER TABLE billings ADD CONSTRAINT billings_excl EXCLUDE USING gist ( id WITH =, period WITH && );
The takeaway: if you’re dealing with from/to ranges give exclusion constraints a closer look.
Okay so datatypes aren’t really constraints, except they very much can be. Using datatypes specific to what you’re doing will ensure a level of correctness of the data just as much as the above constraints will. Want to ensure tweets aren’t longer than 140 characters, make sure you set your
tweet column to
varchar(140) and not to
But there is much more than just ensuring text lengths. As we mentioned earlier you have range types for from and to values. You have IP datatypes to ensure something is a valid IP. You have range types which allow you to have a to and from. You have booleans for true or false. You have JSONB for unstructured document storage.
Application code changes happen all the time, database schema changes are far less frequent. Spending extra time ensuring your data is correct has great return on the time you put in. It’s a lot easier to catch bad data before it goes in than it is to clean it up after the fact. At Citus Data we value our data and deeply about the safety and quality of it, we hope you find these tips for working with constraints a valuable tool in your toolbox.