Citus 11 is out! Now 100% open source. Read all about it in Marco’s release blog. 💥
Postgres is a rich and powerful database. And the existence of PostgreSQL extension APIs have enabled Postgres to expand its capabilities beyond the boundaries of what you would expect in a traditional relational database. Examples of popular Postgres extensions today include HyperLogLog, which gives you approximate distincts with a small footprint—to rich geospatial support via PostGIS—to Citus which helps you scale out your Postgres database across multiple nodes to improve performance for multi-tenant SaaS applications and real-time analytics dashboards—to the built-in full text search capabilities in PostgreSQL. With all the bells and whistles you can layer into Postgres, sometimes the most basic built-ins get overlooked.
PostgreSQL has nearly 100 different data types, and these data types can come with their own tuned indexing or their own specialized functions. You probably already use the basics such as integers and text, and today we’re going to take a survey of less-used but incredibly powerful PostgreSQL data types.
Postgres first received JSON support in Postgres 9.2. But the initial JSON support in 9.2 was about JSON validation, hence was less ideal for many use cases that needed JSON as well as fast query performance.
A couple of years later we got the the successor to the JSON datatype: JSONB. JSONB is a binary version of JSON stored on disk. JSONB compresses, so you lose whitespace, but it comes with some powerful index types to allow you to work much more flexibly with your JSON data.
JSONB is great for unstructured data, and with Postgres you can easily join JSONB data to your other relational models. We use JSONB ourselves heavily at Citus for things such as feature flags, event observation data, and recording logs. You can index JSONB data with GIN indexes which allow you to query on keys and speed up your lookups, since the GIN index makes the full document automatically available for you.
Let’s face it, dealing with time in any fashion is hard. When dealing with time ranges, the challenges can be even worse: how do ensure your your conference schedule doesn’t have two talks scheduled at the same time in a given room? How do you ensure you only have a single invoice for each month? With range types, the value has a from and to value, or a range. You can have ranges of numbers such as 1-20, or ranges of timestamps. The next time you have two columns in your database for a from-to, or a start-top value, consider using a timestamp range.
Once you have your timestamp range in place, make sure to set up your constraints to enforce the data integrity you’re looking for.
Knowing the values that are inserted into your database are valid can be just as important as having flexibility. Enumerated data types (enums) are a great candidate for certain values that seldom change. With an enum, you first define the type and then use that type when creating your table. A great example may be states for invoices. First you can create your enumerated type, called
invoice_state in this example
CREATE TYPE invoice_state as enum('pending', 'failed', 'charged');
Then on your invoices table you can use the newly-created enumerated type as the column type:
CREATE TABLE invoices ( id serial, customer_id int, amount int, state invoice_state );
Internally for the operation of our Citus Cloud database as a service, we use enums for things like invoice states, and also for AWS regions we support as those seldom change and it can be overkill to add another table to join against.
Whether it’s an IP address, a timestamp, a UUID, or other data type if you have a data type within your application consider using the equivalent match for that data type in Postgres. By using Postgres data types you’re able to get the maximum leverage and flexibility out of your database and with Postgres track record of improving functions and features around data types your world will only get better.