Fun with SQL: Functions in Postgres

Written by Craig Kerstiens
June 21, 2018

In our previous Fun with SQL post on the Citus Data blog, we covered window functions. Window functions are a special class of function that allow you to grab values across rows and then perform some logic. By jumping ahead to window functions, we missed so many of the other handy functions that exist within Postgres natively. There are in fact several hundred built-in functions. And when needed, you can also create your own user defined functions (UDFs), if you need something custom. Today we're going to walk through just a small sampling of SQL functions that can be extremely handy in PostgreSQL.

Arrays

First, arrays are a first class datatype within Postgres. You can have an array of text or an array of numbers, personally I love using arrays when dealing with category tags. You can also index arrays which can make querying extremely fast. But even if you're not putting arrays directly into your database, you may want to build up arrays within your query.

A good example might be when you have a team feature within your application, and you want to group all members that belong to that team into a single row. To do this, we can use the array_agg function which will aggregate some column together into a single row:

SELECT teams.id, 
       teams.name, 
       array_agg(users.email)
FROM teams,
     users
WHERE users.team_id = teams.id
GROUP BY 1, 2;
 id | name  |                          array_agg
----+-------+--------------------------------------------------------------
  2 | ACME  | {jennifer@acmecorp.com,tom@acmecorp.com,peyton@acmecorp.com}
  1 | Citus | {craig@citusdata.com,farina@citusdata.com}
(2 rows)

The above by itself is pretty handy, but we can go even further. If we want to send this directly to some email client, we can go ahead and parse out the array to just be a comma separated list by wrapping our array_agg function with a function to convert it to a string array_to_string:

SELECT teams.id, 
       teams.name, 
       array_to_string(array_agg(users.email), ', ')
FROM teams,
     users
WHERE users.team_id = teams.id
GROUP BY 1, 2;

Now our output format is much more usable:

 id | name  |                       array_to_string
----+-------+--------------------------------------------------------------
  2 | ACME  | jennifer@acmecorp.com, tom@acmecorp.com, peyton@acmecorp.com
  1 | Citus | craig@citusdata.com, farina@citusdata.com
(2 rows)

There is a whole slew of things you can do with arrays, but let's shift gears to see what other categories exist.

Dealing with time

Timestamps can be an extremely frustrating thing to work with. If you're thinking about building a calendar-ing app, well just don't. Inevitably you will have some dates and times within your database that you'll need to work with. Lucky for you, Postgres has a solid set of functions to help make your life a little easier.

One of the most common functions I use when working with time is now(). It will simply give me the current timestamp of the server. This is handy in your application when setting the last_updated field, or using now the first time you set created_at. It can also be useful when you're comparing things, such as for the past week. By starting with now(), and then doing some date interval math, you can easily get all users that created their accounts in the last week:

SELECT email
FROM users
WHERE created_at > now() - '7 days'::interval;

         email
-----------------------
 craig@citusdata.com
 jennifer@acmecorp.com
 tom@acmecorp.com
(3 rows)

One less commonly used one is the literal for allballs. Allballs is UTC 00:00:00, it's often used within communication and military settings that follow a 24 hour clock. At Citus we use it because UTC 00:00:00 aligns nicely with 4 or 5pm depending on daylight savings time which is a perfect time for happy hour on a Friday.

Shifting back to our earlier example of how many people signed up in the past week. This is great if we want a list of people, but in dealing with dates and timestamps, it's more common if you want to aggregate to form some report. There are two ways you can approach grouping things by date, one is to extract the portions you want—and the other is by truncating the date to lose some of the specificity to it. Personally I usually opt for date_trunc as it does the job nicely. You can truncate a date in a number of ways from year, to month, to day. Using date_trunc we can see that it's simple to aggregate how many users have signed up each week:

SELECT date_trunc('week', created_at),
       count(*)
FROM users
GROUP BY 1
ORDER BY 1;

       date_trunc       | count
------------------------+-------
 2018-06-11 00:00:00-07 |     3
 2018-06-04 00:00:00-07 |     2
(2 rows)

JSON

Postgres has had JSON support for over 5 years now! 5 years. A number of databases have not even existed for 5 years yet. The JSON support in Postgres comes with all the rich transactional guarantees that Postgres brings, powerful indexing, and the ability to join against other data. JSON continues to get even better with each release with more built-in functionality for querying your JSON data.

The most basic JSON operator allows you to extract a value or object within the JSON:

SELECT * 
FROM users ;

-[ RECORD 1 ]-+-------------------------------------------------------
id            | 1
email         | craig@citusdata.com
team_id       | 1
created_at    | 2018-06-16 09:08:40.763473-07
location_data | {"city": "Huntsville", "state": "AL", "country": "US"}


-- Get just the state from the JSON
SELECT email,  
       location_data->'state' 
FROM users;

         email         | ?column?
-----------------------+----------
 jennifer@acmecorp.com | #
 tom@acmecorp.com      | #
 farina@citusdata.com  | "CA"
 peyton@acmecorp.com   | #
 craig@citusdata.com   | "AL"
(5 rows)

To retrieve the above as a string swap the -> operator for ->>.

If you're using JSONB, you may have some unstructured data, another way of thinking about it is you may have a key that has a value or you may have no entry for the key. To clean this up and filter out so we have results where only state exists we can use the ? operator or if we wanted to filter for only people from Alabama we could use the @> operator:

SELECT email,  
       location_data->'state' 
FROM users
WHERE location_data ? 'state';

         email         | ?column?
-----------------------+----------
 craig@citusdata.com   | "AL"
 farina@citusdata.com  | "CA"
(2 rows)

SELECT email,  
       location_data->'state' 
FROM users
WHERE location_data @> '{"state":"AL"}'::jsonb;

        email        | ?column?
---------------------+----------
 craig@citusdata.com | "AL"
(1 row)

If you can dream it, there is a SQL function for it in Postgres

Postgres has existed for more than 20 years, its foundation is solid, but that doesn't mean Postgres is complete. With each new release of Postgres, we see new improvements—including all the new built-in functions to make working with your database easier. The next time you need to perform some operation on data, well, before you jump to writing logic in your application or a custom function, take a look to see if Postgres already has your answer.

Craig Kerstiens

Written by Craig Kerstiens

Former Head of Cloud at Citus Data. Ran product at Heroku Postgres. Countless conference talks on Postgres & Citus. Loves bbq and football.