Fun with SQL: generate_series in Postgres

There are times within Postgres where you may want to generate sample data or some consistent series of records to join in order for reporting. Enter the simple but handy set returning function of Postgres: generate_series. generate_series as the name implies allows you to generate a set of data starting at some point, ending at another point, and optionally set the incrementing value. generate_series works on two datatypes:

  • integers
  • timestamps

Let’s get started with the most basic example:

SELECT * 
FROM generate_series(1, 5);
 generate_series
-----------------
               1
               2
               3
               4
               5
(5 rows)

So generate_series pretty straight-forward, but what interesting ways can it be used?

Generating fake data

By putting our generate_series inside a CTE we can easily now generate a set of numbers and then perform some operation against each value. If we want to generate some fake number we can use random() which generates a random number between 0.0 and 1.0.

WITH numbers AS (
  SELECT *
  FROM generate_series(1, 5)
)

SELECT generate_series * random()
FROM numbers;

     ?column?
-------------------
  0.87764338683337
 0.345125129446387
  2.10317854676396
 0.937898803502321
  1.72822773223743
(5 rows)

Pretty weekly reporting with joins

Aggregating across some time dimension is a fairly common report. A good example might be new users per week. The simplest way to get this would be by leveraging Postgres date_trunc function:

SELECT date_trunc('week', created_at)
       count(*)
FROM users
GROUP BY 1
ORDER BY 1;

The issue with the above query arises when two cases are true, first you’re charting your data over time and then two you have a week with no sign-ups. In the case of no sign-ups in a week you’d simply miss the 0 on your graph leaving a misleading impression. To smooth this out we go back to generate series and do an outer join on the week:

WITH range_values AS (
  SELECT date_trunc('week', min(created_at)) as minval,
         date_trunc('week', max(created_at)) as maxval
  FROM users),

week_range AS (
  SELECT generate_series(minval, maxval, '1 week'::interval) as week
  FROM range_values
),

weekly_counts AS (
  SELECT date_trunc('week', created_at) as week,
         count(*) as ct
  FROM users
  GROUP BY 1
)

SELECT week_range.week,
       weekly_counts.ct
FROM week_range
LEFT OUTER JOIN weekly_counts on week_range.week = weekly_counts.week;

What other uses do you have for generate_series?

Postgres has a wealth of hidden gems within it. generate_series is just one of the handy built-in features of Postgres. If you know of other novel uses for it we’d love to hear about it @citusdata.