POSETTE 2024 is a wrap! 💯 Thanks for joining the fun! Missed it? Watch all 42 talks online 🍿
POSETTE 2024 is a wrap! 💯 Thanks for joining the fun! Missed it? Watch all 42 talks online 🍿
Written by Craig Kerstiens
March 14, 2018
There are times within Postgres where you may want to generate sample data or some consistent series of records to join in order for reporting. Enter the simple but handy set returning function of Postgres: generate_series
. generate_series
as the name implies allows you to generate a set of data starting at some point, ending at another point, and optionally set the incrementing value. generate_series
works on two datatypes:
Let's get started with the most basic example:
SELECT *
FROM generate_series(1, 5);
generate_series
-----------------
1
2
3
4
5
(5 rows)
So generate_series
pretty straight-forward, but what interesting ways can it be used?
By putting our generate_series
inside a CTE we can easily now generate a set of numbers and then perform some operation against each value. If we want to generate some fake number we can use random()
which generates a random number between 0.0 and 1.0.
WITH numbers AS (
SELECT *
FROM generate_series(1, 5)
)
SELECT generate_series * random()
FROM numbers;
?column?
-------------------
0.87764338683337
0.345125129446387
2.10317854676396
0.937898803502321
1.72822773223743
(5 rows)
Aggregating across some time dimension is a fairly common report. A good example might be new users per week. The simplest way to get this would be by leveraging Postgres date_trunc
function:
SELECT date_trunc('week', created_at)
count(*)
FROM users
GROUP BY 1
ORDER BY 1;
The issue with the above query arises when two cases are true, first you're charting your data over time and then two you have a week with no sign-ups. In the case of no sign-ups in a week you'd simply miss the 0 on your graph leaving a misleading impression. To smooth this out we go back to generate series and do an outer join on the week:
WITH range_values AS (
SELECT date_trunc('week', min(created_at)) as minval,
date_trunc('week', max(created_at)) as maxval
FROM users),
week_range AS (
SELECT generate_series(minval, maxval, '1 week'::interval) as week
FROM range_values
),
weekly_counts AS (
SELECT date_trunc('week', created_at) as week,
count(*) as ct
FROM users
GROUP BY 1
)
SELECT week_range.week,
weekly_counts.ct
FROM week_range
LEFT OUTER JOIN weekly_counts on week_range.week = weekly_counts.week;
Postgres has a wealth of hidden gems within it. generate_series
is just one of the handy built-in features of Postgres. If you know of other novel uses for it we'd love to hear about it @citusdata.