Citus 11.3 is out! Now with tenant statistics. Read all about it in Marco's 11.3 blog post. 💥
Written by Craig Kerstiens
March 14, 2018
There are times within Postgres where you may want to generate sample data or some consistent series of records to join in order for reporting. Enter the simple but handy set returning function of Postgres:
generate_series as the name implies allows you to generate a set of data starting at some point, ending at another point, and optionally set the incrementing value.
generate_series works on two datatypes:
Let’s get started with the most basic example:
SELECT * FROM generate_series(1, 5); generate_series ----------------- 1 2 3 4 5 (5 rows)
generate_series pretty straight-forward, but what interesting ways can it be used?
By putting our
generate_series inside a CTE we can easily now generate a set of numbers and then perform some operation against each value. If we want to generate some fake number we can use
random() which generates a random number between 0.0 and 1.0.
WITH numbers AS ( SELECT * FROM generate_series(1, 5) ) SELECT generate_series * random() FROM numbers; ?column? ------------------- 0.87764338683337 0.345125129446387 2.10317854676396 0.937898803502321 1.72822773223743 (5 rows)
Aggregating across some time dimension is a fairly common report. A good example might be new users per week. The simplest way to get this would be by leveraging Postgres
SELECT date_trunc('week', created_at) count(*) FROM users GROUP BY 1 ORDER BY 1;
The issue with the above query arises when two cases are true, first you’re charting your data over time and then two you have a week with no sign-ups. In the case of no sign-ups in a week you’d simply miss the 0 on your graph leaving a misleading impression. To smooth this out we go back to generate series and do an outer join on the week:
WITH range_values AS ( SELECT date_trunc('week', min(created_at)) as minval, date_trunc('week', max(created_at)) as maxval FROM users), week_range AS ( SELECT generate_series(minval, maxval, '1 week'::interval) as week FROM range_values ), weekly_counts AS ( SELECT date_trunc('week', created_at) as week, count(*) as ct FROM users GROUP BY 1 ) SELECT week_range.week, weekly_counts.ct FROM week_range LEFT OUTER JOIN weekly_counts on week_range.week = weekly_counts.week;
Postgres has a wealth of hidden gems within it.
generate_series is just one of the handy built-in features of Postgres. If you know of other novel uses for it we’d love to hear about it @citusdata.