Big News: Hyperscale (Citus) is now generally available as part of Azure Database for PostgreSQL. Learn more.
Today we continue to explore all the powerful and fun things you can do with SQL. SQL is a very expressive language and when it comes to analyzing your data there isn’t a better option. You can see the evidence of SQL’s power in all the attempts made by NoSQL databases to recreate the capabilities of SQL. So why not just start with a SQL database that scales? (Like my favorites, Postgres and Citus.)
Today, in the latest post in our ‘Fun with SQL’ series (earlier blog posts were about recursive CTEs, generate_series, and relocating shards on a Citus database cluster), we’re going to look at window functions in PostgreSQL. Window functions are key in various analytic and reporting use cases where you want to compare and contrast data. Window functions allow you to compare values between rows that are somehow related to the current row. Some practical uses of window functions can be:
- Finding the first time all users performed some action
- Finding how much each users bill increased or decreased from the previous month
- Find where all users ranked for some sub-grouping
The basic structure of a window function in Postgres
Window functions within PostgreSQL have a built in set of operators and perform their action across some specific key. But they can have two different syntaxes that express the same thing. Let’s take a look at a simple window function expressed two different ways:
The first format
SELECT last_name, salary, department, rank() OVER ( PARTITION BY department ORDER BY salary DESC) FROM employees;
The second format
SELECT last_name, salary, department, rank() OVER w FROM employees WINDOW w as (PARTITION BY department ORDER BY salary DESC).
With the first query we can see the window function is inlined, where as the second it is broken out separately. Both of the above queries produce the same results:
last_name | salary | department | rank -----------+---------+--------------+------- Jones | 45000 | Accounting | 1 Williams | 37000 | Accounting | 2 Smith | 55000 | Sales | 1 Adams | 50000 | Sales | 2 Johnson | 40000 | Marketing | 1
Both of these show the last name of employees, their salary, their department—and then rank where they fall in terms of salary in their department. You could easily combine this with a CTE to then find only the highest paying (
where rank = 1) or second highest paying (
where rank = 2) in each department.
What can you do with window functions in Postgres?
Within Postgres there are a number of window functions that each perform a different operation. You can check the PostgreSQL docs for the full list, but for now we’ll walk through a few that are particularly interesting:
rank- As we saw in the earlier example, rank will show where the row ranks in order of the window order.
percent_rank- Want to compute the percent where the row falls within your window order?
percent_rankwill give you the percentage ranking based on your window think of it as ((rank - 1) / (total rows - 1))
lag- Want to do your own operation between rows? Lag will give you the row value
xrows before your current row. Want to the value for future rows? You can use
leadfor that. A great example of this could be computing month over month growth
ntile- Want to compute what percentile values fall in?
ntileallows you to specify a percentile to group buckets into. For 4 quartiles you would use
ntile(4), for percentile of each row you would use