Fun with SQL: Window functions in Postgres

Written by Craig Kerstiens
June 1, 2018

Today we continue to explore all the powerful and fun things you can do with SQL. SQL is a very expressive language and when it comes to analyzing your data there isn't a better option. You can see the evidence of SQL's power in all the attempts made by NoSQL databases to recreate the capabilities of SQL. So why not just start with a SQL database that scales? (Like my favorites, Postgres and Citus.)

Today, in the latest post in our 'Fun with SQL' series (earlier blog posts were about recursive CTEs, generate_series, and relocating shards on a Citus database cluster), we're going to look at window functions in PostgreSQL. Window functions are key in various analytic and reporting use cases where you want to compare and contrast data. Window functions allow you to compare values between rows that are somehow related to the current row. Some practical uses of window functions can be:

  • Finding the first time all users performed some action
  • Finding how much each users bill increased or decreased from the previous month
  • Find where all users ranked for some sub-grouping

The basic structure of a window function in Postgres

Window functions within PostgreSQL have a built in set of operators and perform their action across some specific key. But they can have two different syntaxes that express the same thing. Let's take a look at a simple window function expressed two different ways:

The first format

SELECT last_name,
       salary,
       department,
       rank() OVER (
        PARTITION BY department
        ORDER BY salary
        DESC)
FROM employees;

The second format

SELECT last_name,
       salary,
       department,
       rank() OVER w
FROM employees
       WINDOW w as (PARTITION BY department ORDER BY salary DESC).

With the first query we can see the window function is inlined, where as the second it is broken out separately. Both of the above queries produce the same results:

last_name  |  salary |  department  | rank
-----------+---------+--------------+-------
Jones      |  45000  |  Accounting  |  1
Williams   |  37000  |  Accounting  |  2
Smith      |  55000  |  Sales       |  1
Adams      |  50000  |  Sales       |  2
Johnson    |  40000  |  Marketing   |  1

Both of these show the last name of employees, their salary, their department—and then rank where they fall in terms of salary in their department. You could easily combine this with a CTE to then find only the highest paying (where rank = 1) or second highest paying (where rank = 2) in each department.

What can you do with window functions in Postgres?

Within Postgres there are a number of window functions that each perform a different operation. You can check the PostgreSQL docs for the full list, but for now we’ll walk through a few that are particularly interesting:

  • rank - As we saw in the earlier example, rank will show where the row ranks in order of the window order.
  • percent_rank - Want to compute the percent where the row falls within your window order? percent_rank will give you the percentage ranking based on your window think of it as ((rank - 1) / (total rows - 1))
  • lag - Want to do your own operation between rows? Lag will give you the row value xrows before your current row. Want to the value for future rows? You can use lead for that. A great example of this could be computing month over month growth
  • ntile - Want to compute what percentile values fall in? ntile allows you to specify a percentile to group buckets into. For 4 quartiles you would use ntile(4), for percentile of each row you would use ntile(100).

Hopefully you'll find window functions as useful as we do here at Citus. If you have questions on using them the PostgreSQL docs are a great resource or feel free to jump into our Slack channel

Craig Kerstiens

Written by Craig Kerstiens

Former Head of Cloud at Citus Data. Ran product at Heroku Postgres. Countless conference talks on Postgres & Citus. Loves bbq and football.