If you want to learn more about Citus on Microsoft Azure, read this post about Hyperscale (Citus) on Azure Database for PostgreSQL.
Common Table Expressions (CTEs) are a powerful construct within SQL. In day to day conversation, you may hear CTEs referred to as
WITH clauses. You can think of CTEs as similar to a view that is materialized only while that query is running and does not exist outside of that query. CTEs can be very useful building blocks for allowing your large SQL queries to be more readable. But, they can also be used recursively allowing you to create some very complex queries without having to drop down to a procedural language like plpgsql or plv8.
Recursive CTEs allow themselves to be called until some condition is met. Let’s jump right in and explore a recursive CTE—a basic one, and using PostgreSQL of course—and then let’s dissect the recursive CTE a bit further to see what we can use it for:
WITH RECURSIVE tens (n) AS ( SELECT 10 UNION ALL SELECT n+10 FROM tens WHERE n+10<= 100 ) SELECT n FROM tens;
When the above is run we’ll get the following result:
n ----- 10 20 30 40 50 60 70 80 90 100 (10 rows)
With the above we could also easily do this with a generate_series. But stick with us and you’ll see the more complex things we can do that aren’t possible with
generate_series. First let’s take a closer look at how it works.
The first part you’ll notice is
WITH RECURSIVE. This tells Postgres the CTE can recursively call itself. The next portion you’ll notice is it takes some parameters into it. In this case
(n), it can also take more than one should you need.
Moving further into the CTE, we have the first query that is executed,
SELECT 10, which generates the first value. The second portion is where all the fun begins. The
UNION ALL specifies that we’re going to return all the records that are produced from the loop. Then
SELECT n+10 FROM tens WHERE n+10<= 100 will keep calling the
tens CTE that is created until the condition is met.
So those are the basics, but the interesting question is: when would you use a recursive CTE? When you have a tree or hierarchical structure to your data, recursive CTEs can make life much easier than loading all your data and running a loop in your code. For applications that deal with ecommerce and shopping categories, recursive CTEs are a great help.
Let’s try to make the benefits of recursive CTEs in Postgres a little more concrete, with an example. First we’re going to create a table of employees, then we’re going to load up some example employees. (Yes, you’ll see some of names from our team here at Citus Data on this list. They were the first names that came to mind.)
CREATE TABLE employees ( id serial, name varchar(255), manager_id int ); INSERT INTO employees VALUES (1, 'Umur', null); INSERT INTO employees VALUES (2, 'Craig', 1); INSERT INTO employees VALUES (3, 'Daniel', 2); INSERT INTO employees VALUES (4, 'Claire', 1); INSERT INTO employees VALUES (5, 'Lindsay', 2); INSERT INTO employees VALUES (6, 'Will', 2); INSERT INTO employees VALUES (7, 'Burak', 2); INSERT INTO employees VALUES (8, 'Eren', 2); INSERT INTO employees VALUES (9, 'Katie', 3); INSERT INTO employees VALUES (10, 'Teresa', 4);
Now I’m going to write a query that gives me all the reports that roll up into a certain org within the company. In this case I’m going to get myself and all of my reports, along with each person’s manager id:
WITH RECURSIVE managertree AS ( SELECT id, name, manager_id FROM employees WHERE id = 2 UNION ALL SELECT e.id, e.name, e.manager_id FROM employees e INNER JOIN managertree mtree ON mtree.id = e.manager_id ) SELECT * FROM managertree; id | name | manager_id ----+---------+------------ 2 | Craig | 1 3 | Daniel | 2 5 | Lindsay | 2 6 | Will | 2 7 | Burak | 2 8 | Eren | 2 9 | Katie | 3 (7 rows)
Next time you need to do some recursive computation across your data, consider doing it directly in SQL as opposed to loading all that data into your application. For further reading consider taking a look at some of these helpful resources:
- PostgreSQL docs on CTEs
- Solving the traveling salesman problem with a CTE
- Getting a tree with all it’s children
As a note CTEs at this time are an optimization fence in PostgreSQL, though there are hopes of that changing in the future. Common Table Expressions are an incredibly useful tool for reporting. At times the readability of CTEs outweighs the performance impact, but consider the trade-offs when using them