Fun with SQL: Common Table Expressions for more readable queries

Written by Craig Kerstiens
August 9, 2018

This week we're continuing our fun with SQL series. In past posts we've looked at generate_series, window functions, and recursive CTEs. This week we're going to take a step backward and look at standard CTEs (common table expressions) within Postgres.

Admittedly SQL isn't always the most friendly language to read. It's a little more friendly to write, but even still not as naturally readable as something like Python. Despite it's shortcomings there it is the lingua franca when it comes to data, SQL is the language and API that began with relational databases and now even non traditional databases are aiming to immitate it with their own SQL like thing. With CTEs though our SQL, even queries hundreds of lines long, can become readable to someone without detailed knowledge of the application.

CTEs (common table expressions), often referred to as with clauses/queries, are essentially views that are valid during the course of a transaction. They can reference earlier CTEs within that same transaction or query essentially allowing you separate building blocks on which you compose your queries. It is of note that CTEs are an optimization boundary, so in cases they may have worse performance than their alternative non-CTE queries. Even still they're incredible useful for readability and should be considered when constructing large complex queries. Let's dig in with an example.

We're going to assume a basic CRM schema where we have organizations, that have contacts which are tied to accounts, and those accounts have opportunities. In this CRM world we want to create a report that has all opportunities that were opened between 30 and 60 days ago, and have a contact that was contacted within the last 30 days, grouped by the sales rep owner. The goal is to see that our sales reps are actively chasing the deals that they said exist.

Query for opportunities created 1-2 months ago

First we're going to construct our query that gives us all opportunities opened in that range:

WITH opp_list AS (
  SELECT opportunities.id as opportunity_id,
         account_id,
         accounts.name as account_name,
         opportunities.amount as opportunity_amount,
         opportunities.created_at as opportunity_created
  FROM opportunities,
       accounts
  WHERE opportunities.created_at <= now() - '30 days'::interval
    AND opportunities.created_at >= now() - '60 days'::interval
    AND opportunities.account_id = accounts.id
)

SELECT *
FROM opp_list

You can see that we start our query with a WITH clause followed by a name we define for the query opp_list. We then put our query in there and later can query that CTE with SELECT * FROM opp_list. This query gives us the list of opportunities, which account they are tied to, and their amount for our given conditions:

 opportunity_id | account_id |  account_name   | opportunity_amount |      opportunity_created
----------------+------------+-----------------+--------------------+-------------------------------
              1 |          1 | ACME Explosives |              45000 | 2018-06-14 09:06:10.06669-07
              4 |          2 | ACME Bread      |              17000 | 2018-07-09 09:06:55.312687-07
              2 |          2 | ACME Bread      |              27000 | 2018-06-26 09:06:27.182375-07
              3 |          3 | ACME Databases  |              35000 | 2018-07-09 09:06:41.615194-07
(4 rows)

Querying for recently contacted contacts

Now we need to construct the next part of our query. Let's first do this in it's own CTE and find all contacts that were emailed within the last 30 days:

WITH recently_contacted AS (
  SELECT email,
         account_id,
         last_contacted
  FROM contacts
  WHERE contacts.last_contacted >= now() - '30 days'::interval
)

SELECT *
FROM recently_contacted

And here we can see we have two contacts we've contacted within the last 30 days:

      email       | account_id |        last_contacted
------------------+------------+-------------------------------
 jane@acme.com    |          2 | 2018-07-12 09:10:45.349093-07
 craig@acmedb.com |          3 | 2018-08-02 09:11:16.624737-07
(2 rows)

Putting our CTE building blocks together

Now we can start to combine our two CTEs to get the intersection of opportunities and who we've contacted in the last 30 days. A note, when chaining together multiple CTEs you don't repeat the WITH keyword instead you continue with a comma and place the next CTE. Let's look at it put together:

WITH opp_list AS (
  SELECT opportunities.id as opportunity_id,
         account_id,
         accounts.name as account_name,
         opportunities.amount as opportunity_amount,
         opportunities.created_at as opportunity_created
  FROM opportunities,
       accounts
  WHERE opportunities.created_at <= now() - '30 days'::interval
    AND opportunities.created_at >= now() - '60 days'::interval
    AND opportunities.account_id = accounts.id
),

recently_contacted AS (
  SELECT email,
         account_id,
         last_contacted
  FROM contacts
  WHERE contacts.last_contacted >= now() - '30 days'::interval
)

SELECT account_name,
       opportunity_created,
       email,
       last_contacted
FROM opp_list
JOIN recently_contacted
       ON opp_list.account_id = recently_contacted.account_id

And with our final result we get 3 open opportunities across 2 accounts that have had some interaction with the rep within the last 30 days:

  account_name  |      opportunity_created      |      email       |        last_contacted
----------------+-------------------------------+------------------+-------------------------------
 ACME Bread     | 2018-06-26 09:06:27.182375-07 | jane@acme.com    | 2018-07-12 09:10:45.349093-07
 ACME Bread     | 2018-07-09 09:06:55.312687-07 | jane@acme.com    | 2018-07-12 09:10:45.349093-07
 ACME Databases | 2018-07-09 09:06:41.615194-07 | craig@acmedb.com | 2018-08-02 09:11:16.624737-07
(3 rows)

CTEs for better readability

CTEs are a powerful construct to help build logic blocks and gradually construct more complex queries. Others that come behind you will appreciate reading your query even if hundreds of lines if you've used CTEs. You will want to be careful if CTEs are heavily used within your application as they are an optimization boundary, but for use cases where readability is as important as a few ms. then give them a strong consideration.

Craig Kerstiens

Written by Craig Kerstiens

Former Head of Cloud at Citus Data. Ran product at Heroku Postgres. Countless conference talks on Postgres & Citus. Loves bbq and football.