POSETTE 2024 is a wrap! 💯 Thanks for joining the fun! Missed it? Watch all 42 talks online 🍿
POSETTE 2024 is a wrap! 💯 Thanks for joining the fun! Missed it? Watch all 42 talks online 🍿
Written by Craig Kerstiens
September 12, 2016
In any as-a-service business, which bills monthly, a key metric you track is MRR or monthly recurring revenue. It's good practice to have this on a dashboard and check it on a monthly, weekly, or even daily basis. If you have a simple pricing model that has set monthly plans, say like Netflix this is pretty easy to calculate:
SELECT sum(user_subscriptions.price)
FROM user_subscriptions
WHERE users_subscriptions.ended_at IS null;
The above will give you the run rate as of this exact moment in time. It gets a little more complicated to do this in a single query that gives it to you over time. First, you'll want to check the run rate for each day, then you'll want to check what plan they were on at that time. Whether you're Netflix or a smaller site users will upgrade/downgrade–especially in the case of a usage based service such as Citus Cloud.
We track run rate for ourselves as well as month over month growth all in the same query. While we've written more complex SQL before we thought this might be an interesting example of a number of things you can do with SQL and helpful to publicly share for others to benefit from.
First the schema. We have a users and orgs table which overall are pretty un-interesting. The reason they're in the query is we want to filter out internal usage, as we heavily use Citus Cloud to run Citus Cloud and don't want that skewing numbers. The interesting table is billings:
# \d billings
Table "public.billings"
Column | Type | Modifiers
-----------------+-----------+-------------------------------------
id | uuid | not null default uuid_generate_v4()
formation_id | uuid | not null
org_id | uuid | not null
validity_period | tstzrange | not null
price_per_month | integer | not null
You'll notice the validity_period has a tstzrange type. This is a range type which allows you to have a from and a to range value of timestamps directly in a single column.
So on to the query. First I'm going to share the full query, then we can try to break it down further piece by piece.
WITH dates AS (
SELECT generate_series('07/01/2016'::timestamptz, now(), '1 day'::interval) date
),
billing_valid AS (
SELECT price_per_month,
validity_period,
formation_id,
orgs.billing_email
FROM billings,
orgs
WHERE orgs.id = billings.org_id
AND org_id IN (
SELECT id
FROM orgs
WHERE billing_email NOT LIKE '%@mydomain.com'
)
),
billings_by_day AS (
SELECT CASE WHEN
(
lower(validity_period) < dates.date
AND ( upper(validity_period) > dates.date
OR upper(validity_period) is null
)
) THEN price_per_month
ELSE 0
END amount,
dates.date,
validity_period,
formation_id
FROM billing_valid,
dates),
billings_daily_totals AS (
SELECT date,
SUM(amount) as billings
FROM billings_by_day
WHERE amount > 0
GROUP BY date
)
SELECT date,
billings,
round((1.0 - (lag(billings, 28) over w / cast(billings as numeric))) * 100, 1)
FROM billings_daily_totals
WINDOW w as (order by date)
ORDER BY date;
Breaking it down further. First we're making heavy usage of CTEs here to give us some nice composable blocks. This helps at one: making sure we're not making any errors as we construct the SQL and two: allowing it to be more readable as we come back to it later. Within the first CTE we're simple generating a list of dates. This is going to allow us to have a condition (the start of each day) to evaluate if a plan is provisioned at that time or not. You can try running it yourself to see what it generates:
SELECT generate_series('07/01/2016'::timestamptz, now(), '1 day'::interval) date;
Next is a bit of house-keeping. We're going to find only users that we want to track their billings, or as we described earlier filter out all Citus employees usage.
SELECT price_per_month,
validity_period,
formation_id,
orgs.billing_email
FROM billings,
orgs
WHERE orgs.id = billings.org_id
AND org_id IN (
SELECT id
FROM orgs
WHERE billing_email NOT LIKE '%@mydomain.com'
Next is one of those building blocks that makes things easier than doing it in multiple steps. For every billing record we're going to multiply out the number of times that record appears. But we're only going to project out the cost if the Citus Cluster was provisioned before that day and has not been deprovisioned before–in other words at this time was it running:
SELECT CASE WHEN
(
lower(validity_period) < dates.date
AND ( upper(validity_period) > dates.date
OR upper(validity_period) is null
)
) THEN price_per_month
ELSE 0
END amount,
dates.date,
formation_id
FROM billing_valid, dates
An example with data might help. If we had a record that looks like:
# SELECT * from billings limit 1;
-[ RECORD 1 ]---+-------------------------------------------------------------------------------------------------------------------
id | b1e597f8-0dd5-4da7-b63d-b7cc4f39c6f4
formation_id | 571492e5-051b-4da7-9331-d38ced16c3f3
org_id | 9a9dffca-93cb-43c7-8772-924198ede0d4
validity_period | ["2016-07-01 16:32:32.645821+00","2016-07-03 20:25:54.921086+00")
price_per_month | 1800
If we were to run the above query and that was our only record we'd get results similar to:
amount | date | formation_id |
---|---|---|
0 | 2016-07-01 | 571492e5-051b-4da7-9331-d38ced16c3f3 |
1800 | 2016-07-02 | 571492e5-051b-4da7-9331-d38ced16c3f3 |
1800 | 2016-07-03 | 571492e5-051b-4da7-9331-d38ced16c3f3 |
0 | 2016-07-04 | 571492e5-051b-4da7-9331-d38ced16c3f3 |
From here you can see how our next CTE makes it simple to do our aggregations:
SELECT date,
SUM(amount) as billings
FROM billings_by_day
WHERE amount > 0
GROUP BY date
And finally, we take all of those daily run rates and compute the month over month change. Or in this case we use 28 day change since it then falls on the same day of the week to not give us weird jumps across different days of the week. Within that query there's two key things. First we create a window function WINDOW w as (order by date)
. Then we have this line that computes the percentage change: round((1.0 - (lag(billings, 28) over w / cast(billings as numeric))) * 100, 1)
. The key part in here is lag(billings, 28) over w
. This uses one of the built in window function aggregates to say give me the value from 28 rows before.
Hopefully the full query now makes sense and is a good guide for how you can construct some more complex SQL queries that are both useful, but also can be understood and well reasoned about. If you found this post interesting, I'll be speaking and live querying similar concepts at Keep Ruby Weird at the end of October, hope to see you there.