POSETTE 2024 is a wrap! 💯 Thanks for joining the fun! Missed it? Watch all 42 talks online 🍿
POSETTE 2024 is a wrap! 💯 Thanks for joining the fun! Missed it? Watch all 42 talks online 🍿
Written by Halil Ozan Akgul
May 12, 2023
If you have ever used a database like Postgres, you know how important optimization is. Some minor changes in how the database is setup make all the difference between long waiting times and satisfied customers. And one crucial thing you need before doing the optimization is to monitor and understand how your database is being used.
Citus is an extension to Postgres that improves scalability and parallelization by distributing your Postgres database across nodes in a cluster. The Citus database extension is available as open source and as a managed service on the cloud, as Azure Cosmos DB for PostgreSQL. You can track your Citus nodes and the Postgres tables, but Citus 11.3 takes it one step further and introduces a new way to gather insight on your Citus database with tenant monitoring.
The new Citus 11.3 release, among many other features, introduces a new citus_stat_tenants
view to track your most active tenants, for those with multi-tenant SaaS applications.
In a multi-tenant SaaS application, the same database stores data from the multiple customers of the application. Each of the customers are often referred as tenants. Usually the data for each tenant is handled separately. When you distribute your Postgres table using the Citus create_distributed_table function, every partition key value represents a tenant.
With the new citus_stat_tenants
view you can track:
SELECT
queries,SELECT
, INSERT
, DELETE
, and UPDATE
queries,In this post, you’ll learn how to monitor your top tenants to make more-informed decisions on your database—and you’ll learn how to configure citus_stat_tenants
to best fit your application. This post includes a quickstart and code examples. Let’s explore:
Let us say you have a multi-tenant app with Citus database, similar to the one on our Citus 11.3 blog post, and your customers are companies who have some ad campaigns.
CREATE TABLE companies (id BIGSERIAL, name TEXT);
SELECT create_distributed_table ('companies', 'id');
CREATE TABLE campaigns (id BIGSERIAL, company_id BIGINT, name TEXT);
SELECT create_distributed_table ('campaigns', 'company_id');
Each of the companies will be a tenant and the company ids are the tenant ids, or the tenant attributes.
Now we can run some queries to track with the citus_stat_tenants
view.
Tip: You need to set the citus.stat_tenants_track
to all
in all your nodes to be able to track the statistics. You can put the setting in the postgresql.conf
file.
Let us run some queries.
INSERT INTO companies (name) VALUES ('GigaMarket');
INSERT INTO campaigns (company_id, name) VALUES (1, 'Crazy Wednesday'), (1, 'Frozen Food Frenzy');
INSERT INTO campaigns (company_id, name) VALUES (1, 'Spring Cleaning'), (1, 'Bread&Butter');
INSERT INTO campaigns (company_id, name) VALUES (1, 'Personal Care Refresh'), (1, 'Lazy Lunch');
INSERT INTO companies (name) VALUES ('White Bouquet Flowers');
INSERT INTO campaigns (company_id, name) VALUES (2, 'Bonjour Begonia'), (2, 'April Selection'), (2, 'May Selection');
INSERT INTO companies (name) VALUES ('Smart Pants Co.');
INSERT INTO campaigns (company_id, name) VALUES (3, 'Short Shorts'), (3, 'Tailors Cut');
INSERT INTO campaigns (company_id, name) VALUES (3, 'Smarter Casual');
SELECT COUNT(*) FROM campaigns WHERE company_id = 1;
count
-------
6
(1 row)
SELECT name FROM campaigns WHERE company_id = 2 AND name LIKE '%Selection';
name
-----------------
April Selection
May Selection
(2 rows)
UPDATE campaigns SET name = 'Tailor''s Cut' WHERE company_id = 3 AND name = 'Tailors Cut';
Now you can check citus_stat_tenants to see the statistics.
SELECT tenant_attribute,
read_count_in_this_period,
query_count_in_this_period,
cpu_usage_in_this_period
FROM citus_stat_tenants;
tenant_attribute | read_count_in_this_period | query_count_in_this_period | cpu_usage_in_this_period
------------------+---------------------------+----------------------------+--------------------------
1 | 1 | 5 | 0.000299
3 | 0 | 4 | 0.000314
2 | 1 | 3 | 0.000295
(3 rows)
Now you have insight on your tenants’ activities.
Activity trends of your tenants might change in time. A new tenant could be much more active than a year old one. Instead of tracking the activity of your tenants since they entered the database, citus_stat_tenants
monitors them within time buckets. Each time period’s query and CPU statistics are counted separately. Once a period ends, that period’s numbers are finalized and only stored for one more period. That means citus_stat_tenants
only shows the current and the last period’s statistics.
A period is 60 seconds by default, and you can change it with citus.stat_tenants_period
parameter.
Let us say you waited 60 seconds (1 period) and ran new queries.
SELECT companies.name company, campaigns.name campaign
FROM companies JOIN campaigns ON companies.id = campaigns.company_id
WHERE companies.id = 1 AND campaigns.name LIKE '%zy%';
company | campaign
------------+--------------------
GigaMarket | Crazy Wednesday
GigaMarket | Frozen Food Frenzy
GigaMarket | Lazy Lunch
(3 rows)
DELETE FROM campaigns WHERE company_id = 2 AND name = 'April Selection';
If you check the citus_stat_tenants
view you can see something like:
SELECT * FROM citus_stat_tenants WHERE tenant_attribute = '1' OR tenant_attribute = '2';
-[ RECORD 1 ]--------------+---------
nodeid | 2
colocation_id | 1
tenant_attribute | 1
read_count_in_this_period | 1
read_count_in_last_period | 1
query_count_in_this_period | 1
query_count_in_last_period | 5
cpu_usage_in_this_period | 0.000054
cpu_usage_in_last_period | 0.000299
-[ RECORD 2 ]--------------+---------
nodeid | 1
colocation_id | 1
tenant_attribute | 2
read_count_in_this_period | 0
read_count_in_last_period | 1
query_count_in_this_period | 1
query_count_in_last_period | 3
cpu_usage_in_this_period | 0.000132
cpu_usage_in_last_period | 0.000295
Note that in addition to query and CPU usage, citus_stat_tenants
also includes these columns:
If you want to create graphs and dashboards with citus_stat_tenants
, you can use the “last period” columns for every period. Because “last period” columns have values that cannot be changed, the values in the “last period” columns will be removed after the next period is over.
The new tenant monitoring feature of Citus 11.3 is not meant for investigating each and every tenant in your Citus database.
Rather, the new
citus_stat_tenants
is designed to help you gather information on your most active tenants. The tenant monitoring view will store and show the tenants that ran the most queries the most recently.
If a tenant in your Citus database stops running queries on your distributed Postgres tables, that tenant’s statistics will eventually be removed from the monitor to open space for the more active tenants.
By default, citus_stat_tenants
will show 100 tenants. You can change the number of tenants stored with citus.stat_tenants_limit
parameter.
The tenant monitor will show the top citus.stat_tenants_limit
number of tenants and only for the current and the last period. But to make the monitor and the statistics consistent and more useful, in the background, citus_stat_tenants
stores some more tenants and some extra data related to recency and the number of queries. Citus stores extra data is to make sure scenarios like “an active tenant dropping out of the monitor list, just because the tenant didn’t run queries during one period” does not happen.
You can query citus_stat_tenants
from any node and the view will show the statistics from all the cluster. However, the tracking of the data is done locally on the Citus node where the tenant resides in.
You might not be interested in the whole Citus cluster’s data, if:
If you just want to get the monitoring data from the node you are connected to, you can use the citus_stat_tenants_local
view.
From Citus worker node with node id 1:
SELECT * FROM citus_stat_tenants_local;
-[ RECORD 1 ]--------------+--
colocation_id | 1
tenant_attribute | 2
read_count_in_this_period | 0
read_count_in_last_period | 1
query_count_in_this_period | 1
query_count_in_last_period | 3
cpu_usage_in_this_period | 0.000132
cpu_usage_in_last_period | 0.000295
Note that you can only see the tenants whose shards are in the Citus node with node id 1 and you no longer have the nodeid column that you had in citus_stat_tenants
view.
The citus_stat_tenants
shows the statistics for one period. But if you want to reset the monitor and clear all the data you can use the citus_stat_tenants_reset
function.
SELECT citus_stat_tenants_reset();
SELECT COUNT(*) FROM citus_stat_tenants;
count
-------
0
(1 row)
There is also a function for cleaning local monitor data of a single Citus node, citus_stat_tenants_reset_local()
. Keep in mind that resetting the data for only one Citus node might create inconsistent results across the cluster, so for most cases you should not need the local reset function.
At Citus we try make your database faster every day, which includes giving you the tools to optimize your database performance yourself.
Tenant monitoring is created with this thought in mind. Now that you have the tools to learn more about the activity and usage statistics for the top tenants in your SaaS application, you can make more-informed decisions on how to optimize your Citus database and cluster.
To learn more about Citus 11.3 and tenant monitoring:
And to get started with Citus: