Citus 11.3 is out! Now with tenant statistics. Read all about it in Marco's 11.3 blog post. 💥
Written by Halil Ozan Akgul
May 12, 2023
If you have ever used a database like Postgres, you know how important optimization is. Some minor changes in how the database is setup make all the difference between long waiting times and satisfied customers. And one crucial thing you need before doing the optimization is to monitor and understand how your database is being used.
Citus is an extension to Postgres that improves scalability and parallelization by distributing your Postgres database across nodes in a cluster. The Citus database extension is available as open source and as a managed service on the cloud, as Azure Cosmos DB for PostgreSQL. You can track your Citus nodes and the Postgres tables, but Citus 11.3 takes it one step further and introduces a new way to gather insight on your Citus database with tenant monitoring.
The new Citus 11.3 release, among many other features, introduces a new
citus_stat_tenants view to track your most active tenants, for those with multi-tenant SaaS applications.
In a multi-tenant SaaS application, the same database stores data from the multiple customers of the application. Each of the customers are often referred as tenants. Usually the data for each tenant is handled separately. When you distribute your Postgres table using the Citus create_distributed_table function, every partition key value represents a tenant.
With the new
citus_stat_tenants view you can track:
In this post, you’ll learn how to monitor your top tenants to make more-informed decisions on your database—and you’ll learn how to configure
citus_stat_tenants to best fit your application. This post includes a quickstart and code examples. Let’s explore:
Let us say you have a multi-tenant app with Citus database, similar to the one on our Citus 11.3 blog post, and your customers are companies who have some ad campaigns.
CREATE TABLE companies (id BIGSERIAL, name TEXT); SELECT create_distributed_table ('companies', 'id'); CREATE TABLE campaigns (id BIGSERIAL, company_id BIGINT, name TEXT); SELECT create_distributed_table ('campaigns', 'company_id');
Each of the companies will be a tenant and the company ids are the tenant ids, or the tenant attributes.
Now we can run some queries to track with the
Tip: You need to set the
all in all your nodes to be able to track the statistics. You can put the setting in the
Let us run some queries.
INSERT INTO companies (name) VALUES ('GigaMarket'); INSERT INTO campaigns (company_id, name) VALUES (1, 'Crazy Wednesday'), (1, 'Frozen Food Frenzy'); INSERT INTO campaigns (company_id, name) VALUES (1, 'Spring Cleaning'), (1, 'Bread&Butter'); INSERT INTO campaigns (company_id, name) VALUES (1, 'Personal Care Refresh'), (1, 'Lazy Lunch'); INSERT INTO companies (name) VALUES ('White Bouquet Flowers'); INSERT INTO campaigns (company_id, name) VALUES (2, 'Bonjour Begonia'), (2, 'April Selection'), (2, 'May Selection'); INSERT INTO companies (name) VALUES ('Smart Pants Co.'); INSERT INTO campaigns (company_id, name) VALUES (3, 'Short Shorts'), (3, 'Tailors Cut'); INSERT INTO campaigns (company_id, name) VALUES (3, 'Smarter Casual'); SELECT COUNT(*) FROM campaigns WHERE company_id = 1; count ------- 6 (1 row) SELECT name FROM campaigns WHERE company_id = 2 AND name LIKE '%Selection'; name ----------------- April Selection May Selection (2 rows) UPDATE campaigns SET name = 'Tailor''s Cut' WHERE company_id = 3 AND name = 'Tailors Cut';
Now you can check citus_stat_tenants to see the statistics.
SELECT tenant_attribute, read_count_in_this_period, query_count_in_this_period, cpu_usage_in_this_period FROM citus_stat_tenants; tenant_attribute | read_count_in_this_period | query_count_in_this_period | cpu_usage_in_this_period ------------------+---------------------------+----------------------------+-------------------------- 1 | 1 | 5 | 0.000299 3 | 0 | 4 | 0.000314 2 | 1 | 3 | 0.000295 (3 rows)
Now you have insight on your tenants’ activities.
Activity trends of your tenants might change in time. A new tenant could be much more active than a year old one. Instead of tracking the activity of your tenants since they entered the database,
citus_stat_tenants monitors them within time buckets. Each time period’s query and CPU statistics are counted separately. Once a period ends, that period’s numbers are finalized and only stored for one more period. That means
citus_stat_tenants only shows the current and the last period’s statistics.
A period is 60 seconds by default, and you can change it with
Let us say you waited 60 seconds (1 period) and ran new queries.
SELECT companies.name company, campaigns.name campaign FROM companies JOIN campaigns ON companies.id = campaigns.company_id WHERE companies.id = 1 AND campaigns.name LIKE '%zy%'; company | campaign ------------+-------------------- GigaMarket | Crazy Wednesday GigaMarket | Frozen Food Frenzy GigaMarket | Lazy Lunch (3 rows) DELETE FROM campaigns WHERE company_id = 2 AND name = 'April Selection';
If you check the
citus_stat_tenants view you can see something like:
SELECT * FROM citus_stat_tenants WHERE tenant_attribute = '1' OR tenant_attribute = '2'; -[ RECORD 1 ]--------------+--------- nodeid | 2 colocation_id | 1 tenant_attribute | 1 read_count_in_this_period | 1 read_count_in_last_period | 1 query_count_in_this_period | 1 query_count_in_last_period | 5 cpu_usage_in_this_period | 0.000054 cpu_usage_in_last_period | 0.000299 -[ RECORD 2 ]--------------+--------- nodeid | 1 colocation_id | 1 tenant_attribute | 2 read_count_in_this_period | 0 read_count_in_last_period | 1 query_count_in_this_period | 1 query_count_in_last_period | 3 cpu_usage_in_this_period | 0.000132 cpu_usage_in_last_period | 0.000295
Note that in addition to query and CPU usage,
citus_stat_tenants also includes these columns:
If you want to create graphs and dashboards with
citus_stat_tenants, you can use the “last period” columns for every period. Because “last period” columns have values that cannot be changed, the values in the “last period” columns will be removed after the next period is over.
The new tenant monitoring feature of Citus 11.3 is not meant for investigating each and every tenant in your Citus database.
Rather, the new
citus_stat_tenantsis designed to help you gather information on your most active tenants. The tenant monitoring view will store and show the tenants that ran the most queries the most recently.
If a tenant in your Citus database stops running queries on your distributed Postgres tables, that tenant’s statistics will eventually be removed from the monitor to open space for the more active tenants.
citus_stat_tenants will show 100 tenants. You can change the number of tenants stored with
The tenant monitor will show the top
citus.stat_tenants_limit number of tenants and only for the current and the last period. But to make the monitor and the statistics consistent and more useful, in the background,
citus_stat_tenants stores some more tenants and some extra data related to recency and the number of queries. Citus stores extra data is to make sure scenarios like “an active tenant dropping out of the monitor list, just because the tenant didn’t run queries during one period” does not happen.
You can query
citus_stat_tenants from any node and the view will show the statistics from all the cluster. However, the tracking of the data is done locally on the Citus node where the tenant resides in.
You might not be interested in the whole Citus cluster’s data, if:
If you just want to get the monitoring data from the node you are connected to, you can use the
From Citus worker node with node id 1:
SELECT * FROM citus_stat_tenants_local; -[ RECORD 1 ]--------------+-- colocation_id | 1 tenant_attribute | 2 read_count_in_this_period | 0 read_count_in_last_period | 1 query_count_in_this_period | 1 query_count_in_last_period | 3 cpu_usage_in_this_period | 0.000132 cpu_usage_in_last_period | 0.000295
Note that you can only see the tenants whose shards are in the Citus node with node id 1 and you no longer have the nodeid column that you had in
citus_stat_tenants shows the statistics for one period. But if you want to reset the monitor and clear all the data you can use the
SELECT citus_stat_tenants_reset(); SELECT COUNT(*) FROM citus_stat_tenants; count ------- 0 (1 row)
There is also a function for cleaning local monitor data of a single Citus node,
citus_stat_tenants_reset_local(). Keep in mind that resetting the data for only one Citus node might create inconsistent results across the cluster, so for most cases you should not need the local reset function.
At Citus we try make your database faster every day, which includes giving you the tools to optimize your database performance yourself.
Tenant monitoring is created with this thought in mind. Now that you have the tools to learn more about the activity and usage statistics for the top tenants in your SaaS application, you can make more-informed decisions on how to optimize your Citus database and cluster.
To learn more about Citus 11.3 and tenant monitoring:
And to get started with Citus: