Auto scaling Azure Cosmos DB for PostgreSQL with Citus, Grafana, & Azure Functions

Written by Lucas Borges Fernandes
May 19, 2023

This post by Lucas Borges Fernandes about auto scaling Citus on Azure was originally published on the Azure Cosmos DB Blog.

One of the top Citus features is the ability to run PostgreSQL at any scale, on a single node as well as a distributed database cluster.

As your application needs to scale, you can add more nodes to the Citus cluster, rebalance existing data to the new Postgres nodes, and seamlessly scale out. However, these operations require manual intervention: a) first you must create alerts on metrics, b) then, based on those alerts, you need to add more nodes, c) then you must kick off and monitor the shard rebalancer. Automating these steps will give you a complete auto scale experience—and make your life so much easier.

In this blog post, you will learn how to build a full-fledged auto scaling setup for the Citus database extension running as a managed service on Azure, called Azure Cosmos DB for PostgreSQL. You’ll also learn how you can easily add nodes to the Azure Cosmos DB for PostgreSQL cluster and use any metrics available to trigger actions in your cluster! Let’s dive in.

You can also see a demo of the auto scaling setup in action in a talk I gave recently at Citus Con:

Figure 1: Video of the conference talk I gave at Citus Con: An Event for Postgres 2023, titled “Auto scaling Azure Cosmos DB for PostgreSQL with Citus, Grafana, & Azure Serverless”

Overview of the components for Auto Scaling

The auto scaling architecture this blog proposes combines multiple components that could be either managed in the cloud or on-premises. These components are:

  • Azure Functions
  • Azure Resource Manager (ARM)
  • Grafana
  • Azure Cosmos DB for PostgreSQL (aka Citus on Azure)

For the examples presented in this blog post, you will use the managed version of Citus on Azure: the Azure Cosmos DB for PostgreSQL service. The same architecture and steps outlined in this blog post will work for a Citus open source cluster, with the only change being the API used to trigger cluster modifications (e.g., add new nodes, scale up compute power, etc.). In the Azure example in this blog, cluster modifications are done through the Azure Resource Manager (ARM) APIs.

First let’s walk through each of these components. Feel free to skip ahead to Step 1: Creating & populating the cluster if you are already familiar with all of the components.

What are Azure Functions?

Azure Functions is a serverless solution offered by Microsoft that allows you to deploy your applications without having to worry about servers. (I sometimes refer to it conversationally as Azure Serverless.) The cloud infrastructure is all managed by Azure and you only have to focus on the code that matters to you. It is also a great way to save costs and run workloads that don’t require servers up and running 24/7.

These functions can be created in many languages, and Azure also provides a CLI to create boilerplates and deploy these functions in the cloud. See the following link for more details: Azure Functions Core Tools

After deploying the functions, you can use REST APIs to trigger them. It’s important to mention that functions have a maximum execution time, so it’s advisable not to block them on I/O or other long-running external operations. You should consider an asynchronous model when possible.
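As a rough illustration, here is a minimal sketch of an HTTP-triggered function, assuming the newer Node.js v4 programming model of the @azure/functions package (the older v3 model uses a function.json file plus an exported handler instead). The function name and response are placeholders; the point is the pattern of starting the long-running work and returning quickly instead of blocking until it finishes.

import { app, HttpRequest, HttpResponseInit, InvocationContext } from "@azure/functions";

// Minimal HTTP-triggered function sketch (Node.js v4 programming model).
// "scaleOutTrigger" is a placeholder name; the handler kicks off the
// long-running work and returns immediately, so the function stays well
// under its maximum execution time.
app.http("scaleOutTrigger", {
    methods: ["POST"],
    authLevel: "function",
    handler: async (request: HttpRequest, context: InvocationContext): Promise<HttpResponseInit> => {
        context.log(`Received request: ${request.url}`);

        // Start the asynchronous operation here (e.g. an ARM deployment)
        // and answer with 202 Accepted rather than waiting for it to finish.
        return { status: 202, body: "Operation started" };
    },
});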

What is Azure Resource Manager (ARM)?

Azure Resource Manager (also known as ARM) is a deployment and management service for Azure. ARM can be used to interact with the Azure cloud to create, update, and delete resources.

ARM uses templates that provide a declarative way to define your resources, and you can apply these templates through REST APIs, SDKs, the Azure CLI, and even the Azure portal itself.

What is Grafana?

Grafana is an amazing open source tool that helps improve software observability with operational dashboards. It provides analytics and interactive visualizations for data from many sources, including PostgreSQL and Azure Monitor. These visualizations can exist in the form of tables, graphs and even alerts to external applications. One common usage of Grafana is to monitor application health and create alert triggers based on certain events, such as a high CPU usage spike, for example.

Grafana can be installed on-premises, but it is also available as a managed service in the cloud, such as Azure Managed Grafana.

Step 1: Creating & populating the cluster

The first step is to create a Citus database cluster. For this project you will use the Azure Cosmos DB for PostgreSQL managed service, which is powered by the Citus extension. You can think of it as Citus on Azure. The cluster specification used in this example is:

  • Coordinator: 32 vCores / 128 GiB RAM
  • Workers: 16 vCores / 128 GiB RAM
  • PostgreSQL v15 / Citus 11.3

You can find instructions on how to create an Azure Cosmos DB for PostgreSQL cluster in the following link: Create an Azure Cosmos DB for PostgreSQL cluster.

Once you have provisioned the database cluster, connect to it and execute the following statements:

CREATE TABLE public.companies (
  id bigserial PRIMARY KEY,
  name text NOT NULL,
  image_url text,
  created_at timestamp with time zone NOT NULL,
  updated_at timestamp with time zone NOT NULL
);

CREATE TABLE public.campaigns (
  id bigserial,
  company_id bigint REFERENCES public.companies (id),
  name text NOT NULL,
  cost_model text NOT NULL,
  state text NOT NULL,
  monthly_budget bigint,
  blacklisted_site_urls text[],
  created_at timestamp with time zone NOT NULL,
  updated_at timestamp with time zone NOT NULL,
  PRIMARY KEY (company_id, id)
);

CREATE TABLE public.ads (
  id bigserial,
  company_id bigint,
  campaign_id bigint,
  name text NOT NULL,
  image_url text,
  target_url text,
  impressions_count bigint DEFAULT 0,
  clicks_count bigint DEFAULT 0,
  created_at timestamp with time zone NOT NULL,
  updated_at timestamp with time zone NOT NULL,
  PRIMARY KEY (company_id, id),
  FOREIGN KEY (company_id, campaign_id)
    REFERENCES public.campaigns (company_id, id)
);

CREATE TABLE public.clicks (
  id bigserial,
  company_id bigint,
  ad_id bigint,
  clicked_at timestamp with time zone NOT NULL,
  site_url text NOT NULL,
  cost_per_click_usd numeric(20,10),
  user_ip inet NOT NULL,
  user_data jsonb NOT NULL,
  PRIMARY KEY (company_id, id)
);

CREATE TABLE public.worker_count (
  id bigserial PRIMARY KEY,
  created_at timestamp with time zone NOT NULL,
  count bigint
);

SELECT cron.schedule(
    'worker-count',
    '5 seconds',
    $$
        INSERT INTO worker_count (created_at, count)
        SELECT
            now(),
            (select
                count(*) as worker_count
            from
                pg_dist_node
            where
                nodename ilike 'private-w%' limit 1);
    $$
);

You will notice a worker_count table and a cron job creating a new entry every 5 seconds. This is to keep track of the number of nodes in the cluster and will be used to trigger a shard rebalancing operation after a new worker is added. If you are running a Citus open source cluster, make sure to install the pg_cron extension.

The next step is to distribute the tables:

SELECT create_distributed_table('public.companies', 'id');
SELECT create_distributed_table('public.campaigns', 'company_id');
SELECT create_distributed_table('public.ads', 'company_id');
SELECT create_distributed_table('public.clicks', 'company_id');

The last step is to fill those tables with data. I will keep this step brief, since getting data in is straightforward: INSERT .. SELECT statements and scripts using benchmarking tools such as pgbench are both options for populating the newly created cluster. A small seeding sketch is shown below.
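This hypothetical sketch uses the postgres.js client (the same library that appears later in the shard rebalancing function). The connection string variable, the row counts, and the assumption that the companies table is freshly created (so its ids run from 1 to 2000) are all placeholders.

import postgres from "postgres";

// Hypothetical seed script: generates synthetic rows with generate_series.
// PG_CONNECTION_URL and the row counts below are arbitrary placeholders,
// and a fresh companies table is assumed so its ids are 1..2000.
const sql = postgres(process.env.PG_CONNECTION_URL);

await sql`
    INSERT INTO public.companies (name, created_at, updated_at)
    SELECT 'company-' || s, now(), now()
    FROM generate_series(1, 2000) AS s;
`;

await sql`
    INSERT INTO public.campaigns (company_id, name, cost_model, state, monthly_budget, created_at, updated_at)
    SELECT (s % 2000) + 1, 'campaign-' || s, 'cost_per_click', 'running', 10000, now(), now()
    FROM generate_series(1, 20000) AS s;
`;

await sql.end();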

Step 2: Use Azure Functions & ARM templates to scale out your Citus cluster

In the event of a CPU usage spike, we want to trigger Azure Functions that will:

  1. Use the Azure Resource Manager (ARM) API to add new nodes to the Citus cluster
  2. Connect to the coordinator PostgreSQL server and execute a shard rebalancing operation

Instead of creating one function for both steps, you should create two separate functions. The reason is that any given function has a limited execution time, and adding new nodes can take several minutes to complete. When Grafana detects that a new worker was added, the second function is triggered. I will explain this in more detail in the Grafana section.

These functions were created in TypeScript, and the Azure Functions Core Tools were used to create the initial boilerplates and publish the code to the Azure cloud. The publish step returns a URL for each function, and these URLs are used later when configuring the Grafana alerts.

Scale-out function

The scale-out function requires the following dependencies:

import { DefaultAzureCredential } from "@azure/identity"
import { Deployment, ResourceManagementClient } from "@azure/arm-resources";

The @azure/identity dependency is required for AAD authentication using service principals and @azure/arm-resources for the ARM API used to modify our Citus cluster.

AAD authentication is done through the new DefaultAzureCredential() call and uses environment variables to pull the credentials. Next, you have to pass the resulting object and the subscription id to the ARM API class constructor. The whole process can be seen below.

const azureSubscriptionId = process.env.AZURE_SUBSCRIPTION_ID;
const credential = new DefaultAzureCredential();
const resourceClient = new ResourceManagementClient(credential, azureSubscriptionId);

The next step is to build a new ARM template containing the desired state (i.e., the cluster containing new nodes), and use the ARM API to update the Citus cluster. You can find the template reference in the following link: ARM template for Cosmos DB for PostgreSQL.
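As a rough sketch (not taken from the linked reference), the template object built by the function could look something like the snippet below. The resource type, apiVersion, and property names are my assumptions and should be checked against the ARM template reference above before use.

// Illustrative template object only: verify the resource type, apiVersion,
// and property names against the ARM template reference linked above.
const template = {
    $schema: "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    contentVersion: "1.0.0.0",
    resources: [
        {
            type: "Microsoft.DBforPostgreSQL/serverGroupsv2",   // assumed resource type
            apiVersion: "2022-11-08",                            // assumed API version
            name: "my-citus-cluster",                            // placeholder cluster name
            location: "westus",                                  // placeholder region
            properties: {
                nodeCount: 4,                 // desired number of worker nodes after scale-out
                nodeVCores: 16,               // per-node compute, matching the existing workers
                nodeStorageQuotaInMb: 524288,
            },
        },
    ],
};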

The final part of the function is to execute the following lines:

const deployment: Deployment = { properties: { mode: "Incremental", template: template } };
await resourceClient.deployments.beginCreateOrUpdate(resourceGroup, deploymentName, deployment);

The Deployment object contains the ARM template, and the beginCreateOrUpdate method is called with the resource group and a custom deployment name. The resourceClient is the same one we created when instantiating the ResourceManagementClient class.

After the beginCreateOrUpdate method is executed, the cluster is resized; the process takes about 10 minutes to finish.

You can find more information about all steps mentioned below:

Shard rebalancing function

The Citus shard rebalancing function is very straightforward and has a PostgreSQL client dependency that can be seen below. Remember, Citus is just plain PostgreSQL, so any client library will work out-of-the-box!

import postgres from "postgres";

Next, using a connection string stored in an environment variable, the function creates a client object connected to the server.

const pgClient = postgres(process.env.PG_CONNECTION_URL);

The pgClient object is used to trigger the shard rebalancing Citus UDF.

await pgClient`
    SELECT citus_rebalance_start(rebalance_strategy:='by_shard_count', shard_transfer_mode:='block_writes');
`;

After a few minutes the cluster shards will be fully rebalanced and the CPU usage per worker node will diminish. You can find the PostgreSQL library in the following link: PostgreSQL JS library.
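If you also want visibility into the background rebalance that was just started, recent Citus releases ship a citus_rebalance_status() UDF. A small sketch follows; treat the exact result columns as version-dependent and verify them against your Citus release.

// Optional: inspect the background rebalance started above.
// The result columns of citus_rebalance_status() vary by Citus version.
const status = await pgClient`
    SELECT state, details FROM citus_rebalance_status();
`;
console.log(status);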

Next, we’ll see how to configure Grafana to trigger the functions we just created using alerts.

Step 3: Configuring Grafana data sources

Grafana has the concept of data sources, which are connectors used to pull data from third-party software. For this project, we are interested in 2 data sources:

  1. Azure Monitor
  2. PostgreSQL

The first is needed to pull metrics from the Azure VMs used to provision the Azure Cosmos DB for PostgreSQL cluster, which by now you’ve figured out is Citus on Azure. The Azure Monitor data source can be used to create time series charts and alerts on CPU, memory, disk usage, etc.

The second data source is responsible for providing an interface to a PostgreSQL database and can be used to create charts and alerts from custom queries. There is one important addition in Citus 11.3 that can be used together with custom PostgreSQL charts and alerts: tenant monitoring, or what I like to call tenant-level metrics.

If you have a Citus open source cluster, then you might need a different data source to get the server metrics. One possible solution could be a Prometheus instance connected to your cluster.

Here’s the documentation on how to configure each of these data sources:

Step 4: Creating Grafana alerts

With the Grafana data sources configured, now it’s time to create alerts. These alerts will be used to call HTTP webhooks that in turn will trigger Azure Functions.

A Grafana alert allows you to learn about problems in your systems before or moments after they occur. You can create custom queries on data sources and specify conditions for these alerts to trigger, and when they do, take an action.

An action could be a Slack message, an email, a webhook triggering some software, etc. This is the beauty of Grafana alerts and what makes me love Grafana.

In this project, you will create a CPU usage alert that triggers the Azure Function that adds new nodes to the Citus cluster. You will also create an alert that fires when the number of worker nodes changes and invokes the Azure Function that connects to the Citus cluster and executes a shard rebalancing operation. The end result is an overall decrease in CPU usage on each individual node, which makes the alerts stop firing.

The threshold for the CPU alert can be 80%, but it can be configured however you like. For the worker count alert, you should create a custom query on the worker_count table over a given time frame: if the newest value differs from the first value in the series, the alert fires. The idea behind this is that if the number of worker nodes changes, a shard rebalancing operation is needed.

Next is the query you can use to create the alert:

SELECT
    to_timestamp(trunc(extract(epoch from created_at))) as time,
    count as worker_count
FROM
    public.worker_count
WHERE
    created_at > now() - '5m'::interval;

You can learn more about how to set up alerts and notifications in the following links:

When configuring the webhook contact points, you should use the Azure Function URLs that were created previously. After that, create a notification policy routing both alerts to their respective contact points.

Step 5: Testing the auto scaling setup

To test whether the auto scaling setup works, you can run a pgbench script with the two following distributed queries.

Query 1:

\set company_id1 random(1, 2000)
\set company_id2 random(1, 2000)
\set company_id3 random(1, 2000)
BEGIN;
    SELECT
        ads.company_id,
        ads.id,
        count(*)
    FROM
        public.ads ads
    INNER JOIN
        public.campaigns ca ON ads.campaign_id = ca.id AND ads.company_id = ca.company_id
    WHERE
        ads.company_id IN (
            :company_id1,
            :company_id2,
            :company_id3
        )
    GROUP BY
        1, 2;
END;

Query 2:

\set company_id random(1, 2000)
BEGIN;
    SELECT
        co.id as company_id,
        co.name as company_name,
        SUM(ca.monthly_budget) as monthly_budget
    FROM
        public.companies co
    INNER JOIN
        public.campaigns ca ON co.id = ca.company_id
    WHERE
        co.id = :company_id
    GROUP BY 1;
END;

The following command was issued to execute both queries:

pgbench 'postgres://citus:<password>@<cluster-url>:5432/citus?sslmode=require' -j 4 -c 16 -f query01.sql -f query02.sql -T 2000 --progress=1

After a few seconds, the CPU usage surpassed the 80% threshold and after ~15 minutes the cluster was resized, rebalanced, and the alerts stopped firing.

The same demo can be seen in the Citus Con 2023 talk I gave recently about Auto scaling Azure Cosmos DB for PostgreSQL with Citus, Grafana, and Azure Serverless.

Using Grafana & Citus 11.3 to monitor tenants

The new Citus 11.3 release brings many improvements, and one of the best (in my opinion) is the support for tenant monitoring with the citus_stat_tenants view. If you are running a multi-tenant SaaS application on Citus, then this view can give you very good insights into how each tenant is performing in your cluster.

Some of the attributes available in the citus_stat_tenants view are:

  • Number of read queries for tenant in current period
  • Number of read queries for tenant in the last period
  • Number of read/write queries for tenant in current period
  • Number of read/write queries for tenant in last period
  • Seconds of CPU time spent for tenant in current period
  • Seconds of CPU time spent for tenant in last period

These attributes in the citus_stat_tenants view, in combination with the Grafana integration, can be very powerful. You can use Citus tenant monitoring to detect possible tenant bottlenecks (e.g., an outgrowing tenant unbalancing the worker nodes, or node performance degrading because of a single tenant). Citus offers a handful of PostgreSQL functions to respond to such events, including tenant isolation, shard rebalancing, moving a tenant to a different node, and more.
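As a hedged example of how you could feed these metrics into a dashboard or an alert, the sketch below queries citus_stat_tenants with the same postgres.js client used earlier. The column names follow the Citus 11.3 documentation, and tenant tracking is assumed to be enabled (citus.stat_tenants_track set to 'all'), so double-check both against your cluster.

import postgres from "postgres";

// Sketch: list the tenants consuming the most CPU in the current period.
// Assumes tenant tracking is enabled (citus.stat_tenants_track = 'all');
// column names are taken from the Citus 11.3 docs and may differ by version.
const pgClient = postgres(process.env.PG_CONNECTION_URL);

const busiestTenants = await pgClient`
    SELECT
        tenant_attribute,
        query_count_in_this_period,
        cpu_usage_in_this_period
    FROM citus_stat_tenants
    ORDER BY cpu_usage_in_this_period DESC
    LIMIT 10;
`;

console.log(busiestTenants);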

Key takeaways for auto scaling Azure Cosmos DB for PostgreSQL

The main takeaway from this blog is that you can build responsive Azure Cosmos DB for PostgreSQL clusters (or Citus open source clusters) that react to tenant-level metrics, server metrics, custom queries, and more. Auto scaling, even though it is the focus of this blog, is just one of the many possibilities of the setup I described. You can think of it as an event-based cluster management solution!

Because Azure Cosmos DB for PostgreSQL is powered by the Citus open source extension to PostgreSQL—which is not a fork—it works well with native PostgreSQL connectors and libraries. This versatility together with the power of Citus distributed tables makes Azure Cosmos DB for PostgreSQL one of the most powerful and extensible tools out there.

Next steps

I hope you enjoyed the ride and in case you want to learn more, here are some resources:


Written by Lucas Borges Fernandes

Software engineer on the Postgres team at Microsoft. MSc in Distributed Systems from the Federal University of Uberlândia. Speaker at Citus Con: An Event for Postgres 2023. Loves to play electric guitar. Brazilian.

lucasbfernandes