Written by M. Ozan Saka
November 29, 2022
This post by Ozan Saka about the PostgreSQL 15 support on Azure was originally published on the Azure Cosmos DB Blog.
As you may have heard, we recently made PostgreSQL 15 generally available in Azure Cosmos DB for PostgreSQL within just 1 week of the PostgreSQL 15 release. The Postgres 15 version is available for you whether you need to create a new cluster in Azure Cosmos DB for PostgreSQL, or upgrade your existing cluster. (Note: you can do in-place major version upgrades in Azure Cosmos DB for PostgreSQL.) And the PostgreSQL 15 support is available in all Azure regions that support Azure Cosmos DB for PostgreSQL.
You may be surprised, since managed database services rarely support a new major PostgreSQL version this early. This post walks you through what goes on behind the scenes that makes this possible. Some background before diving in:
Azure Cosmos DB for PostgreSQL is powered by native Postgres and Citus open source, and enables you to run PostgreSQL at any scale, from a single node to a large, distributed cluster. Customers can also scale out as far as their needs require and use many additional features. The Hyperscale (Citus) managed service recently moved into the Azure Cosmos DB family (more info on the launch of Azure Cosmos DB for PostgreSQL in this blog post), and with that move came a free trial of Azure Cosmos DB for PostgreSQL, where you can try out PostgreSQL 15 with Citus 11.1.
In this blog post, you will learn about how these 3 components in Azure Cosmos DB for PostgreSQL were integrated with PostgreSQL 15 and how we did it in such a short amount of time:
Our service architecture has stayed mostly the same since Ozgun Erdogan's post from last year, How We Shipped PostgreSQL 14 on Azure Within One Day of its Release. The "Releasing a new PostgreSQL version" section of Ozgun's post is a good starting point for understanding what we have behind the scenes and some of our best practices, such as testing and Safe Deployment Practice/Policy (SDP). The following quote gives a brief summary of the design:
"In our architecture, the control plane is responsible for the business logic for managing Postgres/Citus databases. This logic includes periodic health checks, high availability and failover, backup and restore, read replicas, regular maintenance operations, and others. The data plane is solely responsible for running the database. As such, the data plane contains almost nothing else other than stock PostgreSQL and its extensions".
More info on the Safe Deployment Practices can be found here. Our usual deployment cycle follows the general definition given there:
The Azure Cosmos DB for PostgreSQL service team also made the call to change deployment scheduling from bi-weekly to weekly recently, in order to be more agile and to make the cross-team collaborations easier—with the end goal of delivering PostgreSQL 15 within the service sooner.
On top of having 100% unit test code coverage and an E2E test pipeline (using Azure Pipelines) that serve as gatekeepers for our codebase, we also have active monitoring in place that continuously runs against the managed Azure Cosmos DB for PostgreSQL service directly in each of the Azure regions we support. The idea is to execute various E2E scenarios in the production environment using only service endpoints via Azure Resource Manager (ARM).
Each job also produces telemetry (logs and metrics), and we have internal monitors in place that generate incident tickets for our on-call engineer whenever they detect an issue. This setup comes in handy in multiple places:
As you can imagine, having this many layers of "testing" on top of having a solid architecture with separation of concerns and good coding practices gives us more confidence and makes it easier to introduce important changes to our service in a short time.
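The active monitoring described above can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Telemetry` class, the `run_monitor` function, and the region and scenario names are hypothetical and are not the service's actual code. It only shows the general shape of "run each E2E scenario in each region, record telemetry, and open an incident when something fails".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Telemetry:
    """Stand-in for real logs, metrics, and incident tickets (hypothetical)."""
    records: List[dict] = field(default_factory=list)
    incidents: List[str] = field(default_factory=list)

    def record(self, region: str, scenario: str, ok: bool) -> None:
        self.records.append({"region": region, "scenario": scenario, "ok": ok})

    def open_incident(self, region: str, scenario: str, error: str) -> None:
        # A real monitor would create a ticket for the on-call engineer here.
        self.incidents.append(f"{region}/{scenario}: {error}")


def run_monitor(
    regions: Iterable[str],
    scenarios: Dict[str, Callable[[str], None]],
    telemetry: Telemetry,
) -> None:
    """Run every scenario in every region; a raised exception becomes an incident."""
    for region in regions:
        for name, scenario in scenarios.items():
            try:
                scenario(region)
            except Exception as exc:
                telemetry.record(region, name, ok=False)
                telemetry.open_incident(region, name, str(exc))
            else:
                telemetry.record(region, name, ok=True)


# Hypothetical scenarios: a real one would call service endpoints through ARM.
def create_cluster(region: str) -> None:
    return None  # always succeeds in this sketch


def upgrade_cluster(region: str) -> None:
    if region == "region-b":
        raise RuntimeError("upgrade timed out")


if __name__ == "__main__":
    telemetry = Telemetry()
    run_monitor(
        ["region-a", "region-b"],
        {"create": create_cluster, "upgrade": upgrade_cluster},
        telemetry,
    )
    print(telemetry.incidents)
```

Keeping the scenarios as plain callables means the same loop can be pointed at a test environment or at production without changes, which is one plausible reason to structure such monitors this way.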
For this major release, we decided to prepare and deploy beta and release candidate versions of PostgreSQL 15 internally, so that the integration between Azure Portal, the Citus database extension, and Azure Cosmos DB for PostgreSQL service teams would go smoothly during the final phase when we got closer to the actual release.
There were 3 steps that needed to be done before we shipped an internal preview version for internal QA purposes:
Fixing any breaking changes introduced by PG15. The first was cleaning up every use of non-existent GUCs in our service, since doing that gives an error starting with PG15. The second was handling the removal of the long-deprecated exclusive backup mode.
Temporarily excluding extensions that didn't have PG15 supported versions available yet.
Creating an internal Azure Feature Exposure Control (AFEC) flag and allowing PG15 interaction only for subscriptions that have it enabled. More details about AFEC follow in the Azure Portal discussion.
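The second and third steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the control plane's real logic: the feature name `ExamplePg15Preview`, the list of generally available versions, and the extension support table are all made up for the example.

```python
from typing import Dict, List, Set

# Assumed values, for illustration only.
GA_VERSIONS: List[str] = ["11", "12", "13", "14"]
PREVIEW_VERSION = "15"
PREVIEW_FEATURE = "ExamplePg15Preview"  # hypothetical AFEC feature name

# Hypothetical catalog: extension name -> PostgreSQL majors with packages available.
EXTENSION_SUPPORT: Dict[str, Set[str]] = {
    "citus": {"13", "14", "15"},
    "pg_cron": {"13", "14", "15"},
    "postgis": {"13", "14"},  # pretend the PG15 package has not shipped yet
}


def selectable_versions(registered_features: Set[str]) -> List[str]:
    """Offer the preview version only to subscriptions registered for the flag."""
    versions = list(GA_VERSIONS)
    if PREVIEW_FEATURE in registered_features:
        versions.append(PREVIEW_VERSION)
    return versions


def installable_extensions(pg_version: str) -> List[str]:
    """Temporarily exclude extensions without a package for this major version."""
    return sorted(
        name for name, majors in EXTENSION_SUPPORT.items() if pg_version in majors
    )


if __name__ == "__main__":
    print(selectable_versions(set()))
    print(selectable_versions({PREVIEW_FEATURE}))
    print(installable_extensions("15"))
```

With a table like this, re-enabling an extension once its PG15 package ships is a data change rather than a code change.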
Azure Portal is one of the ways customers can interact with our service backend, as seen in the figure below, and can mainly be considered our service's front end. Azure Portal communicates with our service via ARM (Azure Resource Manager). ARM acts as a management layer that also handles authentication and security for incoming requests.
In our case, if you create a new Azure Cosmos DB for PostgreSQL cluster (see the quick start on how), Azure Portal creates an ARM template with all the information you specified, converted to the suitable syntax. ARM then redirects the request to that region's control plane. Upsert requests are asynchronous and are tracked by their "request id". Azure Portal continuously checks the status of the async operation to give users a visual status update.
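The "submit, then keep checking the status by request id" pattern can be sketched as below. This is a generic polling loop, not the Azure Portal's code: `wait_for_operation`, `make_fake_backend`, and the request id `req-123` are hypothetical, and the status strings are assumed for the example.

```python
import time
from typing import Callable, List

# Assumed terminal states for an async operation in this sketch.
TERMINAL_STATES = {"Succeeded", "Failed", "Canceled"}


def wait_for_operation(
    get_status: Callable[[str], str],
    request_id: str,
    poll_interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status(request_id) until a terminal state or the timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(request_id)
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation {request_id} still {status!r}")
        sleep(poll_interval)


def make_fake_backend(states: List[str]) -> Callable[[str], str]:
    """Hypothetical backend that walks through the given states, then repeats the last."""
    remaining = list(states)

    def get_status(request_id: str) -> str:
        return remaining.pop(0) if len(remaining) > 1 else remaining[0]

    return get_status


if __name__ == "__main__":
    backend = make_fake_backend(["Accepted", "Running", "Succeeded"])
    print(wait_for_operation(backend, "req-123", poll_interval=0.01))
```

Passing `sleep` in as a parameter keeps the loop easy to test without real delays.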
Going back to the topic of enabling PostgreSQL 15 support, we had two goals on the Azure Portal front:
Supporting beta and release candidate versions of PG15 for internal testing. We wanted certain internal development subscriptions to have access to PG15 from both the resource creation UI and the in-place upgrade blade in Azure Portal. This is where Azure Feature Exposure Control (AFEC) came in handy.
We created a new internal feature for preview purposes, registered internal test subscriptions to that feature, and then deployed logic to the control plane that enabled the PG15 preview version depending on the feature state. As a result, the PG15 option only showed up on the Azure Portal resource creation page for those internal subscriptions.
Being able to make PG15 publicly available in all regions at the same time, at a moment of our choosing. This is important for multiple reasons: due to the nature of our SDP deployment process, a new deployment payload finishes rolling out in some regions earlier than in others (in some cases multiple days earlier), and we also wanted to run validations in each region after the deployment. This was done with a similar feature flag: after we gave the final confirmation, the Portal team enabled PG15 for every subscription.
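The second goal boils down to a simple gate: flip the public switch only once every region has the new payload deployed and validated and the final confirmation has been given. A minimal Python sketch follows; the `RegionStatus` type and the region names are assumptions for illustration, not the service's actual data model.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class RegionStatus:
    """Hypothetical per-region rollout state."""
    name: str
    deployed: bool
    validated: bool


def pending_regions(regions: Iterable[RegionStatus]) -> List[str]:
    """Names of regions that are not yet both deployed and validated."""
    return [r.name for r in regions if not (r.deployed and r.validated)]


def can_enable_globally(regions: Iterable[RegionStatus], final_confirmation: bool) -> bool:
    """True only when every region is ready and the team has confirmed."""
    regions = list(regions)
    return final_confirmation and bool(regions) and not pending_regions(regions)


if __name__ == "__main__":
    rollout = [
        RegionStatus("region-a", deployed=True, validated=True),
        RegionStatus("region-b", deployed=True, validated=False),
    ]
    print(pending_regions(rollout))
    print(can_enable_globally(rollout, final_confirmation=True))
```

Decoupling "deployed" from "visible to customers" in this way is what lets a staggered regional rollout still end in a single, simultaneous public launch.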
Azure Cosmos DB for PostgreSQL is mainly powered by the Citus extension, but we also provide many other extensions, some enabled by default and some optional. Optional ones can be enabled from the database itself by running the following after connecting as the admin user (postgis here can be replaced with any other extension):
SELECT create_extension('postgis');
Some of these extensions required fixes to support the new major PostgreSQL 15 version, depending on the breaking changes, which meant getting our hands on the packages for the newest versions of these Postgres extensions as early as possible.
The work here was divided into two parts.
Extensions other than Citus in this area were updated to support PG15 very easily (usually within a day of effort), and new versions were tagged and released. This means most of the Citus engine team's effort went into updating Citus open source. Enabling the new PostgreSQL major version in the Citus extension can be divided into 3 parts.
This time, for PG15, we also decided to generate nightly packages during development, which meant we could test the latest state in the field in our service under the umbrella of the "internal preview version release" I mentioned before. Doing that gave us another confidence boost and helped iron out some issues early that would otherwise have been found much later in the development cycle.
The data plane for the Azure Cosmos DB for PostgreSQL service currently uses CentOS on the server VMs, and most of the extension package management is done with YUM. External packages are managed by and directly consumed from the public PostgreSQL Yum Repository. Some of the PostgreSQL 15 compatible RPM packages for the extensions were shipped a few days after the PostgreSQL 15 release, so we waited for all of them to be ready. It's also important to consume and install the necessary debug info packages for service management.
PostgreSQL 15 was officially released on Thursday, October 13. At that point we had already finished and deployed most of the necessary changes in our service code. The remaining bits were disabling the AFEC feature check and adding the final extension versions to the service. It turned out the final PG15 GA version had some breaking changes for the Citus extension, so the Citus database engine team started working on an update. The next day, the Citus engine team announced the public release of version 11.1.3, which had PG15 GA support. Around the same time, on Friday evening, the latest PostGIS package that supported PG15 was also shipped to the PostgreSQL Yum Repository.
We started the safe deployment process on Monday, and it finished for every region on Wednesday. After finishing the final tests and validations and doing the necessary cleanups for some of our existing customer clusters so they can upgrade to PG15 whenever they want, we enabled the feature on Azure Portal for everyone on Thursday, October 20. The announcement post on the Azure Cosmos DB Blog followed on Friday, October 21.
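The timeline above can be checked with a little date arithmetic. In this Python sketch the year 2022 is taken from the post's publication date, and the dates for the Citus release and for the start and end of the rollout are derived from the weekday names in the text ("the next day", "Monday", "Wednesday") rather than stated explicitly, so treat them as assumptions.

```python
from datetime import date

# Dates stated in the post (year assumed from the publication date).
pg15_release = date(2022, 10, 13)   # Thursday
service_ga = date(2022, 10, 20)     # Thursday
announcement = date(2022, 10, 21)   # Friday

# Dates derived from weekday names in the text (assumptions).
citus_release = date(2022, 10, 14)  # "the next day"
rollout_start = date(2022, 10, 17)  # "Monday"
rollout_end = date(2022, 10, 19)    # "Wednesday"

days_to_ga = (service_ga - pg15_release).days
rollout_days = (rollout_end - rollout_start).days + 1  # inclusive of both ends

if __name__ == "__main__":
    for label, day in [
        ("PostgreSQL 15 release", pg15_release),
        ("Citus 11.1.3 release", citus_release),
        ("Rollout start", rollout_start),
        ("Rollout end", rollout_end),
        ("PG15 enabled for everyone", service_ga),
        ("Announcement", announcement),
    ]:
        print(f"{day.isoformat()}  {label}")
    print(f"Days from PG15 release to availability in the service: {days_to_ga}")
    print(f"Calendar days covered by the regional rollout: {rollout_days}")
```

The difference between the release date and the day PG15 was enabled for everyone comes out to exactly seven days, which matches the "within a week" claim.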
To sum things up: we made Postgres 15 generally available in Azure Cosmos DB for PostgreSQL within a week of the PG15 GA, and it's super easy to get a PG15 cluster or upgrade an existing one. If you want to scale out PostgreSQL 15 in the cloud, you can create a new Azure Cosmos DB for PostgreSQL cluster via Azure Portal.
Existing Azure Cosmos DB for PostgreSQL customers can also upgrade older PG versions via the in-place upgrade option in the Azure Portal. With minimal downtime, you can upgrade your existing clusters from any supported major version to any supported major version in a single operation. Find out more in the docs for in-place major version upgrade support.
Please let us know what you think via email at Ask Azure Cosmos DB for PostgreSQL.