What’s new with Postgres at Microsoft (August 2023)

Written by Claire Giordano
August 31, 2023

This post by Claire Giordano about the Postgres workstreams at Microsoft was originally published on the Microsoft Tech Community Blog.

On one of the Postgres community chat forums, a friend asked me: "Is there a blog post that outlines all the work that is being done on Postgres at Microsoft? It's hard to keep track these days."

And my friend is right: it is hard to keep track. Probably because there are multiple Postgres workstreams at Microsoft, spread across a few different teams.

In this post, you'll get a bird's eye view of all the Postgres work the Microsoft team has done over the last year. Our work includes some pretty significant improvements to the Postgres managed services on Azure, as well as contributions across the entire open source ecosystem—including commits to the Postgres core; new releases to Postgres open source extensions like Citus and pg_cron; plus ecosystem work on Patroni, PgBouncer, pgcopydb. And more.

Here you go, the table of contents. Each of these bullets is a link to make it easy for you to jump straight to the sections that interest you most. :)

Postgres Work at Microsoft (August 2023)
Figure 1: This diagram is a visual overview of the different Postgres workstreams at Microsoft, highlighting some of the work we’ve done over the last year. Of course, it takes a village: for many of the open source bullets in the diagram above, our contributions were not made in isolation—and involved collaboration from developers and community members across the Postgres community, and from other companies, too.

Choice of 2 Azure managed services for PostgreSQL

Many of you choose Postgres when selecting a database for your application. For lots of good reasons. And when you decide to run your application in the cloud on Azure, you have a few choices.

You can of course install and self-manage Postgres on an Azure VM.

But most of you who run your apps in the cloud do so on a managed database service.

Why a managed Postgres service? Primarily so you can outsource your database operations and focus on your application. Which has the benefit of letting somebody else deal with the hardware and operating system, configure high availability, architect for security, manage the backups, and make sure the updates happen.

On Azure, we have two different managed services you can choose from for Postgres. That's what I call the "2 elephants in the room." Choice is good, but it begs the question: how do you choose?

The simplest top-level way to think about the question: "which Postgres service on Azure is right for my application?" is this:

  • Use Azure Database for PostgreSQL – Flexible Server if you just need regular Postgres in the form of a managed service—with top of the line, state of the art cloud infrastructure—and your application's write workload can be met by a single Postgres server. (Portal link)

  • Use Azure Cosmos DB for PostgreSQL if you need a horizontally scalable Postgres service—now or in the future—that can parallelize queries and keep lots of data in memory on a multi-node cluster. (The ability to distribute Postgres across a cluster is powered by the Citus open source extension to Postgres.) (Portal link)

It goes without saying: your application may have unique and very specific needs. So there's always a bit of an "it depends" factor in deciding which database service is right for you.

Azure Database for PostgreSQL – Flexible Server

Flexible Server is the recommended deployment option for the Azure Database for PostgreSQL managed service.

In the Azure docs there is this handy comparison of Flexible Server vs. Single Server, but a simple way to think about the comparison is that Flexible Server—which GA'd in Nov 2021—is the next-generation of the fully-managed Postgres database service on Azure. Whereas the EOL for Single Server was announced earlier this year, and it will be retired on March 28, 2025.

Azure Database for PostgreSQL – Flexible Server is called "Flexible Server" because it supports more configuration options and more extensions than the older Single Server option. If you want to kick the tires, there is a 12-month Free Trial, with 750 hours of a Burstable B1MS instance and 32 GB of storage.

What's new in Flexible Server?

You can get an executive summary view of new features—both GA (generally available) and in Preview (that's Microsoft lingo for "beta")—in this handy Release notes page in the Azure Database for PostgreSQL docs. For now, let's walk through highlights from the last 12 months:

  • Postgres 15 support: Postgres 15 support for Flexible Server in Azure Database for PostgreSQL is now available across all Azure regions. (And the good news is that the engineering team is already working on Postgres 16 support, with a target of making PG16 support available on Flexible Server even sooner for PG16, within 3-4 months of the PG16 release.)

  • In-place major version upgrade of Postgres versions: The docs say it well: the in-place major version upgrade feature was rolled out to simplify the upgrade process and minimize disruption to your app.

  • Azure Active Directory authentication: A key feature for those of you running enterprise applications on Azure, Azure AD authentication support for Flexible Server become generally available (GA) in Jan 2023 and includes some design improvements vs. how Azure AD authentication used to work on Single Server. This short talk by Andrey Chudnovskiy at Citus Con: An Event for Postgres 2023 gives a good overview of Azure AD authentication with Flexible Server.

  • Customer-managed keys: The customer-managed key feature in Azure Database for PostgreSQL is all about enabling you to bring your own key (BYOK) for data protection at rest. Which gives you full control over access to encrypted data at rest, as well as control over key rotation and lifecycle.

  • Read replicas, including Cross-region read replicas: Read replicas give performance and scale benefits to read-intensive workloads. And cross-regional read replicas help with disaster recovery (DR). This read replica feature for Flexible Server was GA'd back in March 2023.

  • 32 TB storage size: In June 2023 we announced that the max storage size for a Flexible Server in Azure Database for PostgreSQL is now 32 TB, double what it was previously.

  • Storage auto-grow (Preview): Aptly named, storage auto-grow is an optional feature that enables you to automatically expand storage size when storage consumption reaches 80% or 90%, depending on the size of the disk.

  • Enhanced Observability using Azure Monitor workbooks: This integration with Azure Monitor workbooks enables you to monitor and troubleshoot in a much easier way. If monitoring and observability is your bailiwick, this Azure docs page explains more.

  • pgvector: In May 2023, we launched support for the open source pgvector extension in Azure Database for PostgreSQL. In these days of Large Language Models (LLMs), pgvector can be used to calculate how similar two items are in use cases like image recognition, recommendation systems, and text classification. This Azure docs page is a useful "how-to" for using pgvector with Azure Database for PostgreSQL – Flexible Server.

Azure Cosmos DB for PostgreSQL (aka Citus on Azure)

Azure Cosmos DB for PostgreSQL is a distributed PostgreSQL service geared toward data-intensive workloads that need the performance and scale of a multi-node, distributed Postgres cluster—either today or in the future.

Workloads that can benefit from our distributed Postgres offering on Azure—powered by the Citus open source extension—are:

  • multi-tenant SaaS applications
  • real-time analytics apps—including timeseries and IOT workloads
  • hybrid transactional and analytical applications (HTAP)
  • microservices

You'll sometimes hear us describe the Azure Cosmos DB for PostgreSQL service as giving you "Postgres at any scale", since you can start small and scale indefinitely by adding nodes to the cluster.

For those of you who want to try the full-featured service—either the single-node configuration or a multi-node cluster—the Quickstart documentation is a good place to get started. There is also a free trial for a single node Azure Cosmos DB for PostgreSQL server (albeit with more limited configuration options.)

For more details than in the bullets below, this Release Notes page on the Azure docs is quite useful.

What's new in Azure Cosmos DB for PostgreSQL in the last year?

  • Azure Cosmos DB for PostgreSQL is the new name for Citus on Azure: The Citus database service on Azure used to be called Hyperscale (Citus), when it was previously part of Azure Database for PostgreSQL.

  • Schema-based sharding for multi-tenant SaaS & microservices: As of Citus 12, there are now 2 ways to shard Postgres in Azure Cosmos DB for PostgreSQL: the traditional row-based sharding that works for all Citus use cases, in which you define a distribution key (aka sharding key)—and the newly-introduced schema-based sharding that is a good fit for multi-tenant SaaS, microservices, and applications that do vertical partitioning.

  • Postgres 15 support: Postgres 15 supported was added into Azure Cosmos DB for PostgreSQL within one week of the Postgres 15 GA release. (Also, since Postgres 16 beta 3 released in early August 2023, our PG16 engineering work is already well underway so we can release support once PG16 is GA...)

  • Citus 12.0: Now available in all supported regionsfor Azure Cosmos DB for PostgreSQL for clusters running Postgres 14 or 15 (and soon, Postgres 16.) This blog post gives an overview of all the goodness in Citus 12, including schema-based sharding for multi-tenant SaaS workloads (and microservices, too.) The Citus 12.0 open-source release became available as fast as the final rounds of testing that are mandatory for a production Azure service could be done, followed by rollout to all supported Azure regions. All it took was a few weeks.

  • Azure AD authentication (Preview): Azure Active Directory (Azure AD) authentication is now supported, in addition to native Postgres authentication. The value prop with AAD is that you can use a single authentication vehicle (AAD) across all Azure services, thus unifying authentication and everything related to it. More details in the docs.

  • Customer-Managed Keys (Preview): An optional feature that is currently in preview within Azure Cosmos DB for PostgreSQL, customer-managed keys let you bring your own key (BYOK) to manage encrypted data at rest in the database.

  • Cross-region read replicas: In Azure Cosmos DB for PostgreSQL, cross-region read replicas enable you to replicate asynchronously from one cluster to another cluster in another region, so you can offload queries to a read replica. Often used for read-heavy workloads. You can also promote a read replica to become an independent read-write cluster, say, as part of a disaster recovery strategy.

  • Compute start/stop: You can stop compute on all nodes in an Azure Cosmos DB for PostgreSQL cluster and then start it again later where you left off. An insanely popular feature. No limits to how long you can stop compute.

  • Larger storage sizes: In Azure Cosmos DB for PostgreSQL, you can now have 4 TB, 8 TB, and 16 TB storage sizes per node in a cluster. And if 16 TB sounds small to you, keep in mind that in a distributed Postgres cluster, your application benefits from the total storage (and total compute, and total memory) across all nodes in a cluster. The Azure docs have a nifty table detailing the compute, memory, and storage sizes for the nodes in a distributed cluster.

  • Burstable Compute Tier: While limited to a single node, burstable compute is popular, dynamic, and gives you the flexibility to start small and grow as you need to. Ideal for development environments and small databases with bursty performance needs. Again, this feature is limited to a single node config.

  • Performance benchmarks from GigaOM: While not a feature, GigaOM released distributed PostgreSQL benchmarks that compared Azure Cosmos DB for PostgreSQL (using Citus distributed tables) to other competing distributed database implementations. The report is titled "GigaOM: Transaction Processing & Price-Performance Testing". The result: Citus is fast. :)

  • Terraform support: Terraform gives you "infrastructure as code" and lets you programmatically manage cluster operations for Azure Cosmos DB for PostgreSQL—as well as all the other Azure cloud services you run. I'm all for programmatic interfaces that help me avoid making mistakes. (Link to announcement.)

  • REST APIs: REST APIs are popular because they are simple and flexible—and the REST APIs for Azure Cosmos DB for PostgreSQL let you manage your resources programmatically.

  • Azure SDKs (Preview): SDK libraries are now available for Azure Cosmos DB for PostgreSQL for .NET, Go, Java, JavaScript, and Python languages. Useful if you want to manage your database clusters using your preferred programming language.

  • Azure CLI = az cosmosdb postgres (Preview): Some people prefer a command line interface (CLI) over a GUI. The az cosmosdb postgres cli is for those of you who prefer a CLI for managing Azure Cosmos DB for PostgreSQL resources.

  • pgvector: In June 2023, we launched support for pgvector in Azure Cosmos DB for PostgreSQL. pgvector is an open source extension to Postgres that enables you to store embeddings in Postgres. These embeddings are vectors created by generative AI models that represent the semantic meaning of text, enabling you to do efficient similarity searches. If you want to try it out, this new Azure docs page for pgvector will get you started.

Contributing to PostgreSQL open source

Since 2019 Microsoft has been hiring and growing a team of PostgreSQL open source developers whose main focus is contributing to the PostgreSQL core.

Why?

In order to thrive, an open source ecosystem needs commercial support as well as volunteer efforts. Even open source developers need to eat! For the Postgres open source ecosystem to flourish, companies like Microsoft need to support the project by funding development in the Postgres core. Which we do.

PostgreSQL is a complex piece of software that runs mission-critical workloads across the globe. To provide the best possible experience on Azure, it follows that we need to thoroughly understand how it works. By having PostgreSQL committers and contributors on our team, they can share knowledge internally across different orgs, or directly answer internal questions regarding incidents or extension development.

Because today's cloud operates at a scale most on-prem solutions never encountered, unique cloud data center problems, often relating to performance, now require special attention. Our in-house team of deep Postgres experts are focused on tackling these cloud-scale issues upstream, in the Postgres core. Another benefit: our team's Postgres expertise gives Azure customers confidence in our cloud database services, too.

Commercial funding of PostgreSQL developers has another benefit: it gives developers the long-term stability to pursue the big things, the groundbreaking changes that are super important to the future. In particular, the Postgres contributor team at Microsoft is focused on some big architectural changes (example: Asynchronous IO) that you wouldn’t be able to do without the funding for a full-time, multi-year effort.

In short: Microsoft is funding PostgreSQL core development to advance the future of Postgres open source and to ensure Postgres works well in a cloud environment such as Azure.

When asked the question of "why fund Postgres open source developers?" sometimes I just give the simplest answer: a rising tide floats all boats in this lovely Postgres ecosystem that we live and work in.

Side-note: we're hiring across several of our Postgres teams right now, and in particular our Postgres contributors team is hiring.

Some of the Microsoft team's contributions to Postgres 16 (and 15)

The bullets below represent highlights of some of the new features our team contributed to Postgres 16 (and some to Postgres 15). Just the highlights, though... there were many other contributions... to list them all would make for a very long post.

Of course, our Postgres developers spend time not only on new feature development, but also on patch review (reviewing other people’s commits)—plus bug fixing, reducing technical debt, and collaborating on design and architecture. Also: knowledge sharing (think: blog posts, conference talks, mentoring, etc.)

Shout-out to everyone who collaborated on this PG work. Our Microsoft team’s work on these Postgres open source features involved collaboration and review and from other Postgres developers from other companies around the globe. If you’re curious as to who was involved, in each Postgres commit message, you can see who:

  • All of the Authors are
  • Who the Reviewers are
  • Who was involved in the Discussion on the pgsql-hackers mailing list

Highlights of Postgres 161 contributions (Postgres 16 release notes)

  • Parallel Full Hash Joins: Full JOINs are commonly used to find the differences between 2 tables. Prior to Postgres 16, parallelism was not implemented for full hash JOINs, which made them slow. (link to commit)

  • pg_stat_io for IO statistics: Allow monitoring of I/O statistics using the new pg_stat_io view. Melanie gave a great talk at Citus Con: An Event for Postgres about pg_stat_io and the additional IO observability it gives you. (link to commit)

  • Logical decoding on physical standbys: If you don't want to put the load of logical decoding on your primary Postgres database server, you can have a physical standby replica do the logical decoding, so you don't have to burden the primary production machine. (link to commit)

  • Faster COPY: Concurrent copy into a single relation is 3X faster (with 16 clients) in Postgres 16 due to the "refactoring relation extension" work. Even single threaded copy is faster. How? New infrastructure was added to make the process of extending tables more efficient. (link to pgsql-hackers discussion)

  • Vacuum & autovacuum improvements: new user-facing options for vacuum & improved vacuum performance, including the addition of BUFFER_USAGE_LIMIT to VACUUM and ANALYZE. Simply put, these improvements make it so you can change your vacuum settings to be more aggressive when speeding up vacuum is a priority over other workloads. (link to buffer_usage_limit commit / link to cost-based delay params commit)

  • Allow logical replication to COPY tables in binary format: This work allows you to use binary format in table synchronization and completes the binary subscription feature. And syncing tables in binary format can significantly speed up the process depending on the data type that's being copied. (link to commit)

  • ORDER BY / DISTINCT aggregate performance improvement: Added the ability for aggregates that have ORDER BY or DISTINCT to use pre-sorted data from an index, if it exists. Previously Postgres would sort the aggregates every time and even if indexes with pre-sorted data existed, Postgres would ignore the indexes. For these scenarios, aggregates are now faster. (link to commit)

  • Reduce memory overhead and increase performance of palloc: Every time we allocate memory in Postgres via palloc, this commit reduces the size of the palloc header. From the commit: “For small to medium sized allocations, this significantly increases the number of chunks that we can fit on a given block which results in much more efficient use of memory.” As a result, as of Postgres 16, Postgres can use less memory to do more stuff in query execution. (link to commit)

  • Groundwork for Direct IO: In Postgres 16 we delivered a developer-only feature to set the stage for asynchronous IO and direct IO work in Postgres 17 and later. Cribbed from Andres Freund's abstract for his PGConf NYC 2023 talk:

    For a few years we have been working towards using asynchronous IO (AIO) and direct IO in Postgres. In this talk I'll describe what using AIO in Postgres looks like, what we already have gotten into Postgres, what we hope to get into Postgres 17, and what we hope to achieve longer term. The goal is to improve throughput, decrease latency, reduce jitter, reduce double buffering, and more.

  • OS platform code cleanup to reduce technical debt: Perhaps not the most exciting work but this bullet represents important work: ripping out all kinds of fallback IFDEF code for dead and ancient operating system (OS) releases. The benefits are reduced technical debt, portability, and assurances that the Postgres test are run on the right/current versions of each OS. As of right now, Postgres supports 10 operating systems. (list of relevant commits)

  • Meson build system: In Postgres 16, the initial version of the Meson build system was added as a first step in the move to a more modern build system. Meson was chosen in good part due to the adoption of other open source projects. Initial help with Meson for Postgres developers is on the Postgres wiki. (link to commit)

Highlights of a few of our Postgres 15 contributions

  • Store statistics in shared memory: this pgstat implementation change in how Postgres statistics are collected has delivered a huge performance improvement—and huge accuracy improvements. The change: statistics are no longer written out to temporary files and are instead stored in shared memory. (link to commit)

  • New, first-ever CI for Postgres: In Postgres 15, continuous integration testing using Cirrus CI was added to the Postgres development process—which has been transformative. Incorporating an automated CI tool into your development process is something that many projects treat as normal but there were challenges getting it going for Postgres. Thanks to the CI, basic mistakes are revealed before you propose a patch. And the CI is good for new people too, since so many things can bite you when making a change: now the testing triggered by the CI helps pinpoint issues earlier in the dev process. (link to commit)

  • IO concurrency for replication: For large and busy databases, Postgres 15 improves crash recovery and physical replication performance. How? With the use of maintenance_io_concurrency. This means a standby/failover node might have an easier time keeping up with the primary—thereby reducing replication lag. This feature is also part of the larger Asynchronous IO (AIO) roadmap and serves a steppingstone toward AIO in recovery. (circular WAL decoding commit // prefech data by WAL commit)

  • Sort performance in PG15: The underlying sort performance is so important in Postgres. The main scenario is in ORDER BY queries, but sort performance also factors into GROUP BY queries, UNION, DISTINCT, queries with a MERGE JOIN, aggregate functions that have an ORDER BY/DISTINCT clause, and queries which contain window functions too. When sort goes faster, your query goes faster too. (blog post by David Rowley on sort performance)

What's new with Citus open source?

Citus is a 100% open source extension to Postgres that gives you the superpower of distributed tables, giving your application more scale and more performance than you get from a single Postgres node.

Or as the engineers on the project often say: "Citus gives you the Postgres you love at any scale."

The titles of the blog posts (below) about new Citus releases in the last year reveal a common theme: Citus is now better than ever for multi-tenant SaaS applications. Which means Azure Cosmos DB for PostgreSQL is too.

New distributed Postgres features added to Citus in the last year

  • Multi-tenant SaaS capabilities in Citus are now state-of-the-art: Multi-tenant software as a service (SaaS) applications are often characterized by fast growth of users and data—combined with high user expectations for snappy performance. So the performance and horizontal scale of distributed Postgres are often a good fit for these demanding multi-tenant SaaS workloads. Advancing the Citus capabilities for SaaS apps has been a key focus in the last year, and we've added schema-based sharding, parallel shard rebalancing, online-tenant isolation, tenant monitoring, and shard rebalancer progress reports, among other things.

  • Schema-based sharding: Citus 12.0 introduced schema-based sharding, a new, second way to shard Postgres with Citus. Sharding by schema removes some of the data modeling prep work often required for traditional Citus row-based sharding—making it easier to adopt Citus for multi-tenant SaaS workloads as well as microservices that use the same database.

  • Parallel shard rebalancing: Traditionally, the rebalancer only moved one shard group at a time. To speed up the rebalance process, especially for workloads with many separate shard groups (e.g. schema-based sharding), Citus can now move many shards in parallel with a configurable number of processes. In addition, rebalancer now runs in the background, so you do not need to keep your session open after starting it.

  • Online tenant isolation: Row-based sharding by tenant ID is a very scalable model for multi-tenant applications because you can easily have millions of tenants. However, some tenants might be much bigger and/or receive many more queries than others and this can lead to performance issues when they share a node with others. Citus already had a tenant isolation function to move a specific tenant (distribution column value) into its own shard group. We improved this function to avoid blocking writes while splitting the shard group, which means you can safely use it in production.

  • Tenant monitoring: The new citus_stat_tenants view introduced in Citus 11.3 gives multi-tenant SaaS apps an overview of the tenants that are using most of the resources in your cluster. The view tracks read query count, total query count, and total CPU usage in seconds. This is useful for both row-based sharding and schema-based sharding scenarios. We didn't stop there though: in the Citus 12.0 implementation of tenant monitoring, significant performance improvements were added—and a new GUC for sampling was added too. Tip: tenant monitoring is not enabled by default—so if you want to use it to investigate, you'll have to turn it on (it’s easy to do.)

  • Shard rebalancer progress reporting: The rebalancer is a long-running process that performs many shard group moves to get the cluster into a balanced state. We added byte-level progress monitoring for shard moves, so you can know how far along the rebalance and each individual shard move is.

  • Patroni HA integration with Citus: Patroni is the most popular high availability (HA) solution for Postgres and in early Feb 2023, Patroni 3.0 added built-in Citus support. That way, you can easily set up high availability for all the nodes in a Citus cluster in any environment.

  • Query-from-any-node added in Citus 11.0: Formerly called Citus MX. The automatic schema and metadata syncing introduced as a GA feature in Citus 11.0 enables you to query the Citus cluster from any worker node, in order to load balance your Postgres queries. This feature is optional and many of you probably won't ever need this capability. However, if you have a demanding, high-throughput workload that needs this type of performance (e.g. large bulk ingest of data), it’s there for you.

  • create_distributed_table_concurrently: One of the unique characteristics of the Citus distributed database is that you can still use the coordinator as a regular PostgreSQL server. That way, you can get started on a single node without any application changes and distribute tables later. With the new create_distributed_table_concurrently function, you can now distribute those tables without taking any write downtime if your app is already in production.

  • Postgres 15 support within 2 business days of PG 15 GA: The Citus engineering team starts working on new major Postgres releases in the early days, that way we can pinpoint any issues early and make sure our support for the new PG release is ready right away. The Citus 11.1.3 open source release came out just 2 business days after the Postgres 15 release with support for PG15 (and the Azure Cosmos DB for PostgreSQL support came out just 1 week after the Postgres 15 GA.)

  • Postgres MERGE support: We added support for the new Postgres MERGE features step by step in a series of Citus open source releases. This blog post walks you through how Citus supports Postgres MERGE as of Citus 12.0.

  • Citus now fully open source: In Citus 11.0, we open sourced the remaining Citus Enterprise features which means that Citus is now 100% open source. If you're curious about why (or what, or how), this blog post titled Distributed Postgres goes full open source will walk you through it.

Postgres ecosystem contributions

Postgres has a rich ecosystem of extensions and tools. Postgres extensions, in particular, have been part of the architecture of Postgres from the very beginning. Highlights of what our team has created &/or maintains are:

  • Patroni: Patroni is the most popular High Availability (HA) solution for Postgres. It's an open source tool that helps you deploy, manage, and monitor HA PostgreSQL clusters using streaming replication. One of our engineers spends the bulk of his time on this project, working as technical lead. This last year, Patroni added built-in support for Citus, too. And Alexander Kukushkin is giving a talk at PGDay Israel 2023 titled "Patroni 3.0: What's New and Future Plans."

  • PgBouncer: Many of you already appreciate the usefulness of connection poolers in Postgres, especially PgBouncer. In the last year, one of our contributions to PgBouncer was a fix to the query cancellation problem—this means you can now run PgBouncer in multi-process mode. At the upcoming PGConf NYC 2023, Microsoft engineer Jelte Fennema will be giving a talk titled "The Future of Connection Pooling: PgBouncer or Something Else?"

  • pgcopydb: An open source tool that automates running pg_dump and pg_restore between two running Postgres servers. The net result: pgcopydb greatly eases and speeds up live migrations between PostgreSQL databases. pgcopydb is implemented for both the base copy of a database and also for Change Data Capture (CDC). The CDC facility uses Postgres logical decoding and the wal2json plugin.

  • pg_cron: An open source extension to Postgres, pg_cron is a simple cron-based scheduler for Postgres—and was created by / is maintained by one of the engineers on our team. One of biggest improvements to pg_cron in the last year is the ability to run a job every X seconds. (In the past the lowest granularity you could run things at was 1X/minute. Now it's 1X/second. Kind of useful.)

  • HyperLogLog (HLL): HLL is a Postgres extension that creates a new data type, which is a HyperLogLog data structure. HLL is an approximation algorithm used to solve the count-distinct problem. Our team did not create this HLL extension originally, but we do maintain it and it's popular among Citus users with real-time analytics workloads.

  • TopN: TopN is an open source Postgres extension used by analytics applications to calculate the top values in a database according to some criteria. TopN was created by one of our engineers and we continue to maintain and support this PG extension.

  • wal-g: WAL-G is one of the most popular continuous archival tools for PostgreSQL, and an important component for our managed services. It was originally created by our team and is an active community OSS project.

  • activerecord-multi-tenant: The activerecord-multi-tenant gem provides an easy way to build multi-tenant Ruby on Rails applications that use row-based sharding, by injecting the appropriate filters into queries based on the active tenant. For schema-based sharding, you can use the acts as tenant gem, on which activerecord-multi-tenant is based.

  • django-multitenant: The django-multitenant library is like activerecord-multi-tenant in the sense that it helps you build multi-tenant apps that build row-based sharding, but for Python and Django. For schema-based sharding, there are several such libraries provided by the community, with django-tenants being the most prominent.

Postgres community work (including a podcast)

Beyond code, there are lots of ways to contribute to the Postgres community. I head up our Postgres and Citus open source community work at Microsoft, so my team in particular focuses on these non-code ways to contribute.

Most of the highlights on the list below focus on knowledge sharing and helping experts and newcomers alike learn more about the world of Postgres and Postgres extensions.

  • Sponsorships of Postgres conferences: We provide financial support to quite a few Postgres conferences as part of our mission to support the community. Like many of the PG conference-goers, we appreciate all the learning that gets fostered at these conferences (and the people you can meet.) In the last year we sponsored PGConf NYC, PGConf EU, PGCon, PGDay Chicago, and we're on track to sponsor more this year.

  • Path To Citus Con, a podcast for developers who love Postgres: People love this new podcast, probably because we have had some a-ma-zing guests. In Path To Citus Con, we explore the human side of open source, Postgres, databases, and the many PG extensions (including Citus.) Pino de Candia and I co-host. This index shows you all the episodes, which you can also find and listen to on any/all of the podcast platforms. Gratitude to our guests, past and future.

    You can subscribe to the calendar of future live recordings, so you can listen live and be part of the parallel text chat. It's quite fun to participate.

    Upcoming guests include PostGIS experts Paul Ramsey and Regina Obe on Wed September 6th 2023 to talk about why people care about PostGIS.

    On Wed October 11th 2023, we'll do the podcast recording with Andres Freund and Heikki Linnakangas, to talk about how they got started as developers and in Postgres!

  • Organize free & virtual event, Citus Con: An Event for Postgres: Back in 2022 we hosted the inaugural Citus Con: An Event for Postgres. The goal was to create a series of virtual Postgres talks from users, customers, developers, community members… so we could learn from each other. And to do so in an inclusive, ungated way. (Not everybody can travel to in-person conferences!) You can check out the 38 amazing talks from 2022; and the 37 equally amazing talks from 2023. Or you can sign up for news about next year's event.

  • So many blog posts: With blogs, we share tips, updates, and deep dives. Our team publishes our posts on Microsoft Tech Community, on the Azure Cosmos DB Blog, and on the Citus Open Source Blog. (And yes, we syndicate our open source posts to Planet Postgres.)

  • Conference talks at PG events: During Covid obviously our conference speaking plummeted (other than virtual events) but now things are getting back to normal: in the last year, our engineers and PMs have given 29 conference talks at community events for Postgres. And at the upcoming PGConf NYC 2023, 7 of us will be presenting, along with many other brilliant speakers from the Postgres world.

  • Livestreams & demos: Since Citus 11.0, we've been hosting livestreams of a "release party" for each new Citus open source release, with a few live demos and discussion about the new distributed Postgres capabilities. We stole the idea from the VS Code team. You can find the livestream recordings for Citus 12.0, Citus 11.3, 11.2, and more online on YouTube.

  • Citus Newsletter: Our monthly newsletter with thousands of subscribers (you can subscribe too) includes links to the latest blog posts and releases of the Citus extension—as well as links to important and interesting Postgres open source news, such as the recent release of Postgres 16 RC1.

  • Citus Slack for Q&A: If you're a Citus user (or even a Citus fan, or a future Citus user), you can join our Slack for Q&A about the Citus extension aka distributed PostgreSQL.

  • PGSQL Phriday contributions: PGSQL Phriday is an awesome community blogging initiative created by Ryan Booz at Redgate. The idea—cribbed from the SQL database community—is that lots of people across the community blog about the same topic on the same day each month. Of late, we've started to contribute, including this pgBadger blog series organized by Alicja Kucharczyk and my blog post on Partitioning vs. sharding in Postgres and Citus.

The "it takes a village" quote applies to Postgres, too

This post shares some of the new capabilities that our team has worked on over the last year in our many different Postgres workstreams here at Microsoft.

But it must be said again that the PostgreSQL database is a global open source project with contributors from all over the world. Our team at Microsoft is not alone in our effort to keep advancing Postgres as the world's most advanced open source relational database—this is a shared, cross-company, community mission.

If you don't use Postgres (or work on it) yet, maybe give it a try. You might get hooked!


Acknowledgements

This blog post highlights the collaboration and the work of a lot of people, inside and outside of Microsoft. Without the efforts of these open source and Azure teams, I wouldn’t have had anything to write about. Shout-out and a big +1 to reviewers who gave me input and feedback on earlier drafts of this blog post (and +1 to Robert Treat for inspiring me to write it.) Thank you!

Adam Wolk, Alexander Kukushkin, Alicja Kucharczyk, Charles Feddersen, Daniel Gustafsson, David Rowley, Jelte Fennema, Krishnakumar Ravi, Marco Slot, Melanie Plageman, Melih Mutlu, Nazir Bilal Yavuz, Nik Larin, Pino de Candia, Sunil Agarwal, Teresa Giacomini, Thomas Munro, Utku Azman.


Footnotes

  1. At the time of blog post publication, Postgres 16 RC1 (release candidate 1) has been released but the Postgres 16 release is not final. All the features listed as being contributed to PG16 are only in Postgres 16 RC1 at this point in time. Things can still change between now and the final Postgres 16 release. 
Claire Giordano

Written by Claire Giordano

Head of open source community efforts for Postgres at Microsoft. Alum of Citus Data, Amazon, Sun Microsystems, and Brown University CS. Conference speaker at PGConfEU, FOSDEM, PGConf NYC, Nordic PGDay, pgDay Paris, PGDay Chicago, Swiss PGDay, & Citus Con. Talk Selection Team chair for POSETTE: An Event for Postgres. Loves sailing in Greece.

@clairegiordano @clairegiordano@hachyderm.io clairegiordano