POSETTE 2024 is a wrap! 💯 Thanks for joining the fun! Missed it? Watch all 42 talks online 🍿
Written by Jelte Fennema-Nio
September 12, 2022
A few months ago we made Citus fully open source. This was a very exciting milestone for all of us on the Citus database engine team. Contrary to folks who say that Postgres is a monolith that can’t scale—Postgres in fact has a fully open source solution for distributed scale, one that’s also native to Postgres. It’s called Citus! This post will go into more detail on why we open sourced our few remaining enterprise features in Citus 11, what exactly we open sourced, and finally what it took to actually open source our code. If you’re more interested in the code instead, you can find it in our GitHub repo (feel free to give the Citus project a star.)
One of the reasons we open sourced the last few enterprise features in Citus 11 is that our business model has changed. When Citus was first started back in ~2011, the business model consisted of selling enterprise licenses and support contracts. In 2016 we un-forked Citus from Postgres and open sourced the bulk of Citus. After doing that, we differentiated our enterprise-licensed software from the open source version by including a few extra closed source enterprise features. But over time our business model has moved away from selling enterprise licenses.
Currently our business model revolves around our managed service for the Citus database on Azure. As you might imagine, our managed service builds on top of Citus open source by adding in the “managed” features, aka the features that save you time and make it so you no longer have to worry about your database.
With the Azure service, you can create and scale a Postgres cluster with the click of a button; your Postgres settings are already tuned to get extra performance out of the hardware; you get automatic backups, from which you can restore with ease; and if a node in your cluster crashes, you automatically fail over to another one, assuming you enabled High Availability (HA). You’ll also get easy integrations with other Azure cloud services like ADF, Azure Stream Analytics, Azure Kubernetes Service, App Service, and more. And if you ever run into an issue, you can always reach out to the super-knowledgeable Azure support team.
As a result of this change in business model, we started to wonder: If customers are primarily paying us for the managed service, does that mean that we could make Citus completely open source? And what are the advantages of making Citus completely open source?
Even if there were no big downsides to open sourcing everything, open sourcing still needed to have some advantages to justify the effort of moving away from the status quo.
As you might imagine, there are a multitude of benefits to making Citus completely open source.
Probably the most obvious group of people who benefit from completely open sourcing Citus are those of you who already use the open source version of Citus. If you’re part of this group, you suddenly get extra features by simply upgrading to Citus 11. And who doesn't like lots of new features?
But if you’re not yet using Citus, this release could be the turning point too. With these extra features like the online shard rebalancer and better user management, we expect more of you to give Citus a try. For some of you Citus on Azure may not fit into your plans yet—or perhaps you chose not to use Citus because it was missing some features that were critical for you. Well, not anymore! We have taken away that barrier!
We would love as many people as possible, including you, to use Citus to build and deploy applications—whether you’re using Citus on Azure or Citus open source. Why? Because of the following reasons:
If you were already using the Citus managed service on Azure, then the newly open-sourced features are already available to you there. But if you’re a developer, then Azure likely isn’t the only place you used Citus. You have probably also installed the open source version of Citus on your dev machine, to develop and test your application. So in the past, the environment on your dev machine would differ slightly from your production environment. That is not desirable, since many developers like to keep the development environment as close to production as possible, so that bugs show up during development rather than in production.
By open sourcing the remaining Citus features in Citus 11, you now have complete functional parity between Citus in the cloud and on your laptop.
Lastly, but not unimportantly, by open sourcing all of Citus we significantly increase the happiness and productivity of our own developers (including myself).
Prior to Citus 11, we had two git repositories: a public GitHub repo for the open source version of Citus, and a private repo that contained the "enterprise" version of Citus.
So, before Citus 11, each new Citus release required a significant chunk of developer time to be spent on the repetitive task of merging the changes from the public repo into the private repo, which often resulted in annoying merge conflicts or slight differences in behavior during tests. This wasn’t the favorite part of the job for any of us, and it was also time consuming. Now that everything is open source, my colleagues and I can focus more on the work that we love: improving Citus and distributing PostgreSQL!
While some of my teammates work on our Azure managed services for PostgreSQL and for Citus, a fair number of my engineering colleagues here at Microsoft spend the bulk of their time working as committers of the Postgres open source project. We also maintain other open source extensions to Postgres such as pg_cron, hyperloglog, and TopN. And even though Citus was already mostly open source, we felt that we could show our love for open source even better by making Citus entirely open source.
Most of the code for Citus was already open source before Citus 11: we had un-forked Citus from Postgres and open sourced Citus as a Postgres extension back in 2016. So, exactly what new features were open sourced in Citus 11? The full list of newly open-sourced features can be found on our updates page, but I have highlighted below the ones I think you will likely benefit from the most:
Previously, you needed a .pgpass file to configure the passwords with which to authenticate between nodes when using open source Citus. With Citus 11 now being fully open source you can use a more powerful alternative that is also much easier to use: the pg_dist_authinfo table. In this table you can put the credentials that should be used to authenticate to other nodes. These credentials can be any authentication options that Postgres supports, like passwords or TLS certificates. What makes pg_dist_authinfo especially easy to use is the fact that you can create a single shared row for each user that is used to authenticate to any of the nodes in your cluster, while with the .pgpass approach you needed one line for each node and user combination.
The arguments and reasoning above might seem obvious now in hindsight, but it took some time to get the stakeholders aligned. But eventually the team agreed, and everyone was very excited about this huge step.
The only decision left to make was when to open source the code: The Citus 11.0 release was the obvious candidate. Then actually open sourcing all the code was surprisingly easy due to the power of git. It was pretty much as simple as the following four commands:
# Create new branch based on open source code
git checkout -b open-source-master-merge-enterprise open-source/master
# Copy all files tracked by git from the enterprise repo
git checkout enterprise/master .
# Keep the open source license instead of the enterprise license
git checkout HEAD -- LICENSE
# Create one big commit
git commit -m 'Make enterprise features open source'
After doing that we still needed to double check the contents of the commit, both to make sure that there wasn't some reference to a specific customer in a code comment or something like that, and so we could create a comprehensive list of all the features that we were making open source (some of which were even a surprise to us).
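If you want to see how those four commands behave before trying something similar on a real repository, here is a minimal sketch that replays them in a throwaway repo. The branch names mirror the ones above, but here they are plain local branches rather than remote-tracking ones. The file names, file contents, and the "acme" search term are all made up for illustration; this is not our actual review process, just one rough way to inspect such a commit.

```shell
#!/bin/sh
# Sketch only: replays the four commands in a scratch repo.
# All file names, contents and the 'acme' search term are hypothetical.
set -eu

demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo User"

# Stand-in for the public repo: an open source license and one source file.
echo "open source license" > LICENSE
echo "community code" > planner.c
git add LICENSE planner.c
git commit -q -m 'Open source base'
git branch -m open-source/master

# Stand-in for the private repo: a different license plus an extra feature.
git checkout -q -b enterprise/master
echo "enterprise license" > LICENSE
echo '/* tuned for ACME Corp */' > rebalancer.c
git add LICENSE rebalancer.c
git commit -q -m 'Enterprise features'

# The four commands from the post.
git checkout -q -b open-source-master-merge-enterprise open-source/master
git checkout enterprise/master .
git checkout HEAD -- LICENSE
git commit -q -m 'Make enterprise features open source'

# Review step: list what the big commit touched ...
git show --stat --format= HEAD
# ... and search its diff for a term that should not become public.
git show HEAD | grep -n -i 'acme' || echo "no matches for the search term"
```

In this toy setup the final commit adds only rebalancer.c, the LICENSE file keeps its open source text, and the grep flags the made-up customer name in the code comment, which is exactly the kind of thing you would want to clean up before opening the pull request.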
Then finally all that was left was creating a pull request with this commit on our open source repo and Citus was completely open source.
By open sourcing all our code in Citus 11 we tried to make your life as a Citus user better, no matter if you use it on-prem, in the cloud on Azure, or on your laptop. So have fun trying out the Citus 11 release. If you don’t know where to start, you should check out our Getting Started page. And if you have any questions, just join our Slack channel or drop your questions on Stack Overflow.