POSETTE 2024 is a wrap! 💯 Thanks for joining the fun! Missed it? Watch all 42 talks online 🍿
POSETTE 2024 is a wrap! 💯 Thanks for joining the fun! Missed it? Watch all 42 talks online 🍿
Written by Colton Shepard
November 8, 2018
This year, as part of the Citus Data annual team retreat and hackathon, we decided to get ambitious and try to create an easy, 1-click migration to the Citus distributed database. My team was lucky enough to get Dimitri Fontaine, the author of the excellent PostgreSQL pgloader utility, so we decided to start our hackathon project by building on top of pgloader and making pgloader more Citus-aware.
For those of you not familiar with Citus—Citus is an extension to PostgreSQL that transforms Postgres into a distributed database. We make Citus available as open source (star us on GitHub!), as enterprise software you can run anywhere, and as a fully-managed database as a service.
Before the 2018 Citus Data developer hackathon, here’s what the PostgreSQL pgloader utility could do:
The pgloader code itself is well-maintained and fortunately we had a good bit of Lisp knowledge on our team, so we were able to extend the Postgres pgloader utility to add native understanding of distributed and reference tables and data relationships.
Here’s what the current incarnation of this looks like if you’re using our example ad analytics app from our Citus database documentation:
load database
from pgsql:///hackathon
into pgsql://localhost:9700/dim
with include drop, reset no sequences
distribute companies using id
distribute campaigns using company_id
distribute ads using company_id from campaigns
distribute clicks using company_id from ads, campaigns
distribute impressions using company_id from ads, campaigns;
You'll note several cool features here:
pgloader understands that it needs to distribute several tables, so the data gets passed to the workers instead of filling up the coordinator If you have a table that needs to have the distribution key added, it can happen automatically on the fly based off of data in a table you specify You don't have to distribute all of your tables, so you can still do a staged migration to Citus if you prefer.
We were pretty proud of this to begin with, but the competition usually gets pretty fierce at these, so we felt that we had to step up our game a bit.
Recently, we’ve seen a lot of multi-tenant SaaS applications built on Rails that use our activerecord-multitenant gem, so one of the questions that came up in our brainstorming session was whether Rails and activerecord have any quirks that we could take advantage of, to automate more parts of the process.
We came up with a few interesting candidates, but the one that really stood out to us was foreign keys.
For those of you that haven’t looked into this in detail, activerecord will automatically add appropriate foreign keys when you use belongs_to
, has_one
, references
, or many other commonly used associations. This means that, hidden in your database, there’s a whole lot of information we can use to put together a map of the data hierarchy, and, well, that was just too useful to ignore.
Now, all you need to do is pick one database table to distribute and the column to use, and pgloader walks the foreign key dependencies for you, figures out what other tables it can distribute with the same key, adds it if it needs to, and distributes everything. You’re on Citus in one command.
I’m just going to repeat that “in one command” bit here so it’s clear that’s not a typo :)
Here’s all you need for the updated pgloader syntax if you’ve got foreign keys in place:
load database
from pgsql:///hackathon
into pgsql://localhost:9700/dim
with include drop, reset no sequences
distribute companies using id;
All in all, we’re pretty excited with how our hackathon project turned out. It passed the tests we’ve done so far, so we’ve started cleaning up the quick and dirty code from the hackathon and Dimitri already added it to the repo. We’re looking forward to seeing how the Citus-ready pgloader works in a real-world environment, so if you end up trying it out, let us know what you think on our Citus public Slack!