POSETTE 2024 is a wrap! 💯 Thanks for joining the fun! Missed it? Watch all 42 talks online 🍿
Written by Lukas Fittl
January 5, 2017
Today we’re happy to announce our new activerecord-multi-tenant Ruby library, which enables easy scale-out of applications that are built on top of Ruby on Rails and follow a multi-tenant data model.
This Ruby library has evolved from our experience working with customers, scaling out their multi-tenant apps, and patching some restrictions that ActiveRecord and Rails currently have when it comes to automatic query building. It is based on the excellent acts_as_tenant library, and extends it for the particular use-case of a distributed multi-tenant database like Citus.
You can get started by adding gem 'activerecord-multi-tenant' to your Gemfile, running bundle install, and then annotating your ActiveRecord models like this:
class PageView < ActiveRecord::Base
  multi_tenant :customer
  # ...
end
In this case, customer is the tenant model, and your page_views table needs to have a customer_id column that references the customer the page view belongs to.
We’ll dig into the details more below, but before we dive too deep, let us zoom out a bit and look at the typical evolution of a SaaS application today and where we aim to help.
Unlike data warehousing or event analytics workloads, a SaaS application typically follows a multi-tenant model, where you can clearly separate the data for individual tenants (or customers) and co-locate each tenant's data on a single database node.
One good example of a typical multi-tenant SaaS application is our ad analytics reference app, which looks like this:
As you can see in the screenshot, most data is associated with the currently logged-in customer: even though this is complex analytical data, all of it is accessed in the context of a single customer, or tenant.
Initially you’ll often start out with all tenants placed on a single database node, using a framework like Ruby on Rails and ActiveRecord to load a given tenant's data whenever you serve a web request for it.
ActiveRecord makes a few assumptions about data storage that limit your scale-out options. In particular, ActiveRecord introduces a pattern where you normalize data and split it into many distinct models, each identified by a single id column, with multiple belongs_to relationships that tie objects back to a tenant or customer:
class Customer < ActiveRecord::Base
  has_many :sites
end

class Site < ActiveRecord::Base
  belongs_to :customer
  has_many :page_views
end

class PageView < ActiveRecord::Base
  belongs_to :site
end
The tricky thing with this pattern is that in order to find all page views for a customer, you'll have to query for all of a customer's sites first. This becomes a problem once you start sharding data, and in particular when you run UPDATE or DELETE queries on nested models like page views in this example.
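To make the difference concrete, here is a minimal plain-Ruby sketch that uses in-memory arrays instead of database tables; the struct names and data are hypothetical. With the normalized models above, finding a customer's page views takes two steps (collect the customer's site IDs, then filter page views by site). Once page views carry their own customer_id, a single filter is enough, which is exactly what lets a sharded database answer the query on one node.

```ruby
# Hypothetical in-memory stand-ins for the sites and page_views tables.
SiteRow = Struct.new(:id, :customer_id)
PageViewRow = Struct.new(:id, :site_id, :customer_id)

SITES = [SiteRow.new(1, 10), SiteRow.new(2, 10), SiteRow.new(3, 20)].freeze
PAGE_VIEWS = [
  PageViewRow.new(100, 1, 10),
  PageViewRow.new(101, 2, 10),
  PageViewRow.new(102, 3, 20)
].freeze

# Normalized lookup: two steps, because a page view only knows its site.
def page_views_via_sites(customer_id)
  site_ids = SITES.select { |s| s.customer_id == customer_id }.map(&:id)
  PAGE_VIEWS.select { |pv| site_ids.include?(pv.site_id) }
end

# Denormalized lookup: the page view carries customer_id directly.
def page_views_direct(customer_id)
  PAGE_VIEWS.select { |pv| pv.customer_id == customer_id }
end

p page_views_via_sites(10).map(&:id) # prints [100, 101]
p page_views_direct(10).map(&:id)    # prints [100, 101]
```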
There are a few steps you can take today to make scaling out easier in the future:
1. Introduce a column for the tenant_id on every record that belongs to a tenant
In order to scale out a multi-tenant model, it’s essential that you can quickly locate all records that belong to a tenant. The easiest way to achieve this is to add a tenant_id column (or “customer_id” column, etc.) to every object that belongs to a tenant, and to backfill your existing data so that this column is set correctly.
When you move to a distributed multi-tenant database like Citus in the future, this will be a required step. If you have already done it by then, you can simply COPY over your data without any additional data modification.
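The backfill itself is mechanical: each page view inherits the customer_id of the site it belongs to. In PostgreSQL this would typically be a single statement along the lines of UPDATE page_views SET customer_id = sites.customer_id FROM sites WHERE sites.id = page_views.site_id, run in batches on large tables. The plain-Ruby sketch below shows the same logic on in-memory records; the struct and method names are hypothetical and not part of activerecord-multi-tenant.

```ruby
# Hypothetical in-memory records; customer_id on page views starts out nil.
SiteRecord = Struct.new(:id, :customer_id)
PageViewRecord = Struct.new(:id, :site_id, :customer_id)

# Sets page_view.customer_id from the owning site's customer_id,
# leaving already-populated rows untouched.
def backfill_customer_ids(page_views, sites)
  customer_by_site = sites.to_h { |s| [s.id, s.customer_id] }
  page_views.each do |pv|
    pv.customer_id ||= customer_by_site.fetch(pv.site_id)
  end
end

sites = [SiteRecord.new(1, 10), SiteRecord.new(2, 20)]
views = [PageViewRecord.new(100, 1, nil), PageViewRecord.new(101, 2, nil)]
backfill_customer_ids(views, sites)
p views.map(&:customer_id) # prints [10, 20]
```

Using fetch rather than [] makes the sketch fail loudly on a page view whose site is missing, instead of silently leaving a nil tenant ID behind.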
2. Use UNIQUE constraints which include the tenant_id
Unique constraints present a problem in any distributed system, since it’s difficult to make sure that no two nodes accept the same unique value.
In many cases, you can work around this problem by adding the tenant_id to the constraint, effectively making objects unique inside a given tenant, but not guaranteeing this beyond that tenant.
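As a rough illustration, the sketch below models a UNIQUE (customer_id, email) constraint as an in-memory set keyed on [tenant_id, value]; the class name is hypothetical. Because every key starts with the tenant ID, and all of a tenant's rows live on the same node, each node can enforce the check locally without coordinating with any other node.

```ruby
require "set"

# Hypothetical in-memory "unique index" keyed on [tenant_id, value],
# mimicking a UNIQUE (customer_id, email) constraint.
class TenantScopedUniqueIndex
  def initialize
    @keys = Set.new
  end

  # Returns true if the value was inserted, false if it already
  # exists for that tenant. Set#add? returns nil for duplicates.
  def insert(tenant_id, value)
    !@keys.add?([tenant_id, value]).nil?
  end
end

index = TenantScopedUniqueIndex.new
index.insert(10, "a@example.com") # true
index.insert(20, "a@example.com") # true: same value, different tenant
index.insert(10, "a@example.com") # false: duplicate within tenant 10
```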
For example, by default Rails creates a primary key that only includes the id of the record:
Indexes:
"page_views_pkey" PRIMARY KEY, btree (id)
You should modify that primary key to also include the tenant_id:
ALTER TABLE page_views DROP CONSTRAINT page_views_pkey;
ALTER TABLE page_views ADD PRIMARY KEY(id, customer_id);
An exception to this rule might be an email or username column on a users table (unless you give each tenant their own login page). This is why, once you scale out, we typically recommend splitting such tables out of your distributed tables and placing them as local tables on the Citus coordinator node.
3. Include the tenant_id in all queries, even when you can locate an object using its own object_id
The easiest way to run a typical SQL query in a distributed system without restrictions is to always access data that lives on a single node, determined by the tenant you are accessing.
For this reason, once you use a distributed system like Citus, we recommend that you always specify both the tenant_id and the object’s own ID in queries. That way the coordinator can locate your data quickly and route the query to a single shard, instead of asking every shard in the system whether it knows the given object_id.
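The routing decision can be sketched in a few lines of plain Ruby. This is a simplified model, not how Citus actually hashes: it assumes a hypothetical fixed SHARD_COUNT and uses Zlib.crc32 as a stand-in hash function, whereas Citus uses its own hash function and shard ranges. The point is only that a known tenant_id maps to exactly one shard, while a query without it must fan out to all of them.

```ruby
require "zlib"

# Hypothetical shard count; real clusters configure this per table.
SHARD_COUNT = 4

# Stand-in for hash distribution: map a tenant_id to one shard.
# CRC32 is purely illustrative and is not the hash Citus uses.
def shard_for(tenant_id)
  Zlib.crc32(tenant_id.to_s) % SHARD_COUNT
end

# Returns the shards a query has to visit.
def shards_for_query(tenant_id: nil)
  return (0...SHARD_COUNT).to_a if tenant_id.nil? # no tenant: ask every shard
  [shard_for(tenant_id)]                          # tenant known: a single shard
end

p shards_for_query(tenant_id: 42).size # prints 1
p shards_for_query.size                # prints 4
```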
The library activerecord-multi-tenant aims to make it easier to implement the above in a typical Rails application.
As mentioned in the beginning, once you add multi_tenant :customer annotations to your models, the library automatically takes care of including the tenant_id in all queries.
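To picture the effect, here is a hypothetical helper that appends the tenant condition to a query's WHERE clause. This is not how activerecord-multi-tenant is implemented (the library works inside ActiveRecord's own query building); the sketch only shows the shape of the SQL you end up with. It uses PostgreSQL-style $n placeholders and interpolates the table name directly, so it assumes trusted table and column names.

```ruby
# Hypothetical helper: add "tenant_column = tenant_id" to every query.
# Returns [sql, binds]; table and column names must be trusted input.
def scoped_query(table, conditions, tenant_column:, tenant_id:)
  all = conditions.merge(tenant_column => tenant_id)
  where = all.keys.each_with_index.map { |col, i| "#{col} = $#{i + 1}" }.join(" AND ")
  ["SELECT * FROM #{table} WHERE #{where}", all.values]
end

sql, binds = scoped_query("page_views", { id: 5 }, tenant_column: :customer_id, tenant_id: 42)
puts sql # prints SELECT * FROM page_views WHERE id = $1 AND customer_id = $2
p binds  # prints [5, 42]
```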
For this to work, you’ll always need to specify which tenant you are accessing, either on a per-request basis:
class ApplicationController < ActionController::Base
  # Opt into the "set_current_tenant" controller helpers by specifying this:
  set_current_tenant_through_filter

  # before_filter was removed in Rails 5.1; before_action is the current name
  before_action :set_customer_as_tenant

  def set_customer_as_tenant
    customer = Customer.find(session[:current_customer_id])
    set_current_tenant(customer) # Set the tenant for this request
  end
end
Or by wrapping your code in a block, e.g. for background and maintenance tasks:
customer = Customer.find(session[:current_customer_id])
# ...
MultiTenant.with(customer) do
  site = Site.find(params[:site_id])

  # Modifications automatically include tenant_id
  site.update! last_accessed_at: Time.now

  # Queries also include tenant_id automatically
  site.page_views.count
end
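One common way to build a block-scoped "current tenant" is storage tied to the running thread, restored in an ensure clause so that an exception cannot leak one tenant into the next piece of work. The module below is a hypothetical sketch of that pattern, similar in spirit to MultiTenant.with but not the gem's actual implementation. Note that Thread.current[] is strictly fiber-local in Ruby.

```ruby
# Hypothetical sketch of a block-scoped current tenant.
# Not the activerecord-multi-tenant implementation.
module TenantContext
  KEY = :sketch_current_tenant

  def self.current
    Thread.current[KEY] # fiber-local storage
  end

  # Sets the tenant for the duration of the block, then restores the
  # previous value, even if the block raises. Supports nesting.
  def self.with(tenant)
    previous = Thread.current[KEY]
    Thread.current[KEY] = tenant
    yield
  ensure
    Thread.current[KEY] = previous
  end
end

TenantContext.with(:acme) do
  TenantContext.current # :acme inside the block
end
TenantContext.current # nil again afterwards
```

Restoring the previous value, rather than resetting to nil, keeps nested blocks correct: an inner block for another tenant hands control back to the outer tenant when it finishes.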
Once you are ready to use a distributed multi-tenant database like Citus, all you need are a few adjustments to your migrations, and you're good to go:
class InitialTables < ActiveRecord::Migration
  def up
    create_table :page_views, partition_key: :customer_id do |t|
      t.references :customer, null: false
      t.references :site, null: false
      t.text :url, null: false
      # ...
      t.timestamps null: false
    end

    # Distribute by the same column used as the partition key
    create_distributed_table :page_views, :customer_id
  end

  def down
    drop_table :page_views
  end
end
Note the partition_key: :customer_id option, which our library adds to Rails' create_table and which ensures that the primary key includes the tenant_id column, as well as the create_distributed_table call, which enables Citus to scale out the data to multiple nodes.
If you are interested in a more complete example, feel free to check out our reference app that showcases a simplified sample SaaS application for ad analytics.
You can find activerecord-multi-tenant on Rubygems.org, as well as GitHub. We’d also be happy to hear your feedback and experiences in our Slack channel.
If you would like to give Citus a try, you can download Citus open source here. Update October 2022: You can find the Citus managed database service in Azure Cosmos DB for PostgreSQL. Learn more about distributing Postgres with Citus on Azure.