Citus Blog

Articles tagged: sharding

A multi-tenant sharding tutorial

By Craig Kerstiens | March 9, 2017

Many SaaS applications have data models in which each customer should interact only with their own data. At the enterprise end, companies like Salesforce and Workday fall into this bucket, but we see a ton of smaller ones as well. If you're just getting started figuring out how to approach your data so it can scale in the future, it doesn't have to be hard.

Here we're going to walk through an example data model that you can use as a basis for applying the same approach to your own multi-tenant application.

Keep reading

Lessons learned from Postgres schema sharding

By Craig Kerstiens | December 18, 2016

We talk with a number of Postgres users each week who are looking to scale out their database. First, we would never recommend scaling out until you truly have to; it’s always easier to scale your database up rather than out. It’s often not until you have over 100 GB of data that you need to think about sharding.

When you do want to scale out, though, you want it to be simple. For scaling a multi-tenant database, there are three common approaches:

Keep reading

Citus' Replication Model: Today and Tomorrow

By Ozgun Erdogan | December 15, 2016

Citus is a distributed database that extends (not forks) PostgreSQL. Citus does this by transparently sharding database tables across the cluster and replicating those shards.

After open sourcing Citus, one question we frequently heard from users was how Citus replicates data and automates node failovers. In this blog post, we cover the two replication models available in Citus: statement-based and streaming replication. We also describe how these models evolved over time for different use cases.

Keep reading

Designing your SaaS Database for Scale with Postgres

By Ozgun Erdogan | October 3, 2016

If you’re building a SaaS application, you probably already have the notion of tenancy built into your data model. Typically, most information relates to tenants / customers / accounts, and your database tables capture this natural relation.

With smaller amounts of data (10s of GB), it’s easy to throw more hardware at the problem and scale up your database. As these tables grow, however, you need to think about ways to scale your multi-tenant database across dozens or hundreds of machines.

After our blog post on sharding a multi-tenant app with Postgres, we received a number of questions on architectural patterns for multi-tenant databases and when to use which. At a high level, developers have three options:

Keep reading

Sharding a multi-tenant app with Postgres

By Craig Kerstiens | August 10, 2016

Whether you’re building marketing analytics, a portal for e-commerce sites, or an application to cater to schools, if your customer is another business, then a multi-tenant approach is the norm. The same code runs for all customers, but each customer sees their own private data set, except in some cases of holistic internal reporting.

Early in your application’s life, customer data has a simple structure that evolves organically. Typically all information relates to a central customer/user/tenant table. With a smaller amount of data (10s of GB), it’s easy to scale the application by throwing more hardware at it, but what happens when you’ve had enough success that your data no longer fits in memory on a single box, or you need more concurrency? You scale out, often painfully.

Keep reading

If you're looking at Citus, it's likely you've outgrown a single-node database. In most cases your application is no longer performing as you’d like. If your data is still under 100 GB, a single Postgres instance will still work well for you and is a great choice. Beyond that level Citus can help, but how you model your data has a major impact on how much performance you're able to get out of the system.

Keep reading
