Citus Data Blog

Thoughts on scaling out PostgreSQL, big data architectures, distributed systems, and the PostgreSQL community.

Parallel indexing in Citus

Indexes are an essential tool for optimizing database performance and are becoming ever more important with big data. However, as the volume of data increases, index maintenance often becomes a write bottleneck, especially for advanced index types that use a lot of CPU time for every row written. Index creation may also become prohibitively expensive, as it may take hours or even days to build a new index on terabytes of data in Postgres. As of Citus 6.0, we’ve made creating and maintaining indexes much faster through parallelization.

Marco Slot Jan 17, 2017

Scale Out Multi-Tenant Apps based on Ruby on Rails

Today we’re happy to announce our new activerecord-multi-tenant Ruby library, which enables easy scale-out of applications that are built on top of Ruby on Rails and follow a multi-tenant data model.

This Ruby library has evolved from our experience working with customers, scaling out their multi-tenant apps, and patching some restrictions that ActiveRecord and Rails currently have when it comes to automatic query building. It is based on the excellent acts_as_tenant library, and extends it for the particular use-case of a distributed multi-tenant database like Citus.

Lukas Fittl Jan 5, 2017

Scaling out relational data models, and SQL, through co-location

Relational databases are the first choice of data store for many applications due to their enormous flexibility and reliability. Historically, the one knock against relational databases has been that they can only run on a single machine, which creates inherent...

Marco Slot Dec 22, 2016

Lessons learned from Postgres schema sharding

Each week we talk with a number of Postgres users who are looking to scale out their database. First, we would never recommend scaling out until you truly have to; it’s always easier to scale your database up rather than out. It’s often not until you have over 100 GB of data that you need to think about sharding.

When you do want to scale out, though, you want it to be simple. For scaling a multi-tenant database, there are three common approaches:

Craig Kerstiens Dec 18, 2016

Citus' Replication Model: Today and Tomorrow

Citus is a distributed database that extends (not forks) PostgreSQL. Citus does this by transparently sharding database tables across the cluster and replicating those shards.

After open sourcing Citus, one question we frequently heard from users was how Citus replicates data and automates node failovers. In this blog post, we cover the two replication models available in Citus: statement-based and streaming replication. We also describe how these models evolved over time for different use cases.

Ozgun Erdogan Dec 15, 2016

Real-time event aggregation at scale using Postgres w/ Citus

Citus is commonly used to scale out event data pipelines on top of PostgreSQL. Its ability to transparently shard data and parallelise queries over many machines makes it possible to have real-time responsiveness even with terabytes of data. Users with very high data volumes often store pre-aggregated data to avoid the cost of processing raw data at run-time. With Citus 6.0 this type of workflow became even easier using a new feature that enables pre-aggregation inside the database in a massively parallel fashion using standard SQL. For large datasets, querying pre-computed aggregation tables can be orders of magnitude faster than querying the facts table on demand.

Marco Slot Nov 29, 2016

PGConf SV + Postgres Open = PostgresOpen Silicon Valley 🐘

When we started PGConf Silicon Valley, our goal was to help grow the Postgres community, like many of the other conferences out there, with a focus on a large-scale West Coast event. In our first two years of running the conference...

Craig Kerstiens Nov 17, 2016

Introducing Citus 6.0 - A database designed for multi-tenancy

Citus 6.0 allows you to scale out your transactional relational database with minimal changes to your application, reducing complexity compared to the alternatives while still allowing you to scale. If you’re building a multi-tenant application and outgrow single-node Postgres, sharding by tenant with Citus 6.0 lets you linearly add more memory and processing power to your database without a large re-architecting of your application. You can still maintain referential integrity, and to your application it’s still just standard Postgres.

Craig Kerstiens Nov 14, 2016

Video and Slides: Scaling your SaaS Database with Postgres

We recently presented a webcast on when to scale your multi-tenant application and what to consider when you do. In case you missed it, the recording and slides are below. Within the webcast session, we cover:

Craig Kerstiens Nov 7, 2016

Postgres Autovacuum is Not the Enemy

It’s a common misconception that high-volume read-write workloads in PostgreSQL inevitably cause database inefficiency. We’ve heard of cases where users encounter slowdowns doing only a few hundred writes per second and turn to systems like Dynamo...

Joe Nelson Nov 4, 2016
