Citus Data Blog

Thoughts on scaling out PostgreSQL, big data architectures, distributed systems, and the PostgreSQL community.

Real-time event aggregation at scale using Postgres w/ Citus

Citus is commonly used to scale out event data pipelines on top of PostgreSQL. Its ability to transparently shard data and parallelise queries over many machines makes it possible to have real-time responsiveness even with terabytes of data. Users with very high data volumes often store pre-aggregated data to avoid the cost of processing raw data at run-time. With Citus 6.0 this type of workflow became even easier using a new feature that enables pre-aggregation inside the database in a massively parallel fashion using standard SQL. For large datasets, querying pre-computed aggregation tables can be orders of magnitude faster than querying the facts table on demand.

Marco Slot Nov 29, 2016

PGConf SV + Postgres Open = PostgresOpen Silicon Valley 🐘

When we started PGConf Silicon Valley we started it with the goal of helping to grow the Postgres community like many of the other conferences out there with a focus on a large scale west coast event. In our first two years of running the conference...

Craig Kerstiens Nov 17, 2016

Introducing Citus 6.0 - A database designed for multi-tenancy

Citus 6.0 allows you to scale out your transactional relational database with minimal changes to your application, thus reducing complexity over other alternatives while still allowing scale. If you’re building a multi-tenant application and outgrow a single node Postgres, by sharding based on tenant with Citus 6.0 you can linearly add more memory and processing power to your database without a large re-architecting of your application. You can still maintain referential integrity, and to your application it’s still just standard Postgres.

Craig Kerstiens Nov 14, 2016

Video and Slides: Scaling your SaaS Database with Postgres

We recently presented a webcast on when and what to consider when scaling your multi-tenant application. In case you missed it, the recording and slides are below. Within the webcast session, we cover:

Craig Kerstiens Nov 7, 2016

Postgres Autovacuum is Not the Enemy

It’s a common misconception that high volume read-write workloads in PostgreSQL inevitably causes database inefficiency. We’ve heard of cases where users encounter slowdowns doing only a few hundred writes per second and turn to systems like Dynamo...

Joe Nelson Nov 4, 2016

Faster PostgreSQL Counting

Everybody counts, but not always quickly. This article is a close look into how PostgreSQL optimizes counting. If you know the tricks there are ways to count rows orders of magnitude faster than you do already.

The problem is actually underdescribed...

Joe Nelson Oct 12, 2016

How Distributed Outer Joins on PostgreSQL with Citus Work

SQL is a very powerful language for analyzing and reporting against data. At the core of SQL is the idea of joins and how you combine various tables together. One such type of join: outer joins are useful when we need to retain rows, even if it has no match on the other side.

And while the most common type of join, inner join, against tables A and B would bring only the tuples that have a match for both A and B, outer joins give us the ability to bring together from say all of table A even if they don’t have a corresponding match in table B. For example, let’s say you keep customers in one table and purchases in another table. When you want to see all purchases of customers, you may want to see all customers in the result even if they did not do any purchases yet. Then, you need an outer join. Within this post we’ll analyze a bit on what outer joins are, and then how we support them in a distributed fashion on Citus.

Eren Başak Oct 10, 2016

Designing your SaaS Database for Scale with Postgres

If you’re building a SaaS application, you probably already have the notion of tenancy built in your data model. Typically, most information relates to tenants/customers/accounts and your database tables capture this natural relation.

With smaller amounts of data (10s of GB), it’s easy to throw more hardware at the problem and scale up your database. As these tables grow however, you need to think about ways to scale your multi-tenant database across dozens or hundreds of machines.

After our blog post on sharding a multi-tenant app with Postgres, we received a number of questions on architectural patterns for multi-tenant databases and when to use which. At a high level, developers have three options:

Ozgun Erdogan Oct 3, 2016

Announcing Citus MX: Scaling Postgres to over 500k writes per second

Today we’re excited to announce the private beta of Citus MX. Citus MX builds on the Citus extension for PostgreSQL, which allows you to scale out PostgreSQL tables across many servers. Citus MX gives you the ability to write to or query distributed tables from any node, which allows you to horizontally scale out your write-throughput using PostgreSQL. It also removes the need to interact with a primary node in a Citus cluster.

We’ve performed over 500k durable writes per second (using YCSB) on a 32 node Citus Cloud cluster with our regular PostgreSQL settings. We’ve also exceeded ingest rates of 7 million records per second using batch COPY. Watch the video to see it in action. If you’re curious to learn more, read on or to get access, sign up below.

Marco Slot Sep 22, 2016

Page 1 of 10

Next page