Citus Data Blog

Thoughts on scaling out PostgreSQL, big data architectures, distributed systems, and the PostgreSQL community.

Faster PostgreSQL Counting

Everybody counts, but not always quickly. This article is a close look into how PostgreSQL optimizes counting. If you know the tricks there are ways to count rows orders of magnitude faster than you do already.

The problem is actually underdescribed...

Joe Nelson Oct 12, 2016

How Distributed Outer Joins on PostgreSQL with Citus Work

SQL is a very powerful language for analyzing and reporting against data. At the core of SQL is the idea of joins and how you combine various tables together. One such type of join: outer joins are useful when we need to retain rows, even if it has no match on the other side.

And while the most common type of join, inner join, against tables A and B would bring only the tuples that have a match for both A and B, outer joins give us the ability to bring together from say all of table A even if they don’t have a corresponding match in table B. For example, if you keep users in one table and purchases in another table, when you want to see the all purchases of customers, an outer join is needed since you may want to see all users in the result even if they did not do any purchase yet. Within this post we’ll analyze a bit on what outer joins are, and then how we support them in a distributed fashion on Citus.

Eren Oct 10, 2016

Designing your SaaS Database for Scale with Postgres

If you’re building a SaaS application, you probably already have the notion of tenancy built in your data model. Typically, most information relates to tenants/customers/accounts and your database tables capture this natural relation.

With smaller amounts of data (10s of GB), it’s easy to throw more hardware at the problem and scale up your database. As these tables grow however, you need to think about ways to scale your multi-tenant database across dozens or hundreds of machines.

After our blog post on sharding a multi-tenant app with Postgres, we received a number of questions on architectural patterns for multi-tenant databases and when to use which. At a high level, developers have three options:

Ozgun Erdogan Oct 3, 2016

Announcing Citus MX: Scaling Postgres to over 500k writes per second

Today we’re excited to announce the private beta of Citus MX. Citus MX builds on the Citus extension for PostgreSQL, which allows you to scale out PostgreSQL tables across many servers. Citus MX gives you the ability to write to or query distributed tables from any node, which allows you to horizontally scale out your write-throughput using PostgreSQL. It also removes the need to interact with a primary node in a Citus cluster.

We’ve performed over 500k durable writes per second (using YCSB) on a 32 node Citus Cloud cluster with our regular PostgreSQL settings. We’ve also exceeded ingest rates of 7 million records per second using batch COPY. Watch the video to see it in action. If you’re curious to learn more, read on or to get access, sign up below.

Marco Slot Sep 22, 2016

Fun with SQL: Computing run rate and month over month growth in Postgres

In any as-a-service business, which bills monthly, a key metric you track is MRR or monthly recurring revenue. It’s good practice to have this on a dashboard and check it on a monthly, weekly, or even daily basis. If you have a simple pricing model that has set monthly plans, say like Netflix this is pretty easy to calculate:

SELECT sum(user_subscriptions.price)
    FROM user_subscriptions
    WHERE users_subscriptions.ended_at IS null;

The above will give you the run rate as of this exact moment in time. It gets a little more complicated to do this in a single query that gives it to you over time.

Craig Kerstiens Sep 12, 2016

pg_cron: Run periodic jobs in PostgreSQL

Running periodic jobs such as vacuuming or removing old data is a common requirement in PostgreSQL. A simple way to achieve this is to configure cron or another external daemon to periodically connect to the database and run a command. However, with...

Marco Slot Sep 9, 2016

Building a Scalable Postgres Metrics Backend using the Citus Extension

From nearly the beginning of the Citus Cloud service, we’ve had an internal formation provisioned and managed by the service itself. Dogfooding in this manner brings all the usual benefits such as gaining operational knowledge, customer empathy, and etc.

However, more interesting than yet another blog post going over the importance of dogfooding is the two different ways we’re using our Citus formation. Setting up a distributed table requires a bit more forethought than a normal Postgres table, because the choice of shard column has a big impact on the types of queries and joins you can do with the data.

We’re going to look at two cases here: a time-series metrics table and an events table.

Will Leinweber Aug 30, 2016

Page 1 of 9

Next page