Citus Data Blog

Thoughts on scaling out PostgreSQL, big data architectures, distributed systems, and the PostgreSQL community.

Announcing Citus MX: Scaling Postgres to over 500k writes per second

Today we’re excited to announce the private beta of Citus MX. Citus MX builds on the Citus extension for PostgreSQL, which allows you to scale out PostgreSQL tables across many servers. Citus MX gives you the ability to write to or query distributed tables from any node, which allows you to horizontally scale out your write throughput using PostgreSQL. It also removes the need to interact with a primary node in a Citus cluster.

We’ve performed over 500k durable writes per second (using YCSB) on a 32-node Citus Cloud cluster with our regular PostgreSQL settings. We’ve also exceeded ingest rates of 7 million records per second using batch COPY. Watch the video to see it in action. If you’re curious to learn more, read on; to get access, sign up below.

Marco Slot Sep 22, 2016

Fun with SQL: Computing run rate and month over month growth in Postgres

In any as-a-service business that bills monthly, a key metric you track is MRR, or monthly recurring revenue. It’s good practice to have this on a dashboard and check it on a monthly, weekly, or even daily basis. If you have a simple pricing model with set monthly plans, say, like Netflix, this is pretty easy to calculate:

SELECT sum(user_subscriptions.price)
    FROM user_subscriptions
    WHERE user_subscriptions.ended_at IS NULL;

The above will give you the run rate as of this exact moment in time. It gets a little more complicated to do this in a single query that gives it to you over time.
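
One way to get that over-time view is to generate a row per month and join the subscriptions that were active at the end of each one. This is only a sketch: it assumes a hypothetical started_at column next to the ended_at column used above, and the twelve-month window is an arbitrary choice.

```sql
-- Run rate at the end of each of the last 12 months.
-- started_at is an assumed column; only price and ended_at appear above.
SELECT months.month_start::date AS month,
       coalesce(sum(s.price), 0) AS run_rate
    FROM generate_series(
             date_trunc('month', now()) - interval '11 months',
             date_trunc('month', now()),
             interval '1 month'
         ) AS months(month_start)
    LEFT JOIN user_subscriptions s
        -- subscription began before the month closed...
        ON s.started_at < months.month_start + interval '1 month'
        -- ...and had not ended by the time the month closed
        AND (s.ended_at IS NULL
             OR s.ended_at >= months.month_start + interval '1 month')
    GROUP BY months.month_start
    ORDER BY months.month_start;
```

The LEFT JOIN keeps months with no active subscriptions in the result, so they show up with a run rate of 0 rather than disappearing.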

Craig Kerstiens Sep 12, 2016

pg_cron: Run periodic jobs in PostgreSQL

Running periodic jobs such as vacuuming or removing old data is a common requirement in PostgreSQL. A simple way to achieve this is to configure cron or another external daemon to periodically connect to the database and run a command. However, with...
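
For illustration, scheduling this kind of job from inside the database with pg_cron might look like the following. It is a rough sketch: the schedules are arbitrary, and the events table and its event_time column are hypothetical.

```sql
-- Vacuum every day at 03:00 (standard cron syntax).
SELECT cron.schedule('0 3 * * *', 'VACUUM');

-- Remove old data every Saturday at 03:30.
-- events and event_time are placeholder names.
SELECT cron.schedule('30 3 * * 6',
    $$DELETE FROM events WHERE event_time < now() - interval '1 week'$$);
```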

Marco Slot Sep 9, 2016

Building a Scalable Postgres Metrics Backend using the Citus Extension

From nearly the beginning of the Citus Cloud service, we’ve had an internal formation provisioned and managed by the service itself. Dogfooding in this manner brings all the usual benefits, such as gaining operational knowledge and customer empathy.

However, more interesting than yet another blog post going over the importance of dogfooding are the two different ways we’re using our Citus formation. Setting up a distributed table requires a bit more forethought than a normal Postgres table, because the choice of shard column has a big impact on the types of queries and joins you can do with the data.

We’re going to look at two cases here: a time-series metrics table and an events table.

Will Leinweber Aug 30, 2016

Announcing Citus 5.2

For years we’ve been focused on making Citus the best solution for scaling out your database. We’ve seen customers attain up to 100x the performance of vanilla Postgres on the same hardware. Of course, you don’t always need to scale out to get good performance: if you have 10 GB of data, a single-node Postgres instance can work great. But at data sizes of 100 GB and up, you may need to scale out.

Today, with the release of Citus 5.2, it’s easier to get started early, so you don’t have to worry about the moment when you can no longer scale up.

Craig Kerstiens Aug 19, 2016

Using State Machines to Power Citus Cloud (our Database as a Service)

It has been several months since the launch of Citus Cloud, and we’d like to share one part of its design with you. In particular, the fundamental unit of organization for our hosted service on top of AWS is concurrent state machines. In what follows...

Daniel Farina Aug 12, 2016

Sharding a multi-tenant app with Postgres

Whether you’re building marketing analytics, a portal for e-commerce sites, or an application to cater to schools, if your customer is another business, then a multi-tenant approach is the norm. The same code runs for all customers, but each customer sees their own private data set, except in some cases of holistic internal reporting.

Early in your application’s life, customer data has a simple structure which evolves organically. Typically all information relates to a central customer/user/tenant table. With a smaller amount of data (tens of GB) it’s easy to scale the application by throwing more hardware at it, but what happens when you’ve had enough success that your data no longer fits in memory on a single box, or you need more concurrency? You scale out, often painfully.

Craig Kerstiens Aug 10, 2016

Sharding Postgres with semi-structured data and its performance implications

If you’re looking at Citus, it’s likely you’ve outgrown a single-node database. In most cases your application is no longer performing as you’d like. In cases where your data is still under 100 GB, a single Postgres instance will still work well for you, and is a great choice. At levels beyond that Citus can help, but how you model your data has a major impact on how much performance you’re able to get out of the system.

Craig Kerstiens Jul 25, 2016
