Citus Data Blog

Thoughts on scaling out PostgreSQL, big data architectures, distributed systems, and the PostgreSQL community.

By Craig Kerstiens
January 10, 2018

Database sharding explained in plain English

Sharding is one of those database topics that most developers have a distant understanding of, but the details aren’t always perfectly clear unless you’ve implemented sharding yourself. In building the Citus database (our extension to Postgres that shards the underlying database), we’ve followed a lot of the same principles you’d follow if you were manually sharding Postgres yourself. The main difference, of course, is that with Citus we’ve done the heavy lifting to shard Postgres and make it easy to adopt, whereas if you were to shard at the application layer, there’s a good bit of work needed to re-architect your application.

I’ve found myself explaining how sharding works to many people over the past year and realized it would be useful (and maybe even interesting) to break it down in plain English.
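To make the core idea concrete, here is a minimal Python sketch of hash-based sharding under our own assumptions: a fixed count of 32 shards, four hypothetical worker nodes, CRC32 as the hash function (Postgres uses its own hash functions), and a round-robin starting placement of shards onto nodes. Each key hashes to a point in the 32-bit hash space, that space is split into equal contiguous ranges (one per shard), and a separate placement table records which node holds each shard. It illustrates the general technique and is not Citus’s implementation.

```python
import zlib

# Assumptions for illustration only: 32 shards spread over 4 hypothetical worker nodes.
SHARD_COUNT = 32
NODES = ["worker-1", "worker-2", "worker-3", "worker-4"]
HASH_SPACE = 2 ** 32  # zlib.crc32 returns an unsigned 32-bit integer

# Placement table: which node holds each shard (round-robin to start).
PLACEMENT = {shard: NODES[shard % len(NODES)] for shard in range(SHARD_COUNT)}


def shard_for(key: str) -> int:
    """Hash the shard key, then find which contiguous hash range it falls in."""
    h = zlib.crc32(key.encode("utf-8"))
    return h * SHARD_COUNT // HASH_SPACE


def route(key: str) -> tuple[int, str]:
    """Return (shard id, node) for a key, such as a customer or tenant id."""
    shard = shard_for(key)
    return shard, PLACEMENT[shard]


if __name__ == "__main__":
    for customer in ["customer-1", "customer-2", "customer-42"]:
        print(customer, "->", route(customer))
```

Keeping the shard count fixed and larger than the number of nodes means that adding a node only requires moving whole shards and updating the placement table; no key ever has to be re-hashed.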

By Claire Giordano
January 5, 2018

10 Most-Read Citus Data Blog Posts in 2017, ft. Postgres

What Postgres and distributed database topics got the most attention on our Citus Data blog in 2017? Out of the 47 new posts we published last year, it’s pretty clear that many of you were interested in sharding relational databases, whether it be Ozgun’s principles of sharding or Craig’s post on figuring out which sharding data model is right for you. Heck, the five sharding data models post was so popular that it even got re-published recently on HackerNoon.

By Craig Kerstiens
December 27, 2017

Building real-time analytics dashboards with Postgres & Citus

Citus scales out Postgres for a number of different use cases, both as a system of record and as a system of engagement. One use case we’re seeing implemented a lot these days is using the Citus database to power customer-facing real-time analytics dashboards, even when dealing with billions of events per day. Dashboards and pipelines are easy to handle when you’re at 10 GB of data, but as you grow, even basic operations like a count of unique users require non-trivial engineering work to perform well.

Citus is a good fit for these types of event dashboards because of its ability to ingest large amounts of data, to perform rollups concurrently, to mix raw (not yet rolled-up) data with pre-aggregated data, and to support a large number of concurrent users. Taken together, these capabilities make the Citus extension to Postgres work well for end users in cases where a data warehouse may not work nearly as well. We’ve written here before about various parts of building a real-time, customer-facing dashboard, but today we thought we’d go one step further and give you an end-to-end guide.
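The rollup pattern described above can be sketched as a toy, in-memory Python example. Everything here is our own assumption for illustration: events are (timestamp, user_id) pairs, rollups are per-minute counts, and a hypothetical `rolled_until` watermark marks how far the periodic rollup job has progressed. In a real deployment these would be Postgres tables and the rollup an SQL query. The point is how a dashboard answers from pre-aggregated counts for older data and falls back to raw events for the tail that hasn’t been rolled up yet.

```python
from collections import Counter
from datetime import datetime, timedelta


def minute_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its minute."""
    return ts.replace(second=0, microsecond=0)


def rollup(events, start: datetime, end: datetime) -> Counter:
    """Pre-aggregate raw events with start <= ts < end into per-minute counts."""
    counts = Counter()
    for ts, _user_id in events:
        if start <= ts < end:
            counts[minute_bucket(ts)] += 1
    return counts


def dashboard_counts(events, rolled: Counter, rolled_until: datetime) -> Counter:
    """Answer from rollups for older data, plus raw events not yet rolled up."""
    result = Counter(rolled)  # copy so the stored rollup is not modified
    for ts, _user_id in events:
        if ts >= rolled_until:
            result[minute_bucket(ts)] += 1
    return result


if __name__ == "__main__":
    base = datetime(2017, 12, 27, 12, 0)
    events = [(base + timedelta(seconds=s), "user-1") for s in (5, 30, 65, 70, 130)]
    watermark = base + timedelta(minutes=1)   # everything before 12:01 is rolled up
    rolled = rollup(events, base, watermark)  # the periodic rollup job
    print(dashboard_counts(events, rolled, watermark))
```

Plain event counts are additive, so merging the two sources is a simple sum. Distinct counts (such as unique users) are not additive, which is where sketches like HyperLogLog come in.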

By Murat Tuncer
December 22, 2017

Distributed count distinct vs. HyperLogLog in Postgres

Citus 7.1 shipped just a few weeks back and included a number of great new features. In case you missed the details, check out Ozgun’s blog or read up on what Citus is on our site. Today, though, we want to drill further into an important area in Postgres: counting.

Getting a distinct count of some value out of your database is a common need. We’ve talked about how to count more quickly on our blog before, and followed that up with how you can use probabilistic algorithms like HyperLogLog to get approximate distinct counts faster.
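Distinct counts are hard to distribute because per-shard distinct counts can’t simply be added up when the same value appears on several shards. A HyperLogLog sketch avoids this because two sketches can be merged losslessly (a register-wise max) before estimating. Below is a minimal, illustrative HyperLogLog in Python; the SHA-1-based 64-bit hash, the precision p = 12 (4,096 registers), and the two hypothetical shards are our own assumptions, and this is not the implementation of any particular Postgres extension.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog: 2**p registers, 64-bit hash, linear counting for small sets."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(value: str) -> int:
        # Assumption: first 8 bytes of SHA-1 as a 64-bit hash, chosen for simplicity.
        return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest()[:8], "big")

    def add(self, value: str) -> None:
        x = self._hash64(value)
        q = 64 - self.p
        idx = x >> q                     # top p bits choose a register
        w = x & ((1 << q) - 1)           # remaining q bits
        rank = q - w.bit_length() + 1    # position of the leftmost 1-bit (q + 1 if w == 0)
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        """Union of two sketches: element-wise max of their registers."""
        if other.p != self.p:
            raise ValueError("sketches must use the same precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant for m >= 128
        estimate = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)  # linear counting for small sets
        return estimate


if __name__ == "__main__":
    # Two hypothetical shards whose users overlap (user-4000 .. user-5999 are on both).
    shard_a = [f"user-{i}" for i in range(0, 6000)]
    shard_b = [f"user-{i}" for i in range(4000, 10000)]

    naive = len(set(shard_a)) + len(set(shard_b))  # summing per-shard distinct counts: 12000
    exact = len(set(shard_a) | set(shard_b))       # true distinct count: 10000

    hll_a, hll_b = HyperLogLog(), HyperLogLog()
    for u in shard_a:
        hll_a.add(u)
    for u in shard_b:
        hll_b.add(u)
    print(naive, exact, round(hll_a.merge(hll_b).count()))
```

With p = 12 the expected relative error is roughly 1.04 / √4096, about 1.6%, while each sketch stays at 4,096 small integers no matter how many users it has seen.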

By Craig Kerstiens
December 20, 2017

Citus Cloud Retrospective for 12/14/2017

On Thursday, December 14th, we experienced a service outage across all of Citus Cloud, our fully managed database as a service. This was our first system-wide outage since we launched the service in April of 2016.

We know that Citus Cloud customers trust us with their data, and its availability is critical. In this case we missed the mark, and we’re sorry. We’ve worked over the last few days to thoroughly understand what went wrong and the steps we can take to limit the likelihood of this problem occurring again. In addition, we found improvements we can make to our architecture and incident response processes to allow us to better respond to similar types of problems in the future.

By Craig Kerstiens
December 11, 2017

PGConf EU: HyperLogLog, Eclipse, and Distributed Postgres

We’re big fans of Postgres and enjoy getting around to the various community conferences to give talks on relevant topics as well as learn from others. A few months ago, a good number of Citus team members attended the largest Postgres conference in Europe, and three of them gave talks there. We thought those of you who couldn’t make the conference might still enjoy a glimpse of some of the content. You can browse the full set of talks and their slides on the PGConf EU website, or flip through the presentations from members of the Citus team below.

By Lukas Fittl
December 8, 2017

Citus warp: Database migrations without the pain

We rolled out a new database migration feature for the Citus fully-managed database as a service (the warp migration feature) as part of our Citus Cloud 2 announcement. Today I want to walk through Citus Cloud’s warp migration feature in more detail. Before we drill in, we should probably take a step back and look at what typically (and sometimes painfully) goes on in a large database migration.

By Ozgun Erdogan
December 1, 2017

Citus 7.1: Window functions, distinct, distributed transactions, more

So about two weeks ago we had a stealth release of Citus 7.1. And while we have already blogged a bit about the recent (and exciting) update to our fully-managed database as a service, Citus Cloud, and about our newly added support for distributed transactions, it’s time to share all the things about our latest Citus 7.1 release.

If you’re into bulleted lists, here’s the quick overview of what’s in Citus 7.1:

  • Distributed transaction support
  • Zero-downtime shard rebalancer
  • Window function enhancements
  • DISTINCT ON / count(DISTINCT) enhancements
  • Additional SQL enhancements
  • Checking for new software updates
By Ozgun Erdogan
November 22, 2017

How Citus Executes Distributed Transactions on Postgres

Distributed transactions are one of the meanest, baddest problems in relational databases. With the release of Citus 7.1, distributed transactions are now available to all our users. In this article, we are going to describe how we built distributed transaction support into Citus by using PostgreSQL modules. But first, let’s give an overview of what a distributed transaction is.

(If this sounds familiar, that’s because we first announced distributed transactions as part of last week’s Citus Cloud 2 announcement. That announcement centered on other useful new capabilities, such as our warp feature to streamline migrations from single-node Postgres deployments to Citus Cloud, but it seems worthwhile to dedicate an entire post to distributed transactions.)
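The standard technique for committing a transaction that touches several nodes atomically is two-phase commit (2PC): every participant first prepares (stages its writes and votes), and only if all of them vote yes does the coordinator tell them to commit. The Python sketch below simulates that protocol in memory, with hypothetical `Participant` objects standing in for worker nodes. It illustrates the general protocol and is not a description of how Citus implements it.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """A hypothetical worker node that stages writes before making them visible."""
    name: str
    fail_prepare: bool = False                     # simulate a node that votes "no"
    prepared: dict = field(default_factory=dict)   # txid -> staged writes
    committed: dict = field(default_factory=dict)  # visible key/value data

    def prepare(self, txid: str, writes: dict) -> bool:
        """Phase 1: stage the writes and vote yes, or vote no."""
        if self.fail_prepare:
            return False
        self.prepared[txid] = dict(writes)
        return True

    def commit(self, txid: str) -> None:
        """Phase 2 (commit): apply the staged writes."""
        self.committed.update(self.prepared.pop(txid))

    def abort(self, txid: str) -> None:
        """Phase 2 (abort): throw away any staged writes."""
        self.prepared.pop(txid, None)


def two_phase_commit(txid: str, work: list[tuple[Participant, dict]]) -> bool:
    """Apply each participant's writes on all participants, or on none of them."""
    voted_yes = []
    for node, writes in work:
        if not node.prepare(txid, writes):
            for p in voted_yes:  # a single "no" vote aborts everyone who prepared
                p.abort(txid)
            return False
        voted_yes.append(node)
    for node in voted_yes:       # every participant voted yes: commit everywhere
        node.commit(txid)
    return True


if __name__ == "__main__":
    w1, w2 = Participant("worker-1"), Participant("worker-2")
    w3 = Participant("worker-3", fail_prepare=True)
    print(two_phase_commit("tx-1", [(w1, {"a": 1}), (w2, {"b": 2})]))   # True
    print(two_phase_commit("tx-2", [(w1, {"a": 10}), (w3, {"c": 3})]))  # False
    print(w1.committed)                                                 # {'a': 1}
```

A real coordinator must also record its commit decision durably before phase 2, so that it can finish (or roll back) prepared transactions after a crash; this sketch leaves that recovery step out.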

