Craig Kerstiens

Craig Kerstiens

CITUS BLOG AUTHOR PROFILE

Former Head of Cloud at Citus Data. Ran product at Heroku Postgres. Countless conference talks on Postgres & Citus. Loves bbq and football.

PUBLISHED ARTICLES
Craig Kerstiens

Database sharding explained in plain English

Written by By Craig Kerstiens | January 10, 2018 Jan 10, 2018

Sharding is one of those database topics that most developers have a distant understanding of, but the details aren't always perfectly clear unless you've implemented sharding yourself. In building the Citus database (our extension to Postgres that shards the underlying database), we've followed a lot of the same principles you'd follow if you were manually sharding Postgres yourself. The main difference of course is that with Citus, we’ve done the heavy lifting to shard Postgres and make it easy to adopt, whereas if you were to shard at the application layer then there’s a good bit of of work needed to re-architect your application.

I've found myself explaining how sharding works to many people over the past year and realized it would be useful (and maybe even interesting) to break it down in plain English.

Keep reading
Craig Kerstiens

Building real-time analytics dashboards with Postgres & Citus

Written by By Craig Kerstiens | December 27, 2017 Dec 27, 2017

Citus scales out Postgres for a number of different use cases, both as a system of record and as a system of engagement. One use case we're seeing implemented a lot these days: using the Citus database to power customer-facing real-time analytics dashboards, even when dealing with billions of events per day. Dashboards and pipelines are easy to handle when you’re at 10 GB of data, as you grow even basic operations like a count of unique users require non-trivial engineering work to get performing well.

Citus is a good fit for these types of event dashboards because of Citus’ ability to ingest large amounts of data, to perform rollups concurrently, to mix both raw unrolled-up data with pre-aggregated data, and finally to support a large number of concurrent users. Adding all these capabilities together, the Citus extension to Postgres works well for end users where a data warehouse may not work nearly as well. We've talked some here about various parts of building a real-time customer facing dashboard, but today we thought we'd go one step further and give you a guide for doing it end to end.

Keep reading
Craig Kerstiens

Citus Cloud Retrospective for 12/14/2017

Written by By Craig Kerstiens | December 20, 2017 Dec 20, 2017

On Thursday December 14th we experienced a service outage across all of Citus Cloud, our fully managed database as a service. This was our first system wide outage since we launched the service in April of 2016.

We know that Citus Cloud customers trust us with their data, and its availability is critical. In this case we missed the mark and we’re sorry. We've worked over the last few days to thoroughly understand what went wrong and steps we can take to limit the likelihood of this problem occurring again. In addition, we found improvements we can make to our architecture and incident response processes to allow us to better respond to similar types of problems in the future.

Keep reading
Craig Kerstiens

PGConf EU: HyperLogLog, Eclipse, and Distributed Postgres

Written by By Craig Kerstiens | December 11, 2017 Dec 11, 2017

We're big fans of Postgres and enjoy getting around to the various community conferences to give talks on relevant topics as well as learn from others. A few months ago we had a good number of Citus team members over at the largest Postgres conference in Europe. Additionally, three of our Citus team members gave talks at the conference. We thought for those of you that couldn't make the conference you might still enjoy getting a glimpse of some of the content. You can browse the full set of talks that were given and slides for them on the PGConf EU website or flip through the presentations from members of the Citus team below.

Keep reading
Craig Kerstiens

Citus Cloud 2, Postgres, and scaling out without sacrifice

Written by By Craig Kerstiens | November 16, 2017 Nov 16, 2017

Relational databases have been a mainstay in applications for decades now. And with their dominance has come a rich set of tools: you have tools to help with monitoring, to gain insight into performance, and to operate the database in safe ways.

The knock against relational databases has long been: what happens when you need to scale? At that point, you usually have to make difficult trade-offs, like having to trade off relational semantics in order to get a database that scales out (think: NoSQL.) Or having to find a way to reduce the amount of data you need to retain, in order to continue to skate by with a single-node relational database. Or having to trade off as much as a year’s worth of application features in order to divert an engineering team away from your core business, to instead spend their time sharding the application. Bottom line: the database trade-offs to get scale can be painful.

Today I’m excited to announce Citus Cloud 2, the newest version of our cloud database. We launched the first release of Citus Cloud 18 months ago as a fully-managed database as a service that enables you to keep right on scaling your relational database. If you’re unfamiliar, Citus is an extension to Postgres that transforms your Postgres database into a distributed system under the covers, while appearing to your application as a single-node database. With Citus, you don’t need to teach your application all about distributed systems to continue scaling. We make Citus available as open source, as on-prem enterprise software, and as a fully-managed database as a service, Citus Cloud. And with Citus Cloud, you have all of your management/backups/etc. taken care of for you.

Keep reading
Craig Kerstiens

Faster bulk loading in Postgres with copy

Written by By Craig Kerstiens | November 8, 2017 Nov 8, 2017

If you've used a relational database, you understand basic INSERT statements. Even if you come from a NoSQL background, you likely grok inserts. Within the Postgres world, there is a utility that is useful for fast bulk ingestion: \copy. Postgres \copy is a mechanism for you to bulk load data in or out of Postgres.

First, lets pause. Do you even need to bulk load data and what's it have to do with Citus? We see customers leverage Citus for a number of different uses. When looking at Citus for a transactional workload, say as the system of record for some kind of multi-tenant SaaS app, your app is mostly performing standard insert/updates/deletes.

But when you're leveraging Citus for real-time analytics, you may already have a separate ingest pipeline. In this case you might be looking at event data, which can be higher in volume than a standard web app workload. If you already have an ingest pipeline that reads off Apache Kafka or Kinesis, you could be a great candidate for bulk ingest.

Back to our feature presentation: Postgres \copy. Copy is interesting because you can achieve much higher throughput than with single row inserts.

Keep reading
Craig Kerstiens

Monitoring your bloat in Postgres

Written by By Craig Kerstiens | October 20, 2017 Oct 20, 2017

Postgres under the covers in simplified terms is one giant append only log. When you insert a new record that gets appended, but the same happens for deletes and updates. For a delete a record is just flagged as invisible, it's not actually removed from disk immediately. For updates the old record is flagged as invisible and a new record is written. What then later happens is Postgres comes through and looks at all records that are invisible and actually frees up the disk storage. This process is known as vacuum.

There are a couple of key levels to VACUUM within Postgres:

  • VACUUM ANALYZE - This one is commonly run when you've recently loaded data into your database and want Postgres to update it's statistics about the data
  • VACUUM FULL - This will take a lock during the operation, but will scan the full table and reclaim all the space it can from dead tuples.
Keep reading
Craig Kerstiens

A tour of Postgres Index Types

Written by By Craig Kerstiens | October 17, 2017 Oct 17, 2017

At Citus we spend a lot of time working with customers on data modeling, optimizing queries, and adding indexes to make things snappy. My goal is to be as available for our customers as we need to be, in order to make you successful. Part of that is keeping your Citus cluster well tuned and performant which we take care of for you. Another part is helping you with everything you need to know about Postgres and Citus. After all a healthy and performant database means a fast performing app and who wouldn’t want that. Today we’re going to condense some of the information we’ve shared directly with customers about Postgres indexes.

Postgres has a number of index types, and with each new release seems to come with another new index type. Each of these indexes can be useful, but which one to use depends on 1. the data type and then sometimes 2. the underlying data within the table, and 3. the types of lookups performed. In what follows we'll look at a quick survey of the index types available to you in Postgres and when you should leverage each. Before we dig in, here’s a quick glimpse of the indexes we’ll walk you through:

  • B-Tree
  • Generalized Inverted Index (GIN)
  • Generalized Inverted Seach Tree (GiST)
  • Space partitioned GiST (SP-GiST)
  • Block Range Indexes (BRIN)
  • Hash

Now onto the indexing

Keep reading
Craig Kerstiens

Indexing all the things in Postgres

Written by By Craig Kerstiens | October 11, 2017 Oct 11, 2017

Postgres indexes make your application fast. And while one option is to analyze each of your relational database queries with pg_stat_statements to see where you should add indexes... an alternative fix (and a quick one at that) could be to add indexes to each and every database table—and every column—within your database. To make this easy for you, here's a query you can run that will create the CREATE INDEX commands for every table and column in your Postgres database.

Disclaimer: Adding an index to every column of every table is not a best practice for production databases. And it’s certainly not something we recommend at Citus Data, where we take scaling out Postgres and database performance very seriously. But indexing all the things is a fun “what if” thought exercise that is well worth thinking about, to understand the value of indexes in Postgres. What if you indexed all the things?

With that disclaimer out of the way, onto my advice of how to index all the things as a way to understand your database’s performance.

Keep reading
Craig Kerstiens

Migrating from single-node Postgres to Citus

Written by By Craig Kerstiens | September 20, 2017 Sep 20, 2017

There are a lot of things that are everyday occurrences for engineering teams. Deploying new code, deploying a new service, it's even fairly common to deploy a net new data store or language. But migrating from one database to another is far more rare. While migrating your database can seem like a daunting task, there are lessons you can learn from others—and steps you can take to minimize risk in migrating from one database to another.

At Citus Data, we've helped many a customer migrate from single node Postgres, like RDS or Heroku Postgres, to a distributed Citus database cluster, so they can scale out and take advantage of the compute, memory, and disk resources of a distributed, scale-out solution. So we've been privy to some valuable lessons learned, and we’ve developed some best practices. Here you can find your guide for steps to follow as you start to create your migration plan to Citus.

Keep reading

Page 5 of 8