Craig Kerstiens

CITUS BLOG AUTHOR PROFILE

Former Head of Cloud at Citus Data. Ran product at Heroku Postgres. Countless conference talks on Postgres & Citus. Loves bbq and football.

PUBLISHED ARTICLES

How Citus works (a look at dynamic executors)

Written by Craig Kerstiens | September 15, 2017

In the beginning there was Postgres

We love Postgres at Citus. And rather than create a newfangled database from scratch, we implemented Citus as an extension to Postgres. We've talked a lot on our blog about how you can leverage Citus, about key use cases, and about different data models and sharding approaches. But we haven’t spent a lot of time explaining how Citus works. So if you want to dive deeper, here we're going to walk through everything from how Citus shards the data to how the executors run queries.

Distributing data within Citus

Citus gets its benefits from sharding your data, which allows us to split the data across multiple physical nodes. When your tables are significantly smaller due to sharding, your indexes are smaller, vacuum runs faster, and everything works like it did when your database was smaller and easier to manage.

Five sharding data models and which is right

Written by Craig Kerstiens | August 28, 2017

When it comes to scaling your database, there are challenges, but the good news is that you have options. The easiest option, of course, is to scale up your hardware. And when you hit the ceiling on scaling up, you have a few more choices: sharding, deleting swaths of data that you think you might not need in the future, or trying to shrink the problem with microservices.

Deleting portions of your data is simple, if you can afford to do it. As for sharding, there are a number of approaches, and which one is right depends on a number of factors. Here we'll survey five sharding approaches and dig into the factors that guide you to each one.

Introducing WAL-G by Citus: Faster Disaster Recovery for Postgres

Written by Craig Kerstiens | August 18, 2017

A key part of running a reliable database service is ensuring you have a good plan for disaster recovery. Disaster recovery comes into play when disks or instances fail and you need to be able to recover your data. In those types of cases, logical backups via pg_dump may be days old, which makes them less than ideal to restore from. To remove the risk of data loss, many of us turn to the Postgres WAL to keep safe.

Years ago Daniel Farina, now a principal engineer at Citus Data, authored a continuous archiving utility to make it easy for Postgres users to prepare for and recover from disasters. The tool, WAL-E, has been used to keep millions of Postgres databases safe. Today we're excited to introduce a new version of this tool: WAL-G. WAL-G, the successor to WAL-E, was created by Katie Li, a software engineering intern here at Citus Data and an undergraduate at UC Berkeley.

Fork your distributed Postgres database with Citus

Written by Craig Kerstiens | August 4, 2017

Having a database staging environment that is as close to production as possible is key to being able to test your app. This applies to both your code and to your database. Far too often a staging database is a forgotten child in your stack—not getting the same love and attention as your production instance. For some teams, their staging database is years old, or worse yet, their staging database is a 10 GB sample of a 2 TB production database.

What if you could easily have a full staging environment to experiment with, that is an exact copy of your production database? Even if that production database is 50 TB?

As of today on Citus Cloud, our fully managed database as a service that is built to scale out (and based on Postgres!), you can get a full fork of your production database with the click of a button.

Database Table Types with Citus and Postgres

Written by Craig Kerstiens | July 27, 2017

Citus is Postgres that scales out horizontally. We do this by distributing queries across multiple Postgres servers—and as is often the case with scale-out architectures, this scale-out approach provides some great performance gains. And because Citus is an extension to Postgres, you get all the awesome features in Postgres such as support for JSONB, full-text search, PostGIS, and more.

The distributed nature of the Citus extension gives you new flexibility when it comes to modeling your data. This is good. But you’ll need to think about how to model your data and what type of database tables to use. The way you query your data ultimately determines how you can best model each table. In this post, we'll dive into the three different types of database tables in Citus and how you should think about each.

Customizing My Postgres Shell

Written by Craig Kerstiens | July 16, 2017

As a developer, your CLI is your home. You spend a lifetime of person-years in your shell, and even small optimizations can pay major dividends to your efficiency. If you work with Postgres, and likely the psql editor, you should consider investing some love and care into psql. A little-known fact is that psql has a number of options you can configure, and these configuration options can all live within an rc file called .psqlrc in your home directory. Here is my .psqlrc file, which I've customized to my liking. Let’s walk through some of the commands within my .psqlrc file:

Introducing Citus Add-on for Heroku—Scale out your Postgres

Written by Craig Kerstiens | July 13, 2017

Just as Heroku has made it simple for you to deploy applications, at Citus Data we aim to make it simple for you to scale out your Postgres database.

Once upon a time at Heroku, it all started with git push heroku master. Later, the team at Heroku made it easy to add any service you could want within your app via heroku addons:create foo. The simplicity of dragging a slider to scale up your dynos is the type of awesome customer experience we strive to create at Citus. With Citus Cloud (our fully-managed database as a service), you can simply drag and save—then voila, you've scaled the resources to your database.

Scaling Connections in Postgres

Written by Craig Kerstiens | May 10, 2017

There are a number of applications out there that have a high number of connections to Postgres. What's high? That depends on your application, but generally once you reach a few hundred connections in Postgres you're at the higher end. Anything in the thousands is definitely high, and even several hundred can put strain on your application. Generally, a safe level is somewhere around 300-500 connections. This may seem low if you're already running with thousands of connections, but it's likely perfectly fine with pgBouncer taking care of the heavy lifting for you. Let's drill into why a bit further.

Dynamically resizing your Postgres cluster with Citus' shard rebalancer

Written by Craig Kerstiens | April 11, 2017

One of the most common questions we get at Citus is how the rebalancer works, which is understandable. When you have an elastic, scaled-out database, how easy it is to scale is a key factor in how usable it will be. While we'll happily take the time to give a live demo to anyone who's interested, this walkthrough should give the curious among you a better idea of how it works.

A multi-tenant sharding tutorial

Written by Craig Kerstiens | March 9, 2017

A number of SaaS applications have data models in which each customer should interact only with their own data. At the enterprise end you have companies like Salesforce and Workday that fall into this bucket, but we see a ton of smaller ones as well. If you're just getting started figuring out how to approach your data so it can scale in the future, it doesn't have to be hard.

Here we're going to walk through an example data model that you can use as a basis for learning how you could apply the same to your own multi-tenant application.
