Blog posts by Craig Kerstiens on the Citus Blog - Page 2

Fun with SQL: Self joins

Written by By Craig Kerstiens | January 2, 2019 Jan 2, 2019

Various families have various traditions in the US around Christmas time. Some will play games like white elephant where you get a mix of decent gifts as well as gag gifts... you then draw numbers and get to pick from existing presents that have been opened ("stealing" from someone else) or opening an up-opened one. The game is both entertaining to try to get something you want, but also stick Aunt Jennifer with the stuffed poop emoji with a Santa hat on it.

Other traditions are a bit simpler, one that my partner's family follows is drawing names for one person you buy a gift for. This is nice because you can put a bit of effort into that one person without having to be too overwhelmed in tracking down things for multiple people. Each year we draw names for the next year. And by now you're probably thinking what does any of this have to do with SQL? Well normally when we draw names we write them on a piece of paper, someone takes a picture, then that gets texted around to other family members. At least for me every October I'm scrolling back through text messages to try to recall who it was I'm supposed to buy for. This year I took a little time to put everyone's name in a SQL database and write a simple query for easier recall.

Keep reading

How Postgres is more than a relational database: Extensions

Written by By Craig Kerstiens | November 27, 2018 Nov 27, 2018

Postgres has been a great database for decades now, and has really come into its own in the last ten years. Databases more broadly have also gotten their own set of attention as well. First we had NoSQL which started more on document databases and key/value stores, then there was NewSQL which expanding things to distributed, graph databases, and all of these models from documents to distributed to relational were not mutually exclusive. Postgres itself went from simply a relational database (which already had geospatial capabilities) to a multi modal database by adding support for JSONB.

But to me the most exciting part about Postgres isn't how it continues to advance itself, rather it is how Postgres has shifted itself from simply a relational database to more of a data platform. The largest driver for this shift to being a data platform is Postgres extensions. Postgres extensions in simplified terms are lower level APIs that exist within Postgres that allow to change or extend it's functionality. These extension hooks allow Postgres to be adapted for new use cases without requiring upstream changes to the core database. This is a win in two ways:

The Postgres core can continue to move at a very safe and stable pace, ensuring a solid foundation and not risking your data.
Extensions themselves can move quickly to explore new areas without the same review process or release cycle allowing them to be agile in how they evolve.

Okay, plug-ins and frameworks aren't new when it comes to software, what is so great about extensions for Postgres? Well they may not be new to software, but they're not new to Postgres either. Postgres has had extensions as long as I can remember. In Postgres 9.1 we saw a new sytax to make it easy to CREATE EXTENSION and since that time the ecosystem around them has grown. We have a full directory of extensions at PGXN. Older forks such as which were based on older versions are actively working on catching up to a modern release to presumably become a pure extension. By being a pure extension you're able to stay current with Postgres versions without heavy rebasing for each new release. Now the things you can do with extensions is as powerful as ever, so much so that Citus' distributed database support is built on top of this extension framework.

Keep reading

Materialized views vs. Rollup tables in Postgres

Written by By Craig Kerstiens | October 31, 2018 Oct 31, 2018

Materialized views were a long awaited feature within Postgres for a number of years. They finally arrived in Postgres 9.3, though at the time were limited. In Postgres 9.3 when you refreshed materialized views it would hold a lock on the table while they were being refreshed. If your workload was extremely business hours based this could work, but if you were powering something to end-users this was a deal breaker. In Postgres 9.4 we saw Postgres achieve the ability to refresh materialized views concurrently. With this we now have fully baked materialized view support, but even still we've seen they may not always be the right approach.

Keep reading

Commenting your Postgres database

Written by By Craig Kerstiens | October 17, 2018 Oct 17, 2018

At Citus whether it's looking at our own data or helping a customer debug a query I end up writing a lot of SQL. When I do write SQL I do my best to make sure it's readable in case others need to come along and understand or modify, but admittedly I do have some bad habits from time to time such as using implicit joins. Regardless of my bad habits I still try to make my SQL and database as easy to understand for someone not already familiar with it. One of the biggest tools for that is comments.

Even early on in learning to program we take advantage of comments to explain and describe what our code is doing, even in times when it seems obvious. I see this less commonly in SQL and databases, which is a shame because data is just as valuable so making it easier to reason and work with seems logical. Postgres has a few great mechanisms you can start leveraging when it comes to commenting so you can better document things.

Keep reading

Use cases for followers (read replicas) in Postgres

Written by By Craig Kerstiens | September 19, 2018 Sep 19, 2018

Citus extends Postgres to be a horizontally scalable database. By horizontally scalable, we mean the data is spread across multiple machines, and you're able to scale not only storage but also memory and compute—thus providing better performance. Without using something like Citus to transform PostgreSQL into a distributed database, sure you can add read replicas to scale, but you're still maintaining a single copy of your data. When you run into scaling issues with your Postgres database, adding a read replica and offloading some of your traffic to your read replica is a common bandaid to slow down the bleeding, but it is only a matter of time until even that doesn't work any further. Whereas with Citus, scaling out your database is as simple as dragging a slider and rebalancing your data.

Are read replicas still useful with horizontally scalable databases?

But that leaves a question, are read-replicas still useful? Well, sure they are.

Keep reading

12 Factor: Dev/prod parity for your database

Written by By Craig Kerstiens | September 4, 2018 Sep 4, 2018

The twelve-factor app changed the way we build SaaS applications. Explicit dependency management, separating config from code, scaling out your app concurrently—these design principles took us from giant J2EE apps to apps that scale predictably on the web. One of these 12 factors has long stood out as a challenge when it comes to databases: dev/prod parity. Sure, you can run the exact same version of your database, and have a sandbox copy, but testing and staging with production data... that's a different story.

Keep reading

Postgres data types you should consider using

Written by By Craig Kerstiens | August 29, 2018 Aug 29, 2018

Postgres is a rich and powerful database. And the existence of PostgreSQL extension APIs have enabled Postgres to expand its capabilities beyond the boundaries of what you would expect in a traditional relational database. Examples of popular Postgres extensions today include HyperLogLog, which gives you approximate distincts with a small footprint—to rich geospatial support via PostGIS—to Citus which helps you scale out your Postgres database across multiple nodes to improve performance for multi-tenant SaaS applications and real-time analytics dashboards—to the built-in full text search capabilities in PostgreSQL. With all the bells and whistles you can layer into Postgres, sometimes the most basic built-ins get overlooked.

PostgreSQL has nearly 100 different data types, and these data types can come with their own tuned indexing or their own specialized functions. You probably already use the basics such as integers and text, and today we're going to take a survey of less-used but incredibly powerful PostgreSQL data types.

Keep reading

How Citus real-time executor parallelizes Postgres queries

Written by By Craig Kerstiens | August 17, 2018 Aug 17, 2018

Citus has multiple different executors which each behaving differently to support a wide array of use cases. For many the notion distributed SQL seems like it has to be a complicated one, but the principles of it aren't rocket science. Here we're going to look at a few examples of how Citus takes standard SQL and transforms it to operate in a distributed form so it can be parallelized. The result is that you can see speed up of 100x or more in query performance over a single node database.

Keep reading

Fun with SQL: Common Table Expressions for more readable queries

Written by By Craig Kerstiens | August 9, 2018 Aug 9, 2018

This week we're continuing our fun with SQL series. In past posts we've looked at generate_series, window functions, and recursive CTEs. This week we're going to take a step backward and look at standard CTEs (common table expressions) within Postgres.

Admittedly SQL isn't always the most friendly language to read. It's a little more friendly to write, but even still not as naturally readable as something like Python. Despite it's shortcomings there it is the lingua franca when it comes to data, SQL is the language and API that began with relational databases and now even non traditional databases are aiming to immitate it with their own SQL like thing. With CTEs though our SQL, even queries hundreds of lines long, can become readable to someone without detailed knowledge of the application.

CTEs (common table expressions), often referred to as with clauses/queries, are essentially views that are valid during the course of a transaction. They can reference earlier CTEs within that same transaction or query essentially allowing you separate building blocks on which you compose your queries. It is of note that CTEs are an optimization boundary, so in cases they may have worse performance than their alternative non-CTE queries. Even still they're incredible useful for readability and should be considered when constructing large complex queries. Let's dig in with an example.

Keep reading

Introducing Landlord: per tenant stats in Postgres with Citus

Written by By Craig Kerstiens | July 31, 2018 Jul 31, 2018

Postgres keeps getting better and better. In recent years, the Postgres community has added JSONB support, improved performance, and added so many usability enhancements. The result: you can work even more powerfully with your database. Over the past 8 years, my favorite two enhancements have been JSONB and pg_stat_statements. Pg_stat_statements is a built-in extension that allows you to get high level insights into queries that are being run as well as their performance—without having to be an expert and without needing a PhD in databases.

Introducing the new landlord feature in Citus 7.5

With Citus 7.5, we've gone one step beyond the awesomeness of pg_stat_statements and Postgres, with the new landlord feature in Citus—to give you per-tenant stats.

Keep reading