Citus Data Blog

Thoughts on scaling out PostgreSQL, sharding, multi-tenant apps, real-time analytics, and distributed databases.

By Dimitri Fontaine
September 11, 2018

PostgreSQL 11 and Just In Time Compilation of Queries

PostgreSQL 11 is brewing and will be released soon. In the meantime, testing it with your own application is a great way to make sure the community catches all the remaining bugs before the dot-zero release.

One of the big changes in the next PostgreSQL release is the result of Andres Freund’s work on the query executor engine. Andres has been working on this part of the system for a while now, and in the next release we are going to see a new component in the execution engine: a JIT expression compiler!

By Craig Kerstiens
September 4, 2018

12 Factor: Dev/prod parity for your database

The twelve-factor app changed the way we build SaaS applications. Explicit dependency management, separating config from code, scaling out your app concurrently—these design principles took us from giant J2EE apps to apps that scale predictably on the web. One of these 12 factors has long stood out as a challenge when it comes to databases: dev/prod parity. Sure, you can run the exact same version of your database, and have a sandbox copy, but testing and staging with production data… that’s a different story.

By Craig Kerstiens
August 29, 2018

Postgres data types you should consider using

Postgres is a rich and powerful database. And the existence of PostgreSQL extension APIs has enabled Postgres to expand its capabilities beyond the boundaries of what you would expect in a traditional relational database. Popular Postgres extensions today include HyperLogLog, which gives you approximate distincts with a small footprint; PostGIS, which adds rich geospatial support; and Citus, which helps you scale out your Postgres database across multiple nodes to improve performance for multi-tenant SaaS applications and real-time analytics dashboards. That is on top of the full text search capabilities built into PostgreSQL. With all the bells and whistles you can layer into Postgres, sometimes the most basic built-ins get overlooked.

PostgreSQL has nearly 100 different data types, and these data types can come with their own tuned indexing or their own specialized functions. You probably already use the basics such as integers and text, and today we’re going to take a survey of less-used but incredibly powerful PostgreSQL data types.

By Craig Kerstiens
August 17, 2018

How Citus real-time executor parallelizes Postgres queries

Citus has multiple executors, each behaving differently to support a wide array of use cases. For many, the notion of distributed SQL seems like it has to be a complicated one, but the principles of it aren’t rocket science. Here we’re going to look at a few examples of how Citus takes standard SQL and transforms it to operate in a distributed form so it can be parallelized. The result is that you can see speedups of 100x or more in query performance over a single node database.
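To make the idea concrete, here is a minimal sketch of the general scatter-gather pattern behind parallelizing an aggregate. It is not Citus code: the `SHARDS` data and the `partial_aggregate` and `distributed_avg` names are hypothetical, and threads stand in for worker nodes. The point it illustrates is that an average cannot simply be averaged across shards, so each shard returns a partial sum and count that a coordinator merges.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each list stands in for the rows held by one worker node.
SHARDS = [
    [12.0, 7.5, 3.0],
    [8.0, 20.0],
    [1.5, 4.5, 9.0, 6.0],
]


def partial_aggregate(rows):
    """Per-shard work, roughly: SELECT sum(x), count(x) FROM shard."""
    return sum(rows), len(rows)


def distributed_avg(shards):
    """Coordinator work: run the partial aggregates in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(partial_aggregate, shards))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


if __name__ == "__main__":
    print(distributed_avg(SHARDS))
```

The merged result matches what a single node would compute over all the rows, but each shard only scans its own slice of the data.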

By Craig Kerstiens
August 9, 2018

Fun with SQL: Common Table Expressions for more readable queries

This week we’re continuing our fun with SQL series. In past posts we’ve looked at generate_series, window functions, and recursive CTEs. This week we’re going to take a step backward and look at standard CTEs (common table expressions) within Postgres.

Admittedly SQL isn’t always the most friendly language to read. It’s a little more friendly to write, but even so it is not as naturally readable as something like Python. Despite its shortcomings there, it is the lingua franca when it comes to data. SQL is the language and API that began with relational databases, and now even non-traditional databases are aiming to imitate it with their own SQL-like thing. With CTEs, though, our SQL, even queries hundreds of lines long, can become readable to someone without detailed knowledge of the application.

CTEs (common table expressions), often referred to as with clauses/queries, are essentially views that are valid only for the duration of a single query. They can reference earlier CTEs within that same query, essentially allowing you to create separate building blocks on which you compose your queries. It is of note that CTEs are an optimization boundary, so in some cases they may have worse performance than their alternative non-CTE queries. Even so, they’re incredibly useful for readability and should be considered when constructing large complex queries. Let’s dig in with an example.
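As a minimal sketch of the building-block idea, the query below chains two CTEs, with the second one reading from the first. The `customers` and `orders` tables and their contents are made up for illustration, and Python's built-in `sqlite3` module stands in for Postgres so the example runs without a server; the `WITH` syntax used here is the same in both.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory database with hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER
        );
        INSERT INTO customers VALUES (1, 'acme'), (2, 'globex'), (3, 'initech');
        INSERT INTO orders (customer_id, amount)
        VALUES (1, 50), (1, 70), (2, 30), (3, 200);
    """)
    return conn


def top_customers(conn, min_total):
    """Return (name, total) for customers whose orders sum to min_total or more."""
    query = """
    WITH order_totals AS (
        -- Building block 1: total spend per customer.
        SELECT customer_id, SUM(amount) AS total
        FROM orders
        GROUP BY customer_id
    ),
    big_spenders AS (
        -- Building block 2: reads from the CTE defined just above.
        SELECT customer_id, total
        FROM order_totals
        WHERE total >= ?
    )
    SELECT c.name, b.total
    FROM big_spenders b
    JOIN customers c ON c.id = b.customer_id
    ORDER BY b.total DESC, c.name
    """
    return conn.execute(query, (min_total,)).fetchall()


if __name__ == "__main__":
    print(top_customers(make_demo_db(), 100))
```

Each named block can be read and checked on its own, which is where the readability benefit comes from once queries grow long.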

By Ozgun Erdogan
August 3, 2018

Citus 7.5: The right way to scale SaaS apps

One of the primary challenges with scaling SaaS applications is the database. While you can easily scale your application by adding more servers, scaling your database is a way harder problem. This is particularly true if your application benefits from relational database features, such as transactions, table joins, and database constraints.

At Citus, we make scaling your database easy. Over the past year, we added support for distributed transactions, made Rails and Django integration seamless, and expanded on our SQL support. We also documented approaches to scaling your SaaS database to thousands of customers.

Today, we’re excited to announce the latest release of our distributed database—Citus 7.5. With this release, we’re adding key features that make scaling your SaaS / multi-tenant database easier. If you’re into bulleted lists, these features include the following.

By Craig Kerstiens
July 31, 2018

Introducing Landlord: per tenant stats in Postgres with Citus

Postgres keeps getting better and better. In recent years, the Postgres community has added JSONB support, improved performance, and added so many usability enhancements. The result: you can work even more powerfully with your database. Over the past 8 years, my favorite two enhancements have been JSONB and pg_stat_statements. Pg_stat_statements is a built-in extension that allows you to get high level insights into queries that are being run as well as their performance—without having to be an expert and without needing a PhD in databases.

Introducing the new landlord feature in Citus 7.5

With Citus 7.5, we’ve gone one step beyond the awesomeness of pg_stat_statements and Postgres, with the new landlord feature in Citus—to give you per-tenant stats.

By Claire Giordano
July 29, 2018

All the things coming soon to PostgresOpen SV 2018

In this world of all things digital where so many of us are online so much of the time—what with architecting, coding, QA'ing, blogging, and slacking—it’s kind of refreshing to step away from our devices and talk to other humans face-to-face at an event.

Especially when it’s a conference chock full of PostgreSQL open source people, from users to developers to community leaders.

Especially when it’s right in our own backyard here in San Francisco.

Especially when it’s PostgresOpen SV 2018

By Marco Slot
July 25, 2018

High performance distributed DML in Citus

One of the many unique abilities of SQL databases is to transform data using advanced SQL queries and joins in a transactional manner. Commands like UPDATE and DELETE are commonly used for manipulating individual rows, but they become truly powerful when you can use subqueries to determine which rows to modify and how to modify them. It allows you to implement batch processing operations in a thread-safe, transactional, scalable manner.

Citus recently added support for UPDATE/DELETE commands with subqueries that span across all the data. Together with the CTE infrastructure that we’ve introduced over the past few releases, this gives you a new set of powerful distributed data transformation commands. As always, we’ve made sure that queries are executed as quickly and efficiently as possible by spreading out the work to where the data is stored.

Let’s look at an example of how you can use UPDATE/DELETE with subqueries.
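The sketch below shows the general shape of such commands on a single node, not the distributed execution that Citus adds. The `accounts` and `events` tables, their rows, and the `flag_and_purge` helper are hypothetical, and Python's built-in `sqlite3` module stands in for Postgres so the example is self-contained. A subquery decides which rows the UPDATE touches, a second subquery drives the DELETE, and both run in one transaction.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory database with hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE accounts (id INTEGER PRIMARY KEY, status TEXT);
        CREATE TABLE events (account_id INTEGER, kind TEXT);
        INSERT INTO accounts VALUES (1, 'active'), (2, 'active'), (3, 'active');
        INSERT INTO events VALUES
            (1, 'login'), (2, 'chargeback'), (2, 'chargeback'), (3, 'chargeback');
    """)
    return conn


def flag_and_purge(conn, threshold):
    """Flag accounts with at least `threshold` chargebacks, then drop their events."""
    with conn:  # commits both statements together, or rolls back on an exception
        conn.execute("""
            UPDATE accounts SET status = 'flagged'
            WHERE id IN (
                SELECT account_id FROM events
                WHERE kind = 'chargeback'
                GROUP BY account_id
                HAVING COUNT(*) >= ?
            )
        """, (threshold,))
        conn.execute("""
            DELETE FROM events
            WHERE account_id IN (
                SELECT id FROM accounts WHERE status = 'flagged'
            )
        """)
    return conn.execute("SELECT id, status FROM accounts ORDER BY id").fetchall()


if __name__ == "__main__":
    print(flag_and_purge(make_demo_db(), 2))
```

Because both statements share a transaction, no reader ever sees an account flagged while its events are still waiting to be purged.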

By Craig Kerstiens
July 19, 2018

ZFS Private Beta on Citus Cloud

ZFS is an open source file system with the option to store data on disk in a compressed form. ZFS itself supports a number of compression algorithms, giving you flexibility to optimize both performance and how much you store on disk. Compressing your data on disk offers two pretty straightforward advantages:

  1. Reduce the amount of storage you need—thus reducing costs
  2. Reduce the amount of data that has to be read from disk, improving performance

To date, we have run Citus Cloud, our fully-managed database as a service that scales out Postgres horizontally, in production on EXT4. Today, we’re excited to announce a limited beta program of ZFS support for our Citus Cloud database. ZFS makes Citus Cloud even more powerful for certain use cases. If you are interested in access to the beta, contact us to get more info, or continue reading to learn more about the use cases where ZFS, Citus, and Postgres can help.
