Citus Blog

Articles tagged: distributed Postgres

Around 10 years ago I joined Amazon Web Services and that’s where I first saw the importance of trade-offs in distributed systems. In university I had already learned about the trade-offs between consistency and availability (the CAP theorem), but in practice the spectrum goes a lot deeper than that. Any design decision may involve trade-offs between latency, concurrency, scalability, durability, maintainability, functionality, operational simplicity, and other aspects of the system—and those trade-offs have meaningful impact on the features and user experience of the application, and even on the effectiveness of the business itself.

Perhaps the most challenging problem in distributed systems, in which the need for trade-offs is most apparent, is building a distributed database. When applications began to require databases that could scale across many servers, database developers began to make extreme trade-offs. In order to achieve scalability over many nodes, distributed key-value stores (NoSQL) put aside the rich feature set offered by the traditional relational database management systems (RDBMS), including SQL, joins, foreign keys, and ACID guarantees. Since everyone wants scalability, it would only be a matter of time before the RDBMS would disappear, right? Actually, relational databases have continued to dominate the database landscape. And here's why:

Keep reading
Dimitri Fontaine

A history lesson on SQL joins (in Postgres)

Written byBy Dimitri Fontaine | September 25, 2018Sep 25, 2018

Our beloved Structured Query Language may be the lingua franca for relational databases—but like many languages, SQL is in a state of constant evolution. The first releases of SQL didn't even have a notation for joins. At the time, SQL only supported inner joins.

Cross Joins and Where Filters

As a result, back in early eighties, the only way to express a join condition between tables would be in the WHERE clause.

Keep reading

Citus has multiple different executors which each behaving differently to support a wide array of use cases. For many the notion distributed SQL seems like it has to be a complicated one, but the principles of it aren't rocket science. Here we're going to look at a few examples of how Citus takes standard SQL and transforms it to operate in a distributed form so it can be parallelized. The result is that you can see speed up of 100x or more in query performance over a single node database.

Keep reading
Marco Slot

High performance distributed DML in Citus

Written byBy Marco Slot | July 25, 2018Jul 25, 2018

One of the many unique abilities of SQL databases is to transform data using advanced SQL queries and joins in a transactional manner. Commands like UPDATE and DELETE are commonly used for manipulating individual rows, but they become truly powerful when you can use subqueries to determine which rows to modify and how to modify them. It allows you to implement batch processing operations in a thread-safe, transactional, scalable manner.

Citus recently added support for UPDATE/DELETE commands with subqueries that span across all the data. Together with the CTE infrastructure that we’ve introduced over the past few releases, this gives you a new set of powerful distributed data transformation commands. As always, we’ve made sure that queries are executed as quickly and efficiently as possible by spreading out the work to where the data is stored.

Let’s look at an example of how you can use UPDATE/DELETE with subqueries.

Keep reading
Ozgun Erdogan

Citus 7.4: Move fast and reduce technical debt

Written byBy Ozgun Erdogan | May 24, 2018May 24, 2018

Today, we’re excited to announce the latest release of our distributed database, Citus 7.4! Citus scales out PostgreSQL through sharding, replication, and query parallelization.

Ever since we open sourced Citus as a Postgres extension, we have been incorporating your feedback into our database. Over the past two years, our release cycles went down from six to four to two months. As a result, we have announced 10 new Citus releases, where each release came with notable new features.

Shorter release cycles and more features came at a cost however. In particular, we added new distributed planner and executor logic to support different use cases for multi-tenant applications and real-time analytics. However, we couldn’t find the time to refactor this new logic. We found ourselves accumulating technical debt. Further, our distributed SQL coverage expanded over the past two years. With each year, we ended spending more and more time on testing each new release.

In Citus 7.4, we focused on reducing technical debt related to these items. At Citus, we track our development velocity with each release. While we fix bugs in every release, we found that a full release focused on addressing technical debt would help to maintain our release velocity. Also, a cleaner codebase leads to a happier and more productive engineering team.

Keep reading
Marco Slot

Distributed Execution of Subqueries and CTEs in Citus

Written byBy Marco Slot | March 9, 2018Mar 9, 2018

The latest release of the Citus database brings a number of exciting improvements for analytical queries across all the data and for real-time analytics applications. Citus already offered full SQL support on distributed tables for single-tenant queries and support for advanced subqueries that can be distributed ("pushed down") to the shards. With Citus 7.2, you can also use CTEs (common table expressions), set operations, and most subqueries thanks to a new technique we call "recursive planning".

Keep reading
Craig Kerstiens

PGConf EU: HyperLogLog, Eclipse, and Distributed Postgres

Written byBy Craig Kerstiens | December 11, 2017Dec 11, 2017

We're big fans of Postgres and enjoy getting around to the various community conferences to give talks on relevant topics as well as learn from others. A few months ago we had a good number of Citus team members over at the largest Postgres conference in Europe. Additionally, three of our Citus team members gave talks at the conference. We thought for those of you that couldn't make the conference you might still enjoy getting a glimpse of some of the content. You can browse the full set of talks that were given and slides for them on the PGConf EU website or flip through the presentations from members of the Citus team below.

Keep reading

So about two weeks ago we had a stealth release of Citus 7.1. And while we have already blogged a bit about the recent (and exciting) update to our fully-managed database as a service–Citus Cloud—and about our newly-added support for distributed transactions, it’s time to share all the things about our latest Citus 7.1 release.

If you’re into bulleted lists, here’s the quick overview of what’s in Citus 7.1:

  • Distributed transaction support
  • Zero-downtime shard rebalancer
  • Window function enhancements
  • Distinct ON/count(distinct) enhancements
  • Additional SQL enhancements
  • Checking for new software updates

Keep reading
Ozgun Erdogan

How Citus Executes Distributed Transactions on Postgres

Written byBy Ozgun Erdogan | November 22, 2017Nov 22, 2017

Distributed transactions are one of the meanest, baddest problems in relational databases. With the release of Citus 7.1, distributed transactions are now available to all our users. In this article, we are going to describe how we built distributed transaction support into Citus by using PostgreSQL modules. But first, let’s give an overview of what a distributed transaction is.

(If this sounds familiar, that’s because we first announced distributed transactions as part of last week’s Citus Cloud 2 announcement. The Citus Cloud announcement centered on other new useful capabilities —such as our warp feature to streamline migrations from single-node Postgres deployments to Citus Cloud — but it seems worthwhile to dedicate an entire post to distributed transactions.)

Keep reading
Craig Kerstiens

How Citus works (a look at dynamic executors)

Written byBy Craig Kerstiens | September 15, 2017Sep 15, 2017

In the beginning there was Postgres

We love Postgres at Citus. And rather than create a newfangled database from scratch, we implemented Citus as an extension to Postgres. We've talked a lot on our blog here about you can leverage Citus, about key use cases, and different data models and sharding approaches. But we haven’t spent a lot of time explaining how Citus works. So if you want to dive deeper into how Citus works, here we're going to walk through how Citus shards the data all the way through to how the executors run queries.

Distributing data within Citus

Citus gets its benefits from sharding your data which allows us to split the data across multiple physical nodes. When your tables are significantly smaller due to sharding your indexes are smaller, vacuum runs faster, everything works like it did when your database was smaller and easier to manage.

Keep reading

Page 4 of 5