Blog posts tagged with 'distributed Postgres' on the Citus Blog - Page 4

Thinking in MapReduce, but with SQL

Written byBy Craig Kerstiens | February 21, 2019Feb 21, 2019

For those considering Citus, if your use case seems like a good fit, we often are willing to spend some time with you to help you get an understanding of the Citus database and what type of performance it can deliver. We commonly do this in a roughly two hour pairing session with one of our engineers. We'll talk through the schema, load up some data, and run some queries. If we have time at the end it is always fun to load up the same data and queries into single node Postgres and see how we compare. After seeing this for years, I still enjoy seeing performance speed ups of 10 and 20x over a single node database, and in cases as high as 100x.

And the best part is it didn't take heavy re-architecting of data pipelines. All it takes is just some data modeling, and parallelization with Citus.

Keep reading

Why the RDBMS is the future of distributed databases, ft. Postgres and Citus

Written byBy Marco Slot | November 30, 2018Nov 30, 2018

Around 10 years ago I joined Amazon Web Services and that’s where I first saw the importance of trade-offs in distributed systems. In university I had already learned about the trade-offs between consistency and availability (the CAP theorem), but in practice the spectrum goes a lot deeper than that. Any design decision may involve trade-offs between latency, concurrency, scalability, durability, maintainability, functionality, operational simplicity, and other aspects of the system—and those trade-offs have meaningful impact on the features and user experience of the application, and even on the effectiveness of the business itself.

Perhaps the most challenging problem in distributed systems, in which the need for trade-offs is most apparent, is building a distributed database. When applications began to require databases that could scale across many servers, database developers began to make extreme trade-offs. In order to achieve scalability over many nodes, distributed key-value stores (NoSQL) put aside the rich feature set offered by the traditional relational database management systems (RDBMS), including SQL, joins, foreign keys, and ACID guarantees. Since everyone wants scalability, it would only be a matter of time before the RDBMS would disappear, right? Actually, relational databases have continued to dominate the database landscape. And here's why:

Keep reading

A history lesson on SQL joins (in Postgres)

Written byBy Dimitri Fontaine | September 25, 2018Sep 25, 2018

Our beloved Structured Query Language may be the lingua franca for relational databases—but like many languages, SQL is in a state of constant evolution. The first releases of SQL didn't even have a notation for joins. At the time, SQL only supported inner joins.

Cross Joins and Where Filters

As a result, back in early eighties, the only way to express a join condition between tables would be in the WHERE clause.

Keep reading

How Citus real-time executor parallelizes Postgres queries

Written byBy Craig Kerstiens | August 17, 2018Aug 17, 2018

Citus has multiple different executors which each behaving differently to support a wide array of use cases. For many the notion distributed SQL seems like it has to be a complicated one, but the principles of it aren't rocket science. Here we're going to look at a few examples of how Citus takes standard SQL and transforms it to operate in a distributed form so it can be parallelized. The result is that you can see speed up of 100x or more in query performance over a single node database.

Keep reading

High performance distributed DML in Citus

Written byBy Marco Slot | July 25, 2018Jul 25, 2018

One of the many unique abilities of SQL databases is to transform data using advanced SQL queries and joins in a transactional manner. Commands like UPDATE and DELETE are commonly used for manipulating individual rows, but they become truly powerful when you can use subqueries to determine which rows to modify and how to modify them. It allows you to implement batch processing operations in a thread-safe, transactional, scalable manner.

Citus recently added support for UPDATE/DELETE commands with subqueries that span across all the data. Together with the CTE infrastructure that we’ve introduced over the past few releases, this gives you a new set of powerful distributed data transformation commands. As always, we’ve made sure that queries are executed as quickly and efficiently as possible by spreading out the work to where the data is stored.

Let’s look at an example of how you can use UPDATE/DELETE with subqueries.

Keep reading

Citus 7.4: Move fast and reduce technical debt

Written byBy Ozgun Erdogan | May 24, 2018May 24, 2018

Today, we’re excited to announce the latest release of our distributed database, Citus 7.4! Citus scales out PostgreSQL through sharding, replication, and query parallelization.

Ever since we open sourced Citus as a Postgres extension, we have been incorporating your feedback into our database. Over the past two years, our release cycles went down from six to four to two months. As a result, we have announced 10 new Citus releases, where each release came with notable new features.

Shorter release cycles and more features came at a cost however. In particular, we added new distributed planner and executor logic to support different use cases for multi-tenant applications and real-time analytics. However, we couldn’t find the time to refactor this new logic. We found ourselves accumulating technical debt. Further, our distributed SQL coverage expanded over the past two years. With each year, we ended spending more and more time on testing each new release.

In Citus 7.4, we focused on reducing technical debt related to these items. At Citus, we track our development velocity with each release. While we fix bugs in every release, we found that a full release focused on addressing technical debt would help to maintain our release velocity. Also, a cleaner codebase leads to a happier and more productive engineering team.

Keep reading

Distributed Execution of Subqueries and CTEs in Citus

Written byBy Marco Slot | March 9, 2018Mar 9, 2018

The latest release of the Citus database brings a number of exciting improvements for analytical queries across all the data and for real-time analytics applications. Citus already offered full SQL support on distributed tables for single-tenant queries and support for advanced subqueries that can be distributed ("pushed down") to the shards. With Citus 7.2, you can also use CTEs (common table expressions), set operations, and most subqueries thanks to a new technique we call "recursive planning".

Keep reading

PGConf EU: HyperLogLog, Eclipse, and Distributed Postgres

Written byBy Craig Kerstiens | December 11, 2017Dec 11, 2017

We're big fans of Postgres and enjoy getting around to the various community conferences to give talks on relevant topics as well as learn from others. A few months ago we had a good number of Citus team members over at the largest Postgres conference in Europe. Additionally, three of our Citus team members gave talks at the conference. We thought for those of you that couldn't make the conference you might still enjoy getting a glimpse of some of the content. You can browse the full set of talks that were given and slides for them on the PGConf EU website or flip through the presentations from members of the Citus team below.

Keep reading

Citus 7.1: Window functions, distinct, distributed transactions, more

Written byBy Ozgun Erdogan | December 1, 2017Dec 1, 2017

So about two weeks ago we had a stealth release of Citus 7.1. And while we have already blogged a bit about the recent (and exciting) update to our fully-managed database as a service–Citus Cloud—and about our newly-added support for distributed transactions, it’s time to share all the things about our latest Citus 7.1 release.

If you’re into bulleted lists, here’s the quick overview of what’s in Citus 7.1:

Distributed transaction support
Zero-downtime shard rebalancer
Window function enhancements
Distinct ON/count(distinct) enhancements
Additional SQL enhancements
Checking for new software updates

Keep reading

How Citus Executes Distributed Transactions on Postgres

Written byBy Ozgun Erdogan | November 22, 2017Nov 22, 2017

Distributed transactions are one of the meanest, baddest problems in relational databases. With the release of Citus 7.1, distributed transactions are now available to all our users. In this article, we are going to describe how we built distributed transaction support into Citus by using PostgreSQL modules. But first, let’s give an overview of what a distributed transaction is.

(If this sounds familiar, that’s because we first announced distributed transactions as part of last week’s Citus Cloud 2 announcement. The Citus Cloud announcement centered on other new useful capabilities —such as our warp feature to streamline migrations from single-node Postgres deployments to Citus Cloud — but it seems worthwhile to dedicate an entire post to distributed transactions.)

Keep reading