Citus Blog

Articles tagged: distributed Postgres

One of the big new things in Citus 10 is that you can now shard Postgres on a single Citus node. So in addition to using the Citus extension to Postgres to scale out Postgres across a distributed cluster, you can now also:

  • Try out Citus on a single node with just a few simple commands
  • Shard Postgres on a single Citus node to be “scale-out-ready”
  • Simplify CI/CD pipelines by testing with single-node Citus

The Citus 10 release is chock full of new capabilities like columnar storage for Postgres, the open sourcing of the shard rebalancer, as well as the feature we are going to explore here: using Citus on a single node. No matter what type of application you run on top of Citus—multi-tenant SaaS apps, customer-facing analytics dashboards, time-series workloads, high-throughput transactional apps—there is something for everyone in Citus 10.

Keep reading

One of the main reasons people use the Citus extension for Postgres is to distribute the data in Postgres tables across multiple nodes. Citus does this by splitting the original Postgres table into multiple smaller tables and putting these smaller tables on different nodes. The process of splitting bigger tables into smaller ones is called sharding—and these smaller Postgres tables are called “shards”. Citus then allows you to query the shards as if they were still a single Postgres table.

One of the big changes in Citus 10—in addition to adding columnar storage, and the new ability to shard Postgres on a single Citus node—is that we open sourced the shard rebalancer.

Yes, that’s right, we have open sourced the shard rebalancer! The Citus 10 shard rebalancer gives you an easy way to rebalance shards across your cluster and helps you avoid data hotspots over time. Let’s dig into the what and the how.

Keep reading

Development on Citus first started around a decade ago and once a year we release a major new Citus open source version. We wanted to make number 10 something special, but I could not have imagined how truly spectacular this release would become. Citus 10 extends Postgres (12 and 13) with many new superpowers:

  • Columnar storage for Postgres: Compress your PostgreSQL and Citus tables to reduce storage cost and speed up your analytical queries.
  • Sharding on a single Citus node: Make your single-node Postgres server ready to scale out by sharding tables locally using Citus.
  • Shard rebalancer in Citus open source: We have open sourced the shard rebalancer so you can easily add Citus nodes and rebalance your cluster.
  • Joins and foreign keys between local PostgreSQL tables and Citus tables: Mix and match PostgreSQL and Citus tables with foreign keys and joins.
  • Functions to change the way your tables are distributed: Redistribute your tables in a single step using new alter table functions.
  • Much more: Better naming, improved SQL & DDL support, simplified operations.

These new capabilities represent a fundamental shift in what Citus is and what Citus can do for you.

Keep reading

Once you start using the Citus extension to distribute your Postgres database, you may never want to go back. But what if you just want to experiment with Citus and want to have the comfort of knowing you can go back? Well, as of Citus 9.5, now there is a new undistribute_table() function to make it easy for you to, well, to revert a distributed table back to being a regular Postgres table.

If you are familiar with Citus, you know that Citus is an open source extension to Postgres that distributes your data (and queries) to multiple machines in a cluster—thereby parallelizing your workload and scaling your Postgres database horizontally. When you start using Citus—whether you’re using Citus open source or whether you’re using Citus as part of a managed service in the cloud—usually the first thing you need to do is distribute your Postgres tables across the cluster.

Keep reading
Marco Slot

Making Postgres stored procedures 9X faster in Citus

Written byBy Marco Slot | November 21, 2020Nov 21, 2020

Stored procedures are widely used in commercial relational databases. You write most of your application logic in PL/SQL and achieve notable performance gains by pushing this logic into the database. As a result, customers who are looking to migrate from other databases to PostgreSQL usually make heavy use of stored procedures.

When migrating from a large database, using the Citus extension to distribute your database can be an attractive option, because you will always have enough hardware capacity to power your workload. The Hyperscale (Citus) option in Azure Database for PostgreSQL makes it easy to get a managed Citus cluster in minutes.

In the past, customers who migrated stored procedures to Citus often reported poor performance because each statement in the procedure involved an extra network round trip between the Citus coordinator node and the worker nodes. We also observed this ourselves when we evaluated Citus performance using the TPC-C-based workload in HammerDB (TPROC-C), which is implemented using stored procedures.

Keep reading

Custom types—called user-defined types in the PostgreSQL docs—are a powerful Postgres capability that, just like Postgres extensions, were envisioned from Day One in the original design of Postgres. Published in 1985, the Design of Postgres paper stated the 2nd design goal as: “provide user extendibility for data types, operators and access methods.”

It’s kind of cool that the creators of Postgres laid the foundation for the powerful Postgres extensions of today (like PostGIS for geospatial use cases, Citus for scaling out Postgres horizontally, pg_partman for time-based partitioning, and so many more Postgres extensions) way back in 1985 when the design of Postgres paper was first published.

Keep reading

In one of our recent releases of the open source Citus extension, we overhauled the way Citus executes distributed SQL queries—with the net effect being some huge improvements in terms of performance, user experience, Postgres compatibility, and resource management. The Citus executor is now able to dynamically adapt to the type of distributed SQL query, ensuring fast response times both for quick index lookups and big analytical queries.

We call this new Citus feature the “adaptive executor” and we thought it would be useful to walk through what the Citus adaptive executor means for Postgres and how it works.

Keep reading

Some of you have been asking, “what’s happening with the Citus open source extension to Postgres?” The short answer is: a lot. More and more users have adopted the Citus extension in order to scale out Postgres, to increase performance and enable growth. And you’re probably not surprised to learn that since Microsoft acquired Citus Data last year, our engineering team has grown quite a bit—and we’ve been continuing to evolve and innovate on the Citus open source extension.

Our newest release is Citus 9.2. We’ve updated the installation instructions on our Download page and in our Citus documentation, and now it’s time to take a walk through what’s new.

Keep reading
Craig Kerstiens

Thinking in MapReduce, but with SQL

Written byBy Craig Kerstiens | February 21, 2019Feb 21, 2019

For those considering Citus, if your use case seems like a good fit, we often are willing to spend some time with you to help you get an understanding of the Citus database and what type of performance it can deliver. We commonly do this in a roughly two hour pairing session with one of our engineers. We’ll talk through the schema, load up some data, and run some queries. If we have time at the end it is always fun to load up the same data and queries into single node Postgres and see how we compare. After seeing this for years, I still enjoy seeing performance speed ups of 10 and 20x over a single node database, and in cases as high as 100x.

And the best part is it didn’t take heavy re-architecting of data pipelines. All it takes is just some data modeling, and parallelization with Citus.

Keep reading

Around 10 years ago I joined Amazon Web Services and that’s where I first saw the importance of trade-offs in distributed systems. In university I had already learned about the trade-offs between consistency and availability (the CAP theorem), but in practice the spectrum goes a lot deeper than that. Any design decision may involve trade-offs between latency, concurrency, scalability, durability, maintainability, functionality, operational simplicity, and other aspects of the system—and those trade-offs have meaningful impact on the features and user experience of the application, and even on the effectiveness of the business itself.

Perhaps the most challenging problem in distributed systems, in which the need for trade-offs is most apparent, is building a distributed database. When applications began to require databases that could scale across many servers, database developers began to make extreme trade-offs. In order to achieve scalability over many nodes, distributed key-value stores (NoSQL) put aside the rich feature set offered by the traditional relational database management systems (RDBMS), including SQL, joins, foreign keys, and ACID guarantees. Since everyone wants scalability, it would only be a matter of time before the RDBMS would disappear, right? Actually, relational databases have continued to dominate the database landscape. And here’s why:

Keep reading

Page 2 of 4