Citus 10 is out! New features include columnar storage & Citus on a single node—plus we’ve open-sourced the shard rebalancer. Read the Citus 10 blog.

Skip navigation

Citus Blog

Articles tagged: Postgres extensions

Marco Slot

Citus Talk at CMU: Distributed PostgreSQL as an Extension

Written byBy Marco Slot | April 10, 2021Apr 10, 2021

Last month we released Citus 10 and we’ve received an overwhelming amount of positive feedback on the new columnar compression and single node Citus features, as well as the news that we’ve open sourced the shard rebalancer.

The new and exciting Citus 10 features are bringing in lots of new users of Citus open source and the managed Hyperscale (Citus) option in Azure Database for PostgreSQL. And many of you are asking:

Keep reading
Jeff Davis

Citus 10 brings columnar compression to Postgres

Written byBy Jeff Davis | March 6, 2021Mar 6, 2021

Citus 10 is out! Check out the Citus 10 blog post for all the details. Citus is an open source extension to Postgres (not a fork) that enables scale-out, but offers other great features, too. See the Citus docs and the Citus github repo and README.

This post will highlight Citus Columnar, one of the big new features in Citus 10. You can also take a look at the columnar documentation. Citus Columnar can be used with or without the scale-out features of Citus.

Postgres typically stores data using the heap access method, which is row-based storage. Row-based tables are good for transactional workloads, but can cause excessive IO for some analytic queries.

Columnar storage is a new way to store data in a Postgres table. Columnar groups data together by column instead of by row; and compresses the data, too. Arranging data by column tends to compress well, and it also means that queries can skip over columns they don’t need. Columnar dramatically reduces the IO needed to answer a typical analytic query—often by 10X!

Keep reading

Development on Citus first started around a decade ago and once a year we release a major new Citus open source version. We wanted to make number 10 something special, but I could not have imagined how truly spectacular this release would become. Citus 10 extends Postgres (12 and 13) with many new superpowers:

  • Columnar storage for Postgres: Compress your PostgreSQL and Citus tables to reduce storage cost and speed up your analytical queries.
  • Sharding on a single Citus node: Make your single-node Postgres server ready to scale out by sharding tables locally using Citus.
  • Shard rebalancer in Citus open source: We have open sourced the shard rebalancer so you can easily add Citus nodes and rebalance your cluster.
  • Joins and foreign keys between local PostgreSQL tables and Citus tables: Mix and match PostgreSQL and Citus tables with foreign keys and joins.
  • Functions to change the way your tables are distributed: Redistribute your tables in a single step using new alter table functions.
  • Much more: Better naming, improved SQL & DDL support, simplified operations.

These new capabilities represent a fundamental shift in what Citus is and what Citus can do for you.

Keep reading

Once you start using the Citus extension to distribute your Postgres database, you may never want to go back. But what if you just want to experiment with Citus and want to have the comfort of knowing you can go back? Well, as of Citus 9.5, now there is a new undistribute_table() function to make it easy for you to, well, to revert a distributed table back to being a regular Postgres table.

If you are familiar with Citus, you know that Citus is an open source extension to Postgres that distributes your data (and queries) to multiple machines in a cluster—thereby parallelizing your workload and scaling your Postgres database horizontally. When you start using Citus—whether you’re using Citus open source or whether you’re using Citus as part of a managed service in the cloud—usually the first thing you need to do is distribute your Postgres tables across the cluster.

Keep reading
Claire Giordano

When to use Hyperscale (Citus) to scale out Postgres

Written byBy Claire Giordano | December 5, 2020Dec 5, 2020

If you’ve built your application on Postgres, you already know why so many people love Postgres.

And if you’re new to Postgres, the list of reasons people love Postgres is loooong—and includes things like: 3 decades of database reliability baked in; rich datatypes; support for custom types; myriad index types from B-tree to GIN to BRIN to GiST; support for JSON and JSONB from early days; constraints; foreign data wrappers; rollups; the geospatial capabilities of the PostGIS extension, and all the innovations that come from the many Postgres extensions.

But what to do if your Postgres database gets very large?

Keep reading

One of the unique things about Postgres is that it is highly programmable via PL/pgSQL and extensions. Postgres is so programmable that I often think of Postgres as a computing platform rather than just a database (or a distributed computing platform—with Citus). As a computing platform, I always felt that Postgres should be able to take actions in an automated way. That is why I created the open source pg_cron extension back in 2016 to run periodic jobs in Postgres—and why I continue to maintain pg_cron now that I work on the Postgres team at Microsoft.

Using pg_cron, you can schedule Postgres queries to run periodically, according to the familiar cron syntax. Some typical examples:

Keep reading

Postgres is an amazing RDBMS implementation. Postgres is open source and it’s one of the most standard-compliant SQL implementations that you will find (if not the most compliant.) Postgres is packed with extensions to the standard, and it makes writing and deploying your applications simple and easy. After all, Postgres has your back and manages all the complexities of concurrent transactions for you.

In this post I am excited to announce that a new version of pg_auto_failover has been released, pg_auto_failover 1.4.

pg_auto_failover is an extension to Postgres built for high availability (HA), that monitors and manages failover for Postgres clusters. Our guiding principles from day one have been simplicity and correctness. Since pg_auto_failover is open source, you can find it on GitHub and it’s easy to try out. Let’s walk through what’s new in pg_auto_failover, and let’s explore the new capabilities you can take advantage of.

Keep reading

When working on the internals of Citus, an open source extension to Postgres that transforms Postgres into a distributed database, we often get to talk with customers that have interesting challenges you won’t find everywhere. Just a few months back, I encountered an analytics workload that was a really good fit for Citus.

But we had one problem: the percentile calculations on their data (over 300 TB of data) could not meet their SLA of 30 seconds.

To make things worse, the query performance was not even close to the target: the percentile calculations were taking about 6 minutes instead of the required 30 second SLA.

Keep reading
Craig Kerstiens

Approximation algorithms for your database

Written byBy Craig Kerstiens | February 28, 2019Feb 28, 2019

In an earlier blog post I wrote about how breaking problems down into a MapReduce style approach can give you much better performance. We’ve seen Citus is orders of magnitude faster than single node databases when we’re able to parallelize the workload across all the cores in a cluster. And while count (*) and avg is easy to break into smaller parts I immediately got the question what about count distinct, or the top from a list, or median?

Exact distinct count is admittedly harder to tackle, in a large distributed setup, because it requires a lot of data shuffling between nodes. Count distinct is indeed supported within Citus, but at times can be slow when dealing with especially larger datasets. Median across any moderate to large size dataset can become completely prohibitive for end users. Fortunately for nearly all of these there are approximation algorithms which provide close enough answers and do so with impressive performance characteristics.

Keep reading
Craig Kerstiens

The most useful Postgres extension: pg_stat_statements

Written byBy Craig Kerstiens | February 8, 2019Feb 8, 2019

Extensions are capable of extending, changing, and advancing the behavior of Postgres. How? By hooking into low level Postgres API hooks. The open source Citus database that scales out Postgres horizontally is itself implemented as a PostgreSQL extension, which allows Citus to stay current with Postgres releases without lagging behind like other Postgres forks. I’ve previously written about the various types of extensions, today though I want to take a deeper look at the most useful Postgres extension: pg_stat_statements.

Keep reading

Page 1 of 2