Citus 10.2 is out! 10.2 brings you new columnar & time series features—and is ready to support Postgres 14. Read the new Citus 10.2 blog.

Skip navigation

Citus Blog

Articles tagged: Citus

One of the main reasons people use the Citus extension for Postgres is to distribute the data in Postgres tables across multiple nodes. Citus does this by splitting the original Postgres table into multiple smaller tables and putting these smaller tables on different nodes. The process of splitting bigger tables into smaller ones is called sharding—and these smaller Postgres tables are called “shards”. Citus then allows you to query the shards as if they were still a single Postgres table.

One of the big changes in Citus 10—in addition to adding columnar storage, and the new ability to shard Postgres on a single Citus node—is that we open sourced the shard rebalancer.

Yes, that’s right, we have open sourced the shard rebalancer! The Citus 10 shard rebalancer gives you an easy way to rebalance shards across your cluster and helps you avoid data hotspots over time. Let’s dig into the what and the how.

Keep reading
Jeff Davis

Citus 10 brings columnar compression to Postgres

Written byBy Jeff Davis | March 6, 2021Mar 6, 2021

Citus 10 is out! Check out the Citus 10 blog post for all the details. Citus is an open source extension to Postgres (not a fork) that enables scale-out, but offers other great features, too. See the Citus docs and the Citus github repo and README.

This post will highlight Citus Columnar, one of the big new features in Citus 10. You can also take a look at the columnar documentation. Citus Columnar can be used with or without the scale-out features of Citus.

Postgres typically stores data using the heap access method, which is row-based storage. Row-based tables are good for transactional workloads, but can cause excessive IO for some analytic queries.

Columnar storage is a new way to store data in a Postgres table. Columnar groups data together by column instead of by row; and compresses the data, too. Arranging data by column tends to compress well, and it also means that queries can skip over columns they don’t need. Columnar dramatically reduces the IO needed to answer a typical analytic query—often by 10X!

Keep reading

Development on Citus first started around a decade ago and once a year we release a major new Citus open source version. We wanted to make number 10 something special, but I could not have imagined how truly spectacular this release would become. Citus 10 extends Postgres (12 and 13) with many new superpowers:

  • Columnar storage for Postgres: Compress your PostgreSQL and Citus tables to reduce storage cost and speed up your analytical queries.
  • Sharding on a single Citus node: Make your single-node Postgres server ready to scale out by sharding tables locally using Citus.
  • Shard rebalancer in Citus open source: We have open sourced the shard rebalancer so you can easily add Citus nodes and rebalance your cluster.
  • Joins and foreign keys between local PostgreSQL tables and Citus tables: Mix and match PostgreSQL and Citus tables with foreign keys and joins.
  • Functions to change the way your tables are distributed: Redistribute your tables in a single step using new alter table functions.
  • Much more: Better naming, improved SQL & DDL support, simplified operations.

These new capabilities represent a fundamental shift in what Citus is and what Citus can do for you.

Keep reading
Sai Krishna Srirampur

Faster data migrations in Postgres

Written byBy Sai Srirampur | February 20, 2021Feb 20, 2021

In my day to day, I get to work with many customers migrating their data to Postgres. I work with customers migrating from homogenous sources (PostgreSQL) and also from heterogenous database sources such as Oracle and Redshift. Why do people pick Postgres? Because of the richness of PostgreSQL—and features like stored procedures, JSONB, PostGIS for geospatial workloads, and the many useful Postgres extensions, including my personal favorite: Citus.

A large chunk of the migrations that I help people with are homogenous Postgres-to-Postgres data migrations to the cloud. As Azure Database for PostgreSQL runs open source Postgres, in many cases the application migration can be drop-in and doesn’t require a ton effort. The majority of the effort usually goes into deciding on and implementing the right strategy for performing the data migration.

Keep reading

Once you start using the Citus extension to distribute your Postgres database, you may never want to go back. But what if you just want to experiment with Citus and want to have the comfort of knowing you can go back? Well, as of Citus 9.5, now there is a new undistribute_table() function to make it easy for you to, well, to revert a distributed table back to being a regular Postgres table.

If you are familiar with Citus, you know that Citus is an open source extension to Postgres that distributes your data (and queries) to multiple machines in a cluster—thereby parallelizing your workload and scaling your Postgres database horizontally. When you start using Citus—whether you’re using Citus open source or whether you’re using Citus as part of a managed service in the cloud—usually the first thing you need to do is distribute your Postgres tables across the cluster.

Keep reading
Claire Giordano

When to use Hyperscale (Citus) to scale out Postgres

Written byBy Claire Giordano | December 5, 2020Dec 5, 2020

If you’ve built your application on Postgres, you already know why so many people love Postgres.

And if you’re new to Postgres, the list of reasons people love Postgres is loooong—and includes things like: 3 decades of database reliability baked in; rich datatypes; support for custom types; myriad index types from B-tree to GIN to BRIN to GiST; support for JSON and JSONB from early days; constraints; foreign data wrappers; rollups; the geospatial capabilities of the PostGIS extension, and all the innovations that come from the many Postgres extensions.

But what to do if your Postgres database gets very large?

Keep reading
Claire Giordano

What’s new in the Citus 9.5 extension to Postgres

Written byBy Claire Giordano | November 14, 2020Nov 14, 2020

When I gave the kickoff talk in the Postgres devroom at FOSDEM this year, one of the Q&A questions was: “what’s happening with the Citus open source extension to Postgres?” The answer is, a lot. Since FOSDEM, Marco Slot and I have blogged about how Citus 9.2 speeds up large-scale htap workloads on Postgres, the Citus 9.3 release notes, and what’s new in Citus 9.4.

Now it’s time to walk through everything new in the Citus 9.5 open source release.

Keep reading

GPS has become part of our daily life. GPS is in cars for navigation, in smartphones helping us to find places, and more recently GPS has been helping us to avoid getting infected by COVID-19. Managing and analyzing mobility tracks is the core of my work. My group in Université libre de Bruxelles specializes in mobility data management. We build an open source database system for spatiotemporal trajectories, called MobilityDB. MobilityDB adds support for temporal and spatiotemporal objects to the Postgres database and its spatial extension, PostGIS. If you’re not yet familiar with spatiotemporal trajectories, not to worry, we’ll walk through some movement trajectories for a public transport bus in just a bit.

One of my team’s projects is to develop a distributed version of MobilityDB. This is where we came in touch with the Citus extension to Postgres and the Citus engineering team. This post presents issues and solutions for distributed query processing of movement trajectory data. GPS is the most common source of trajectory data, but the ideas in this post also apply to movement trajectories collected by other location tracking sensors, such as radar systems for aircraft, and AIS systems for sea vessels.

Keep reading

In my work as an engineer on the Postgres team at Microsoft, I get to meet all sorts of customers going through many challenging projects. One recent database migration project I worked on is a story that just needs to be told. The customer—in the retail space—was using Redshift as the data warehouse and Databricks as their ETL engine. Their setup was deployed on AWS and GCP, across different data centers in different regions. And they’d been running into performance bottlenecks and also was incurring unnecessary egress cost.

Specifically, the amount of data in our customer’s analytic store was growing faster than the compute required to process that data. AWS Redshift was not able to offer independent scaling of storage and compute—hence our customer was paying extra cost by being forced to scale up the Redshift nodes to account for growing data volumes. To address these issues, they decided to migrate their analytics landscape to Azure.

Keep reading
Marco Slot

What’s new in the Citus 9.4 extension to Postgres

Written byBy Marco Slot | September 5, 2020Sep 5, 2020

Our latest release to the Citus extension to Postgres is Citus 9.4. If you’re not yet familiar, Citus transforms Postgres into a distributed database, distributing your data and your SQL queries across multiple nodes. This post is basically the Citus 9.4 release notes.

If you’re ready to get started with Citus, it’s easy to download Citus open source packages for 9.4.

I always recommend people check out docs.citusdata.com to learn more. The Citus documentation has rigorous tutorials, details on every Citus feature, explanations of key concepts—things like choosing the distribution column—tutorials on how you can set up Citus locally on a single server, how to install Citus on multiple servers, how to build a real-time analytics dashboard, how to build a multi-tenant database, and more…

Keep reading

Page 2 of 9