Learn Citus your way. This Getting Started page can help, whether you prefer to learn by reading docs, watching demos, or rolling up your sleeves.

Skip navigation

Citus BlogCitus Blog

Thoughts on scaling out PostgreSQL, sharding, multi-tenant apps, real-time analytics, and distributed databases.

Development on Citus first started around a decade ago and once a year we release a major new Citus open source version. We wanted to make number 10 something special, but I could not have imagined how truly spectacular this release would become. Citus 10 extends Postgres (12 and 13) with many new superpowers:

  • Columnar storage for Postgres: Compress your PostgreSQL and Citus tables to reduce storage cost and speed up your analytical queries.
  • Sharding on a single Citus node: Make your single-node Postgres server ready to scale out by sharding tables locally using Citus.
  • Shard rebalancer in Citus open source: We have open sourced the shard rebalancer so you can easily add Citus nodes and rebalance your cluster.
  • Joins and foreign keys between local PostgreSQL tables and Citus tables: Mix and match PostgreSQL and Citus tables with foreign keys and joins.
  • Functions to change the way your tables are distributed: Redistribute your tables in a single step using new alter table functions.
  • Much more: Better naming, improved SQL & DDL support, simplified operations.

These new capabilities represent a fundamental shift in what Citus is and what Citus can do for you.

Keep reading
Sai Krishna Srirampur

Faster data migrations in Postgres

Written byBy Sai Srirampur | February 20, 2021Feb 20, 2021

In my day to day, I get to work with many customers migrating their data to Postgres. I work with customers migrating from homogenous sources (PostgreSQL) and also from heterogenous database sources such as Oracle and Redshift. Why do people pick Postgres? Because of the richness of PostgreSQL—and features like stored procedures, JSONB, PostGIS for geospatial workloads, and the many useful Postgres extensions, including my personal favorite: Citus.

A large chunk of the migrations that I help people with are homogenous Postgres-to-Postgres data migrations to the cloud. As Azure Database for PostgreSQL runs open source Postgres, in many cases the application migration can be drop-in and doesn’t require a ton effort. The majority of the effort usually goes into deciding on and implementing the right strategy for performing the data migration.

Keep reading
Dimitri Fontaine

Reconnecting your application after a Postgres failover

Written byBy Dimitri Fontaine | February 12, 2021Feb 12, 2021

When those of us who work on Postgres High Availability explain how HA in Postgres works, we often focus on the server side of the stack. Having a Postgres service running with the expected data set is all-important and required for HA, of course. That said, the server side of the stack is not the only thing that matters when implementing high availability. Application code has a super important role to play, too.

In this post, you will learn what happens to your application code and connections when a Postgres failover is orchestrated. Your application might be running on Postgres on-prem with HA configured—or in the cloud—or on a managed PostgreSQL service such as Azure Database for PostgreSQL. Now, if you’re running your app on top of a managed service with HA, you probably don’t need to worry about how to implement HA, as HA is managed by the service. But it’s still useful to understand what happens to your application when a Postgres failover occurs.

Keep reading

Once you start using the Citus extension to distribute your Postgres database, you may never want to go back. But what if you just want to experiment with Citus and want to have the comfort of knowing you can go back? Well, as of Citus 9.5, now there is a new undistribute_table() function to make it easy for you to, well, to revert a distributed table back to being a regular Postgres table.

If you are familiar with Citus, you know that Citus is an open source extension to Postgres that distributes your data (and queries) to multiple machines in a cluster—thereby parallelizing your workload and scaling your Postgres database horizontally. When you start using Citus—whether you’re using Citus open source or whether you’re using Citus as part of a managed service in the cloud—usually the first thing you need to do is distribute your Postgres tables across the cluster.

Keep reading

As part of my work on the open source PostgreSQL team at Microsoft, I recently committed a new feature into PostgreSQL 14 to track dependencies on collation versions, with help from co-author Julien Rouhaud and the many others who contributed ideas. It took a long time to build a consensus on how to tackle this thorny problem (work I began at EnterpriseDB and continued at Microsoft), and you can read about some of the details and considerations in the commit message below and the referenced discussion thread. Please note that some details may change by the time PostgreSQL 14 is released.

commit 257836a75585934cc05ed7a80bccf8190d41e056
Author: Thomas Munro <[email protected]>
Date:   Mon Nov 2 19:50:45 2020 +1300

    Track collation versions for indexes.

    Record the current version of dependent collations in pg_depend when
    creating or rebuilding an index.  When accessing the index later, warn
    that the index may be corrupted if the current version doesn't match.

    Thanks to Douglas Doole, Peter Eisentraut, Christoph Berg, Laurenz Albe,
    Michael Paquier, Robert Haas, Tom Lane and others for very helpful
    discussion.

    Author: Thomas Munro <[email protected]>
    Author: Julien Rouhaud <[email protected]>
    Reviewed-by: Peter Eisentraut <[email protected]> (earlier versions)
    Discussion: https://postgr.es/m/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com

I’m pretty happy with the result so far, but there is more to be done (see further down)! Now seems like a good time to walk you through the problem we needed to solve—that PostgreSQL indexes can get corrupted by changes in collations that occur naturally over time—and how the new feature makes things better in PostgreSQL 14. Plus, you’ll get a bit of background on collations, too.

Keep reading
Claire Giordano

When to use Hyperscale (Citus) to scale out Postgres

Written byBy Claire Giordano | December 5, 2020Dec 5, 2020

If you’ve built your application on Postgres, you already know why so many people love Postgres.

And if you’re new to Postgres, the list of reasons people love Postgres is loooong—and includes things like: 3 decades of database reliability baked in; rich datatypes; support for custom types; myriad index types from B-tree to GIN to BRIN to GiST; support for JSON and JSONB from early days; constraints; foreign data wrappers; rollups; the geospatial capabilities of the PostGIS extension, and all the innovations that come from the many Postgres extensions.

But what to do if your Postgres database gets very large?

Keep reading
Marco Slot

Making Postgres stored procedures 9X faster in Citus

Written byBy Marco Slot | November 21, 2020Nov 21, 2020

Stored procedures are widely used in commercial relational databases. You write most of your application logic in PL/SQL and achieve notable performance gains by pushing this logic into the database. As a result, customers who are looking to migrate from other databases to PostgreSQL usually make heavy use of stored procedures.

When migrating from a large database, using the Citus extension to distribute your database can be an attractive option, because you will always have enough hardware capacity to power your workload. The Hyperscale (Citus) option in Azure Database for PostgreSQL makes it easy to get a managed Citus cluster in minutes.

In the past, customers who migrated stored procedures to Citus often reported poor performance because each statement in the procedure involved an extra network round trip between the Citus coordinator node and the worker nodes. We also observed this ourselves when we evaluated Citus performance using the TPC-C-based workload in HammerDB (TPROC-C), which is implemented using stored procedures.

Keep reading
Claire Giordano

What’s new in the Citus 9.5 extension to Postgres

Written byBy Claire Giordano | November 14, 2020Nov 14, 2020

When I gave the kickoff talk in the Postgres devroom at FOSDEM this year, one of the Q&A questions was: “what’s happening with the Citus open source extension to Postgres?” The answer is, a lot. Since FOSDEM, Marco Slot and I have blogged about how Citus 9.2 speeds up large-scale htap workloads on Postgres, the Citus 9.3 release notes, and what’s new in Citus 9.4.

Now it’s time to walk through everything new in the Citus 9.5 open source release.

Keep reading

GPS has become part of our daily life. GPS is in cars for navigation, in smartphones helping us to find places, and more recently GPS has been helping us to avoid getting infected by COVID-19. Managing and analyzing mobility tracks is the core of my work. My group in Université libre de Bruxelles specializes in mobility data management. We build an open source database system for spatiotemporal trajectories, called MobilityDB. MobilityDB adds support for temporal and spatiotemporal objects to the Postgres database and its spatial extension, PostGIS. If you’re not yet familiar with spatiotemporal trajectories, not to worry, we’ll walk through some movement trajectories for a public transport bus in just a bit.

One of my team’s projects is to develop a distributed version of MobilityDB. This is where we came in touch with the Citus extension to Postgres and the Citus engineering team. This post presents issues and solutions for distributed query processing of movement trajectory data. GPS is the most common source of trajectory data, but the ideas in this post also apply to movement trajectories collected by other location tracking sensors, such as radar systems for aircraft, and AIS systems for sea vessels.

Keep reading

One of the unique things about Postgres is that it is highly programmable via PL/pgSQL and extensions. Postgres is so programmable that I often think of Postgres as a computing platform rather than just a database (or a distributed computing platform—with Citus). As a computing platform, I always felt that Postgres should be able to take actions in an automated way. That is why I created the open source pg_cron extension back in 2016 to run periodic jobs in Postgres—and why I continue to maintain pg_cron now that I work on the Postgres team at Microsoft.

Using pg_cron, you can schedule Postgres queries to run periodically, according to the familiar cron syntax. Some typical examples:

Keep reading

Page 2 of 26