Citus Blog

Articles tagged: time series

The topic of this month's PGSQL Phriday #011 community blogging event is partitioning vs. sharding in PostgreSQL. It seemed right to share a perspective on the question of "partitioning vs. sharding" from someone in the Citus open source team, since we eat, sleep, and breathe sharding for Postgres.

Postgres built-in "native" partitioning—and sharding via PG extensions like Citus—are both tools to grow your Postgres database, scale your application, and improve your application's performance.

What is partitioning and what is sharding? In Postgres, database partitioning and sharding are techniques for splitting collections of data into smaller sets, so the database only needs to process smaller chunks of data at a time. And as you might imagine, work gets done faster when you're processing less data.

In this post, you'll learn what partitioning and sharding are, why they matter, and when to use them. The table of contents:

Keep reading
Claire Giordano

UK COVID-19 dashboard built using Postgres and Citus for millions of users

Written byBy Claire Giordano & Pouria Hadjibagheri | December 11, 2021Dec 11, 2021

From the beginning of the COVID-19 pandemic, the United Kingdom (UK) government has made it a top priority to track key health metrics and to share those metrics with the public.

And the citizens of the UK were hungry for information, as they tried to make sense of what was happening. Maps, graphs, and tables became the lingua franca of the pandemic. As a result, the GOV.UK Coronavirus dashboard became one of the most visited public service websites in the United Kingdom.

The list of people who rely on the UK Coronavirus dashboard is quite long: government personnel, public health officials, healthcare employees, journalists, and the public all use the same service.

Keep reading
Burak Velioglu

How to scale Postgres for time series data with Citus

Written byBy Burak Velioglu | October 22, 2021Oct 22, 2021

Managing time series data at scale can be a challenge. PostgreSQL offers many powerful data processing features such as indexes, COPY and SQL—but the high data volumes and ever-growing nature of time series data can cause your database to slow down over time.

Fortunately, Postgres has a built-in solution to this problem: Partitioning tables by time range.

Partitioning with the Postgres declarative partitioning feature can help you speed up query and ingest times for your time series workloads. Range partitioning lets you create a table and break it up into smaller partitions, based on ranges (typically time ranges). Query performance improves since each query only has to deal with much smaller chunks. Though, you’ll still be limited by the memory, CPU, and storage resources of your Postgres server.

The good news is you can scale out your partitioned Postgres tables to handle enormous amounts of data by distributing the partitions across a cluster. How? By using the Citus extension to Postgres. In other words, with Citus you can create distributed time-partitioned tables. To save disk space on your nodes, you can also compress your partitions—without giving up indexes on them. Even better: the latest Citus 10.2 open-source release makes it a lot easier to manage your partitions in PostgreSQL.

Keep reading
Onder Kalaci

What’s new in the Citus 10.2 extension to Postgres

Written byBy Onder Kalaci | September 17, 2021Sep 17, 2021

Citus 10.2 is out! If you are not yet familiar with Citus, it is an open source extension to Postgres that transforms Postgres into a distributed database—so you can achieve high performance at any scale. The Citus open source packages are available for download. And Citus is also available in the cloud as a managed service, too.

You can see a bulleted list of all the changes in the CHANGELOG on GitHub. This post is your guide to what’s new in Citus 10.2, including some of these headline features.

Keep reading
Jeff Davis

Citus 10 brings columnar compression to Postgres

Written byBy Jeff Davis | March 6, 2021Mar 6, 2021

Citus 10 is out! Check out the Citus 10 blog post for all the details. Citus is an open source extension to Postgres (not a fork) that enables scale-out, but offers other great features, too. See the Citus docs and the Citus github repo and README.

This post will highlight Citus Columnar, one of the big new features in Citus 10. You can also take a look at the columnar documentation. Citus Columnar can be used with or without the scale-out features of Citus.

Postgres typically stores data using the heap access method, which is row-based storage. Row-based tables are good for transactional workloads, but can cause excessive IO for some analytic queries.

Columnar storage is a new way to store data in a Postgres table. Columnar groups data together by column instead of by row; and compresses the data, too. Arranging data by column tends to compress well, and it also means that queries can skip over columns they don’t need. Columnar dramatically reduces the IO needed to answer a typical analytic query—often by 10X!

Keep reading
Claire Giordano

When to use Hyperscale (Citus) to scale out Postgres

Written byBy Claire Giordano | December 5, 2020Dec 5, 2020

If you've built your application on Postgres, you already know why so many people love Postgres.

And if you're new to Postgres, the list of reasons people love Postgres is loooong—and includes things like: 3 decades of database reliability baked in; rich datatypes; support for custom types; myriad index types from B-tree to GIN to BRIN to GiST; support for JSON and JSONB from early days; constraints; foreign data wrappers; rollups; the geospatial capabilities of the PostGIS extension, and all the innovations that come from the many Postgres extensions.

But what to do if your Postgres database gets very large?

Keep reading

Today, we’re excited to announce our latest release of our distributed database—Citus 7.2. With this release, we’re making Citus more of a drop-in replacement for your single-node Postgres database, so you don’t need to adapt your SQL for a distributed system.

For multi-tenant applications where the single-tenant queries were scoped to a single machine, Citus already provided full SQL support. . The improvements in Citus 7.2 take our support for distributed SQL one big step further. With Citus database version 7.2, we now extend our distributed SQL support to queries that run on data spread across a cluster of machines. This becomes particularly important for real-time analytics workloads, where even the most complex SELECT queries need to be parallelized across machines.

If you’re into bulleted lists, here’s the quick overview of what’s new in Citus database version 7.2 for distributed queries that span across machines. For an overview of other recent Citus features check out these blogs about distributed transactions and Citus 7.1.

Keep reading

Years ago Citus used to have multiple methods for distributing data across many nodes (we actually still support both today), there was both hash-based partitioning and time-based partitioning. Over time we found big benefits in further enhancing the features around hash-based partitioning which enabled us to add richer SQL support, transactions, foreign keys, and more. Thus in recent years, we put less energy into time-based partitioning. But… no one stopped asking us about time partitioning, especially for fast data expiration. All that time we were listening. We just thought it best to align our product with the path of core Postgres as opposed to branching away from it.

Postgres has had some form of time-based partitioning for years. Though for many years it was a bit kludgy and wasn't part of core Postgres. With Postgres 10 came native time partitioning, and because Citus is an open source extension to Postgres that means anyone using Citus gets to take advantage of time-based partitioning as well. You can now create tables that are distributed across nodes by ID and partitioned by time on disk.

We have found a few Postgres extensions that make partitioning much easier to use. The best in class for improving time partitioning is pg_partman and today we'll dig into getting time partitioning set up with your Citus database cluster using pg_partman.

Keep reading
Craig Kerstiens

Five sharding data models and which is right

Written byBy Craig Kerstiens | August 28, 2017Aug 28, 2017

When it comes to scaling your database, there are challenges but the good news is that you have options. The easiest option of course is to scale up your hardware. And when you hit the ceiling on scaling up, you have a few more choices: sharding, deleting swaths of data that you think you might not need in the future, or trying to shrink the problem with microservices.

Deleting portions of your data is simple, if you can afford to do it. Regarding sharding there are a number of approaches and which one is right depends on a number of factors. Here we'll review a survey of five sharding approaches and dig into what factors guide you to each approach.

Keep reading

Page 1 of 1