Citus Blog

Articles tagged: Postgres indexes

Onder Kalaci

What’s new in the Citus 10.2 extension to Postgres

Written byBy Onder Kalaci | September 17, 2021Sep 17, 2021

Citus 10.2 is out! If you are not yet familiar with Citus, it is an open source extension to Postgres that transforms Postgres into a distributed database—so you can achieve high performance at any scale. The Citus open source packages are available for download. And Citus is also available in the cloud as a managed service, too.

You can see a bulleted list of all the changes in the CHANGELOG on GitHub. This post is your guide to what’s new in Citus 10.2, including some of these headline features.

Keep reading

[UPDATE in Sep 2021]: This blog post was originally written during the PostgreSQL 14 development cycle. The feature discussed is now a candidate for PostgreSQL 15 and the text has been updated to reflect this.

As part of my work on the open source PostgreSQL team at Microsoft, I've been developing a new feature for PostgreSQL to track dependencies on collation versions, with help from co-author Julien Rouhaud and many others who have contributed ideas. It's taken a long time to build a consensus on how to tackle this thorny problem (work I began at EnterpriseDB and continued at Microsoft), and you can read about some of the details and considerations in the commit message below and the referenced discussion thread. We're not quite done with that yet. It was originally planned for PostgreSQL 14, but some unhandled complications arose so this project is back in the workshop.

commit 257836a75585934cc05ed7a80bccf8190d41e056
Author: Thomas Munro <tmunro@postgresql.org>
Date:   Mon Nov 2 19:50:45 2020 +1300

    Track collation versions for indexes.

    Record the current version of dependent collations in pg_depend when
    creating or rebuilding an index.  When accessing the index later, warn
    that the index may be corrupted if the current version doesn't match.

    Thanks to Douglas Doole, Peter Eisentraut, Christoph Berg, Laurenz Albe,
    Michael Paquier, Robert Haas, Tom Lane and others for very helpful
    discussion.

    Author: Thomas Munro <thomas.munro@gmail.com>
    Author: Julien Rouhaud <rjuju123@gmail.com>
    Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> (earlier versions)
    Discussion: https://postgr.es/m/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com

In this article I'll talk about the problem we need to solve—that PostgreSQL indexes can get corrupted by changes in collations that occur naturally over time—and how the new feature will make things better in a future version of PostgreSQL. Plus, you’ll get a bit of background on collations, too.

Keep reading

When Django developer and Azure Postgres* engineer Louise Grandjonc confirmed that she could sit down with me for an interview in the days leading up to DjangoCon 2019, I jumped at the chance. Those of you who were in the room for Louise’s talk this week probably understand why. Louise explains technical topics in a way that makes sense—and she often uses unusual (and fun) examples, from crocodiles to owls, from Harry Potter to Taylor Swift.

And since I experience a bit of FOMO whenever I miss a fun developer conference like DjangoCon, I especially wanted to learn more about Louise’s DjangoCon talk: Postgres Index Types and where to find them.

Here’s an edited transcript of my interview with Louise Grandjonc of Microsoft (@louisemeta on Twitter.)

Keep reading
Craig Kerstiens

12 Factor: Dev/prod parity for your database

Written byBy Craig Kerstiens | September 4, 2018Sep 4, 2018

The twelve-factor app changed the way we build SaaS applications. Explicit dependency management, separating config from code, scaling out your app concurrently—these design principles took us from giant J2EE apps to apps that scale predictably on the web. One of these 12 factors has long stood out as a challenge when it comes to databases: dev/prod parity. Sure, you can run the exact same version of your database, and have a sandbox copy, but testing and staging with production data... that's a different story.

Keep reading

Many companies generate large volumes of time series data from events happening in their application. It’s often useful to have a real-time analytics dashboard to spot trends and changes as they happen. You can build a real-time analytics dashboard on Postgres by constructing a simple pipeline:

  1. Load events into a raw data table in batches
  2. Periodically aggregate new events into a rollup table
  3. Select from the rollup table in the dashboard

For large data streams, Citus (an open source extension to Postgres that scales out Postgres horizontally) can scale out each of these steps across all the cores in a cluster of Postgres nodes.

One of the challenges of maintaining a rollup table is tracking which events have already been aggregated—so you can make sure that each event is aggregated exactly once. A common technique to ensure exactly-once aggregation is to run the aggregation for a particular time period after that time period is over. We often recommend aggregating at the end of the time period for its simplicity, but you cannot provide any results before the time period is over and backfilling is complicated.

Keep reading
Craig Kerstiens

A tour of Postgres Index Types

Written byBy Craig Kerstiens | October 17, 2017Oct 17, 2017

At Citus we spend a lot of time working with customers on data modeling, optimizing queries, and adding indexes to make things snappy. My goal is to be as available for our customers as we need to be, in order to make you successful. Part of that is keeping your Citus cluster well tuned and performant which we take care of for you. Another part is helping you with everything you need to know about Postgres and Citus. After all a healthy and performant database means a fast performing app and who wouldn’t want that. Today we’re going to condense some of the information we’ve shared directly with customers about Postgres indexes.

Postgres has a number of index types, and with each new release seems to come with another new index type. Each of these indexes can be useful, but which one to use depends on 1. the data type and then sometimes 2. the underlying data within the table, and 3. the types of lookups performed. In what follows we'll look at a quick survey of the index types available to you in Postgres and when you should leverage each. Before we dig in, here’s a quick glimpse of the indexes we’ll walk you through:

  • B-Tree
  • Generalized Inverted Index (GIN)
  • Generalized Inverted Seach Tree (GiST)
  • Space partitioned GiST (SP-GiST)
  • Block Range Indexes (BRIN)
  • Hash

Now onto the indexing

Keep reading
Craig Kerstiens

Indexing all the things in Postgres

Written byBy Craig Kerstiens | October 11, 2017Oct 11, 2017

Postgres indexes make your application fast. And while one option is to analyze each of your relational database queries with pg_stat_statements to see where you should add indexes... an alternative fix (and a quick one at that) could be to add indexes to each and every database table—and every column—within your database. To make this easy for you, here's a query you can run that will create the CREATE INDEX commands for every table and column in your Postgres database.

Disclaimer: Adding an index to every column of every table is not a best practice for production databases. And it’s certainly not something we recommend at Citus Data, where we take scaling out Postgres and database performance very seriously. But indexing all the things is a fun “what if” thought exercise that is well worth thinking about, to understand the value of indexes in Postgres. What if you indexed all the things?

With that disclaimer out of the way, onto my advice of how to index all the things as a way to understand your database’s performance.

Keep reading
Marco Slot

Postgres Parallel indexing in Citus

Written byBy Marco Slot | January 17, 2017Jan 17, 2017

Indexes are an essential tool for optimizing database performance and are becoming ever more important with big data. However, as the volume of data increases, index maintenance often becomes a write bottleneck, especially for advanced index types which use a lot of CPU time for every row that gets written. Index creation may also become prohibitively expensive as it may take hours or even days to build a new index on terabytes of data in postgres. As of Citus 6.0, we’ve made creating and maintaining indexes that much faster through parallelization.

Keep reading

Product search is a common, yet sometimes challenging use-case for online retailers and marketplaces. It typically involves a combination of full-text search and filtering by attributes which differ for every product category. More complex use-cases may have many sellers that offer the same product, but with a different price and different properties.

Keep reading

Page 1 of 1