Thomas Munro

Thomas Munro

CITUS BLOG AUTHOR PROFILE

Open source software developer focusing on PostgreSQL at Microsoft. Committer on the PostgreSQL and FreeBSD projects. Interested in open hardware, software, protocols, and standards. Speaker at conferences including PGCon, PGDU, FOSDEM & more.

@MengTangmu

PUBLISHED ARTICLES
Thomas Munro

Don’t let collation versions corrupt your PostgreSQL indexes

Written by By Thomas Munro | December 12, 2020 Dec 12, 2020

[UPDATE in Sep 2021]: This blog post was originally written during the PostgreSQL 14 development cycle. The feature discussed is now a candidate for PostgreSQL 15 and the text has been updated to reflect this.

As part of my work on the open source PostgreSQL team at Microsoft, I’ve been developing a new feature for PostgreSQL to track dependencies on collation versions, with help from co-author Julien Rouhaud and many others who have contributed ideas. It’s taken a long time to build a consensus on how to tackle this thorny problem (work I began at EnterpriseDB and continued at Microsoft), and you can read about some of the details and considerations in the commit message below and the referenced discussion thread. We’re not quite done with that yet. It was originally planned for PostgreSQL 14, but some unhandled complications arose so this project is back in the workshop.

commit 257836a75585934cc05ed7a80bccf8190d41e056
Author: Thomas Munro <[email protected]>
Date:   Mon Nov 2 19:50:45 2020 +1300

    Track collation versions for indexes.

    Record the current version of dependent collations in pg_depend when
    creating or rebuilding an index.  When accessing the index later, warn
    that the index may be corrupted if the current version doesn't match.

    Thanks to Douglas Doole, Peter Eisentraut, Christoph Berg, Laurenz Albe,
    Michael Paquier, Robert Haas, Tom Lane and others for very helpful
    discussion.

    Author: Thomas Munro <[email protected]>
    Author: Julien Rouhaud <[email protected]>
    Reviewed-by: Peter Eisentraut <[email protected]> (earlier versions)
    Discussion: https://postgr.es/m/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com

In this article I’ll talk about the problem we need to solve—that PostgreSQL indexes can get corrupted by changes in collations that occur naturally over time—and how the new feature will make things better in a future version of PostgreSQL. Plus, you’ll get a bit of background on collations, too.

Keep reading

Page 1 of 1