<h1><a href="https://www.citusdata.com/blog/2024/03/01/whats-in-a-name-hello-posette-an-event-for-postgres-2024/">What’s in a name? Hello POSETTE: An Event for Postgres 2024</a></h1>
<p><em>Claire Giordano · March 1, 2024 · <a href="https://www.citusdata.com/blog/">Citus Data Blog</a>: Scaling data and analytics with Postgres</em></p>
<p>When I think about naming something—like a feature or product or even an event—this quote always comes to mind.</p>
<div class="normal-quote" aria-hidden="true"></div>
<blockquote>
<p>What’s in a name? That which we call a rose<br>
By any other name would smell as sweet;</p>
<p>–William Shakespeare</p>
</blockquote>
<p>What’s in a name, after all? I’m no expert on Romeo and Juliet, but friends tell me Shakespeare’s point was that names don’t matter. The thing itself is the thing itself, regardless of the name.</p>
<p>My parents named my sister “Helen” at birth but never actually called her that. They always called her by a nickname, “Lyena”. So my sister’s sense of self became intertwined with her nickname: she “felt” like a Lyena. And the only people that ever called her Helen were officious school principals, gate-check agents looking at her passport—and our paternal grandfather. It made her so mad. Whenever my grandfather insisted on calling her Helen, you could almost see the steam coming out of my sister’s ears.</p>
<p>My husband told me about a thing I’ve unconsciously done for years: whenever we drive through Suisun City en route to the mountains, I say the name of the city out loud to myself. Not just once but several times, like I’m chewing on the word. Turns out I really like the way it feels when I say “Suh-soon-si-tee” out loud.</p>
<p>Names carry meaning. They trigger emotions. The phonetic sound of a word affects whether you can remember it. And some words just “roll off the tongue” in a way that makes it easy to say <em>and</em> easy to remember. Bottom line, names matter.</p>
<p>Which is why we decided to give “Citus Con: An Event for Postgres” a new name. People had told us that when they heard the event’s nickname of “Citus Con” they thought it was <em>only</em> about Citus—and did not realize that over 66% of <a href="https://aka.ms/cituscon-playlist-2023">last year’s Citus Con talks</a> were about Postgres, and not about Citus.</p>
<p>Say hello to <strong><a href="/posette/">POSETTE: An Event for Postgres</a></strong>, now in its 3rd year. A free and virtual developer event brought to you with 🧡 by the Postgres team here at Microsoft. </p>
<h2>What does the “POSETTE” in “POSETTE: An Event for Postgres” stand for?</h2>
<p><a href="/posette">POSETTE: An Event for Postgres</a> is a name inspired by developer acronyms. More specifically, the inspiration for the name POSETTE came from FOSDEM.</p>
<p>FOSDEM is an open source developer conference that happens every year in the deep cold winter in Brussels. But in the developer world, some people don’t realize that FOSDEM was initially OSDEM. And many FOSDEM attendees probably can’t tell you what all the letters in the acronym stand for. I mean, it’s sort of obvious that FOSDEM stands for “Free” and “Open Source” and probably “Developer” too—but after that, who knows what the “E” and the “M” stand for. Am I right?</p>
<p>POSETTE is pronounced /Pō-zet/ and stands for <strong><u>P</u></strong>ostgres <strong><u>O</u></strong>pen <strong><u>S</u></strong>ource <strong><u>E</u></strong>cosystem <strong><u>T</u></strong>alks <strong><u>T</u></strong>raining & <strong><u>E</u></strong>ducation.</p>
<p>The words behind the acronym are intentional. We aim to have the <strong><u>T</u></strong>alks focus on <strong><u>P</u></strong>ostgres and the entire <strong><u>E</u></strong>cosystem of tooling and extensions and community initiatives—to help <strong><u>E</u></strong>ducate and <strong><u>T</u></strong>rain the growing numbers of Postgres <strong><u>O</u></strong>pen <strong><u>S</u></strong>ource users and developers and contributors—and of course Azure Database for PostgreSQL customers too!</p>
<p>The word <strong>Ecosystem</strong> in the name POSETTE is particularly important: we welcome <a href="/posette/2024/cfp/">CFP talk proposals</a> about the entire Postgres ecosystem, from the core of the database to tooling to PG extensions. And there are so many useful Postgres extensions (including Citus!).</p>
<h2>What is POSETTE (formerly Citus Con)—and why did we create it?</h2>
<p>If you’re learning about <a href="/posette/">POSETTE: An Event for Postgres</a> for the first time, the key things to know are:</p>
<ul>
<li>it is a free & virtual developer event</li>
<li>in its 3rd year</li>
<li>with 4 unique livestreams happening Jun 11-13, 2024</li>
<li>livestreams will also have a live, interactive chat where you can connect with speakers and attendees</li>
<li>all talks will be made available on YouTube after the event, so you can watch later at your convenience, <a href="https://youtube.com/shorts/NujyLhklIrI?feature=shared">even at 2X speed</a></li>
<li>event is brought to you with 🧡 by the Postgres team at Microsoft</li>
<li>the <a href="/posette/2024/cfp/">CFP is currently open</a> for talk proposals until Apr 07</li>
<li>there is of course a <a href="/posette/2024/coc/">Code of Conduct</a> for POSETTE</li>
<li>formerly called Citus Con</li>
<li>you can stay connected by following <code>@PosetteConf</code> on <a href="https://twitter.com/PosetteConf">X/Twitter</a>, <a href="https://mastodon.social/@posetteconf">Mastodon</a>, or <a href="https://www.threads.net/@posetteconf">Threads</a>—or popping into the <code>#posetteconf</code> channel on the <a href="https://aka.ms/open-source-discord">Microsoft Open Source Discord</a></li>
</ul>
<p>And why did we create this virtual Postgres event? Because it’s virtual, free, and global—no matter where you are in the world, as long as you have an internet connection, you can participate.</p>
<p>The appeal of a high-quality, INCLUSIVE virtual event is real. I <em>love</em> in-person events and it’s no secret that PGConfEU is my favorite PG event. And I’m so honored to be presenting at <a href="https://www.postgresql.eu/events/nordicpgday2024/schedule/">Nordic PGDay</a> on March 12, and to be giving a <a href="https://www.postgresql.eu/events/pgdayparis2024/schedule/session/5425-lightning-talks/">lightning talk at pgDay Paris</a> on March 14. But it’s a fact that many people cannot travel to in-person events, due to lack of travel budget, family responsibilities, or a multitude of other reasons. So we are <a href="https://x.com/clairegiordano/status/1752772004179333280?s=20">big fans of the accessibility</a> and inclusiveness of virtual events, especially when there are great speakers with high-quality talks and well-produced videos.</p>
<p>We hope you can join us for POSETTE—and please help to spread the word, so more Postgres users and developers discover it!</p>
<p>In the meantime, +1 and <strong>gratitude</strong> to all the a-ma-zing Postgres speakers who are submitting talks to the <a href="/posette/2024/cfp/">CFP for POSETTE</a> and who believe in the value of <a href="/blog/2022/01/11/why-give-a-conference-talk/">giving Postgres conference talks</a>. Also, +1 and <strong>thanks</strong> to my teammates working hard behind the scenes at Microsoft to make this Postgres event happen, with special thanks to our 2024 POSETTE organizing chair, Teresa Giacomini.</p>
<figure>
<picture>
<source srcset="https://cituscdn.azureedge.net/images/blog/blog-how-to-pronounce-1200x675.webp" type="image/webp">
<img src="https://cituscdn.azureedge.net/images/blog/blog-how-to-pronounce-1200x675.jpg" alt="how to pronounce POSETTE" loading="lazy" width="850" height="478" style="box-shadow:0 0 12px rgba(0,0,0,.2)" />
</picture>
<figcaption><strong>Figure 1:</strong> This graphic makes it easy to “see” what the word POSETTE in <a href="/posette/">POSETTE: An Event for Postgres 2024</a> stands for. The CFP is open until Apr 07 2024—and the free & virtual event will happen on June 11-13, 2024.</figcaption>
</figure>
<p><em>This article was originally published on <a href='https://www.citusdata.com/blog/2024/03/01/whats-in-a-name-hello-posette-an-event-for-postgres-2024/'>citusdata.com</a>.</em></p>
<h1><a href="https://www.citusdata.com/blog/2024/02/28/podcast-about-transitioning-from-dev-to-postgres-specialist/">Podcast about transitioning from developer to PostgreSQL specialist, with Derk van Veen</a></h1>
<p><em>Ari Padilla · February 28, 2024</em></p>
<p>How do you feel when your day doesn’t go as planned? In this episode of the Path To Citus Con, the podcast for developers who love Postgres, guest <a href="https://www.linkedin.com/in/derk-van-veen-database-specialist/">Derk van Veen</a> joins co-hosts <a href="https://hachyderm.io/@clairegiordano">Claire Giordano</a> and <a href="https://www.linkedin.com/in/pinodecandia">Pino de Candia</a> to talk about his journey from Java developer to Postgres specialist.</p>
<p>What makes you feel alive at work? Is it the routine tasks, the predictable outcomes, the stable environment? Or is it the unexpected challenges, the unknown variables, the chaotic situations? If you are like Derk (and me), you thrive on the latter. Maybe you love to jump on tough problems and find beautiful solutions—or maybe you enjoy the thrill of finding the root cause of a slow system or some faulty code. You don't just follow a recipe. You ask questions, explore options, and experiment with different strategies. How do you partition a table? Why do you partition a table? What are the trade-offs of each approach?</p>
<p>In this post, you’ll find some of our favorite episode highlights and quotes. You’ll find links to where you can subscribe and listen to past episodes of the podcast at the end of the post.</p>
<figure>
<picture>
<source srcset="https://cituscdn.azureedge.net/images/blog/PathToCitusCon-Ep12-From-developer-to-PostgreSQL-youtube-1200x675.webp" type="image/webp">
<img src="https://cituscdn.azureedge.net/images/blog/PathToCitusCon-Ep12-From-developer-to-PostgreSQL-youtube-1200x675.jpg" loading="lazy" width="850" height="478" alt="Path to Citus Con Ep. 12 YouTube thumbnail" />
</picture>
<figcaption><strong>Figure 1:</strong> YouTube thumbnail for episode 12 of the Path To Citus Con, the podcast for developers who love Postgres, with (starting in the top left, listed clockwise) Derk van Veen, Claire Giordano, and Pino de Candia. The topic is “From developer to Postgres specialist.”</figcaption>
</figure>
<h2>Highlights from the podcast episode with Derk van Veen</h2>
<div class="normal-quote" aria-hidden="true"></div>
<blockquote>
<p><strong>“It always is about the why. Why do you share this? Why do you have to tell this to this audience? Why do you have this slide? Why is this visual on your slide? It's always about the why. Every part of it.”</strong> – Derk van Veen</p>
</blockquote>
<div class="response" aria-hidden="true"></div>
<p>When talking about conferences and public speaking, Derk shares how to sell a story, meaning that every part of a presentation and what you say needs to be intentional. You need to understand why it is there and make sure it is there for the right reasons. The idea is to guide the audience along a straight line, a smooth path, without unnecessary distractions or deviations.</p>
<div class="normal-quote" aria-hidden="true"></div>
<blockquote>
<p><strong>“Usually first my heart rate doubles and I'm like, all right, this is not good. And I try first to take a step back. Like, just walk down the stairs, one floor, get a cup of tea, take a few deep breaths, go back up and then just start slowly and methodically [tackling] the issue.”</strong> – Derk van Veen</p>
</blockquote>
<div class="response" aria-hidden="true"></div>
<p>When your heart starts racing, it's like your body's alarm system is kicking in. But instead of letting the panic take over, it's about taking control. Stepping back, even just for a moment, can work wonders and this is what Derk does when things aren’t going as planned. Sometimes a small break is all you need to hit the reset button. Then, when you're ready, tackle the problem one step at a time.</p>
<div class="normal-quote" aria-hidden="true"></div>
<blockquote>
<p><strong>“So you can do all this development work. You can see whether it works out and you don't have to worry about breaking something because you're on your own fork.”</strong> – Derk van Veen</p>
</blockquote>
<div class="response" aria-hidden="true"></div>
<p>Forks are freedom. Derk discusses that at first, he saw forks as a waste of effort. However, through talking to colleagues and giving it a try, he realized that forking is not just a technical skill. It’s also a social one. It is a way of collaborating with other developers, sharing your ideas, and editing the code without breaking things for everyone else. It’s a way to contribute to the software and understand if what you are creating will work, be useful, and help others.</p>
<div class="normal-quote" aria-hidden="true"></div>
<blockquote>
<p><strong>“[The slide] has to be very easy, easy to digest, easy to understand. The easier the better. It's a contribution to the story, to you as a presenter, to what you're trying to share with your audience. If you want to share bullet points with your audience, just send them an email.”</strong> – Derk van Veen</p>
</blockquote>
<div class="response" aria-hidden="true"></div>
<p>According to Derk, when it comes to your presentation slides, simplicity is your secret weapon. Think about it—your slides should practically speak for themselves. If they're too complicated, your audience might miss the point entirely. Keep it easy, digestible, and clear. Your slides aren't just background noise; they're an essential part of your storytelling.</p>
<h2>Where to find Path To Citus Con podcast episodes (and transcripts)</h2>
<p>You can listen to and find the show notes online for this <a href="https://pathtocituscon.transistor.fm/episodes/from-developer-to-postgresql-specialist-with-derk-van-veen">From developer to Postgres specialist</a> episode with Derk van Veen—and the <a href="https://pathtocituscon.transistor.fm/episodes/from-developer-to-postgresql-specialist-with-derk-van-veen/transcript">transcript</a> is online too.</p>
<p>Our next podcast episode will be recorded live:</p>
<ul>
<li><strong>Where:</strong> Microsoft Open Source Discord</li>
<li><strong>When:</strong> Wed Mar 06 @ 10am PST | 1pm EST | 6pm UTC</li>
<li><strong>Guest:</strong> the amazing <a href="https://www.linkedin.com/in/aytekinar/">Arda Aytekin</a></li>
<li><strong>Topic:</strong> spinning up on Postgres & AI</li>
<li><strong>Calendar invite:</strong> <a href="https://aka.ms/pathtocituscon-ep13-cal">Mark your calendar</a> to participate in the live text chat that happens in parallel to the live recording (we’re huge fans of social audio on Discord). All the instructions for joining the live show are in the cal invite.</li>
</ul>
<p>You can find all the past episodes for the Path To Citus Con podcast on:</p>
<ul>
<li><a href="https://podcasts.apple.com/us/podcast/path-to-citus-con/id1695014346">Apple Podcasts</a></li>
<li><a href="https://open.spotify.com/show/6XtjTEc4KGMK9fIGvmPLn7">Spotify</a></li>
<li><a href="https://aka.ms/PathToCitusCon-playlist">YouTube</a></li>
<li><a href="https://pathtocituscon.transistor.fm/subscribe">And many more podcast platforms</a></li>
</ul>
<p>More useful links:</p>
<ul>
<li><a href="/podcast/path-to-citus-con/">Index of all past Path To Citus Con episodes</a></li>
<li><a href="https://aka.ms/PathToCitusCon-cal">Subscribe to the calendar</a> to know when all the future shows are coming up.</li>
</ul>
<p>Thanks for listening! We hope you enjoy the episodes of the Path To Citus Con podcast and would love for you to share it with friends and the database community. Rating and reviewing us on your favorite podcasting platform will also help others discover it.</p>
<p><em>This article was originally published on <a href='https://www.citusdata.com/blog/2024/02/28/podcast-about-transitioning-from-dev-to-postgres-specialist/'>citusdata.com</a>.</em></p>
<h1><a href="https://www.citusdata.com/blog/2024/02/08/whats-new-in-postgres-16-query-planner-optimizer/">What’s new in the Postgres 16 query planner / optimizer</a></h1>
<p><em>David Rowley · February 8, 2024</em></p>
<p>PostgreSQL 16 introduces quite a few improvements to the query planner and makes many SQL queries run faster than they did on previous versions of PostgreSQL.</p>
<p>If you look at the <a href="https://www.postgresql.org/docs/16/release-16.html">PG16 release notes</a>, you’ll see some of these planner improvements. But with the volume of changes made in each PostgreSQL release, it’s not possible to provide enough detail about each and every change. So you might need a bit more detail to know what a change is about—before you can tell whether it’s relevant to you.</p>
<p>In this blog post, assuming you’ve already got a handle on the <a href="https://www.postgresql.org/docs/16/using-explain.html#USING-EXPLAIN-BASICS">basics of EXPLAIN</a>, you’ll get a deep dive into the 10 improvements made in the PostgreSQL 16 query planner. For each of the improvements to the PG16 planner (the planner is often called an optimizer in other relational databases), you’ll also get comparisons between PG15 and PG16 planner output—plus examples of what changed, in the form of a self-contained test you can try for yourself. </p>
<p>Let’s dive into these 10 improvements to the PostgreSQL planner in PG16:</p>
<ol>
<li><a href="#distinct-queries">Incremental sorts for DISTINCT queries</a></li>
<li><a href="#faster-orderby-distinct">Faster ORDER BY / DISTINCT aggregates</a></li>
<li><a href="#union-all">Memoize for UNION ALL queries</a></li>
<li><a href="#right-anti-join">Support Right Anti Join</a></li>
<li><a href="#parallel-hashfull-rightjoin">Parallel Hash Full and Right Joins</a></li>
<li><a href="#frame-clauses">Optimize window function frame clauses</a></li>
<li><a href="#window-functions">Optimize various window functions</a></li>
<li><a href="#join-removals">JOIN removals for partitioned tables</a></li>
<li><a href="#short-circuit">Short circuit trivial DISTINCT queries</a></li>
<li><a href="#merge-joins">Incremental Sort after Merge Join, in more cases</a></li>
</ol>
<h2 id="distinct-queries">1. <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3c6fc58209f24b959ee18f5d19ef96403d08f15c">Allow incremental sorts in more cases, including DISTINCT (David Rowley)</a></h2>
<p><a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=d2d8a229bc58a2014dce1c7a4fcdb6c5ab9fb8da">Incremental sorts</a> were first added in PostgreSQL 13. These incremental sorts reduce the effort required to get sorted results. How? By exploiting the knowledge that some given result set is already sorted by 1 or more leading columns—and only performing a sort on the remaining columns.</p>
<p>For example, if there’s a btree index on column <code>a</code> and we need the rows ordered by <code>a</code>,<code>b</code>, then we can use the btree index (which provides presorted results on column <code>a</code>) and sort the rows seen so far only when the value of <code>a</code> changes. With the quicksort algorithm used by PostgreSQL, sorting many smaller groups is more efficient than sorting one large group.</p>
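<p>As a concrete sketch of that idea (hypothetical table and index names; the plan shape in the comments is indicative rather than verbatim <code>EXPLAIN</code> output, and depends on table size and statistics):</p>

```sql
-- Hypothetical demo table: a btree index on (a) provides rows presorted
-- by a, so ORDER BY a, b only needs to sort b within runs of equal a.
CREATE TABLE inc_sort_demo (a INT, b INT);
INSERT INTO inc_sort_demo
SELECT x, x % 10 FROM generate_series(1, 100000) x;
CREATE INDEX ON inc_sort_demo (a);
VACUUM ANALYZE inc_sort_demo;

EXPLAIN (COSTS OFF)
SELECT * FROM inc_sort_demo ORDER BY a, b;
-- Typical plan shape (PostgreSQL 13+):
-- Incremental Sort
--   Sort Key: a, b
--   Presorted Key: a
--   ->  Index Scan using inc_sort_demo_a_idx on inc_sort_demo
```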
<p>The PostgreSQL 16 query planner now considers performing incremental sorts for <code>SELECT DISTINCT</code> queries. Prior to PG16, when the sorting method was chosen for <code>SELECT DISTINCT</code> queries, the planner only considered performing a full sort (which is more expensive than an incremental sort.)</p>
<div class="highlight">
<pre class="highlight sql"><code><span class="c1">-- Setup</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">distinct_test</span> <span class="p">(</span><span class="n">a</span> <span class="nb">INT</span><span class="p">,</span> <span class="n">b</span> <span class="nb">INT</span><span class="p">);</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">distinct_test</span>
<span class="k">SELECT</span> <span class="n">x</span><span class="p">,</span><span class="mi">1</span> <span class="k">FROM</span> <span class="n">generate_series</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">1000000</span><span class="p">)</span><span class="n">x</span><span class="p">;</span>
<span class="k">CREATE</span> <span class="k">INDEX</span> <span class="k">on</span> <span class="n">distinct_test</span><span class="p">(</span><span class="n">a</span><span class="p">);</span>
<span class="k">VACUUM</span> <span class="k">ANALYZE</span> <span class="n">distinct_test</span><span class="p">;</span>
<span class="k">EXPLAIN</span> <span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span> <span class="n">COSTS</span> <span class="k">OFF</span><span class="p">,</span> <span class="n">TIMING</span> <span class="k">OFF</span><span class="p">)</span>
<span class="k">SELECT</span> <span class="k">DISTINCT</span> <span class="n">a</span><span class="p">,</span><span class="n">b</span> <span class="k">FROM</span> <span class="n">distinct_test</span><span class="p">;</span>
</code></pre>
</div>
<h3>PG15 EXPLAIN output</h3>
<div class="highlight">
<pre class="highlight "><code> QUERY PLAN
---------------------------------------------------------------
HashAggregate (actual rows=1000000 loops=1)
Group Key: a, b
Batches: 81 Memory Usage: 11153kB Disk Usage: 31288kB
-> Seq Scan on distinct_test (actual rows=1000000 loops=1)
Planning Time: 0.065 ms
Execution Time: 414.226 ms
(6 rows)
</code></pre>
</div>
<h3>PG16 EXPLAIN output</h3>
<div class="highlight">
<pre class="highlight "><code> QUERY PLAN
------------------------------------------------------------------
Unique (actual rows=1000000 loops=1)
-> Incremental Sort (actual rows=1000000 loops=1)
Sort Key: a, b
Presorted Key: a
Full-sort Groups: 31250 Sort Method: quicksort Average Memory: 26kB Peak Memory: 26kB
-> Index Scan using distinct_test_a_idx on distinct_test (actual rows=1000000 loops=1)
Planning Time: 0.108 ms
Execution Time: 263.167 ms
(8 rows)
</code></pre>
</div>
<p>In the PostgreSQL 16 <code>EXPLAIN</code> output above, you can see the planner chose to use the <code>distinct_test_a_idx</code> index on the <code>a</code> column and then performed an <code>Incremental Sort</code> to sort all of the equal values of <code>a</code> by <code>b</code>. The <code>Presorted Key: a</code> indicates this. Because the <code>INSERT</code> statements above only added a single value of <code>b</code> for each value of <code>a</code>, each batch of tuples sorted by incremental sort only contains a single row.</p>
<p>The <code>EXPLAIN</code> output for PostgreSQL 16 above shows that the <code>Peak Memory</code> for the <code>Incremental Sort</code> was just 26 kilobytes, whereas the hashing method used by PostgreSQL 15 needed so much memory that it spilled about 30 megabytes of data to disk. <strong>The query executed about 1.6x faster on PostgreSQL 16</strong> (263 ms vs. 414 ms).</p>
<h2 id="faster-orderby-distinct">2. <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1349d2790bf48a4de072931c722f39337e72055e">Add the ability for aggregates having ORDER BY or DISTINCT to use pre-sorted data (David Rowley)</a></h2>
<p>In PostgreSQL 15 and earlier, aggregate functions containing an <code>ORDER BY</code> or <code>DISTINCT</code> clause would result in the executor always performing a sort inside the <code>Aggregate</code> node of the plan. Because the sort was always performed, the planner would never try to form a plan to provide presorted input to aggregate the rows in order.</p>
<p>The PostgreSQL 16 query planner now tries to form a plan which feeds the rows to the plan’s <code>Aggregate</code> node in the correct order. And the executor is now smart enough to recognize this and forego performing the sort itself when the rows are already pre-sorted in the correct order.</p>
<div class="highlight">
<pre class="highlight sql"><code><span class="c1">-- Setup</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">aggtest</span> <span class="p">(</span><span class="n">a</span> <span class="nb">INT</span><span class="p">,</span> <span class="n">b</span> <span class="nb">text</span><span class="p">);</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">aggtest</span> <span class="k">SELECT</span> <span class="n">a</span><span class="p">,</span><span class="n">md5</span><span class="p">((</span><span class="n">b</span><span class="o">%</span><span class="mi">100</span><span class="p">)::</span><span class="nb">text</span><span class="p">)</span> <span class="k">FROM</span> <span class="n">generate_series</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">10</span><span class="p">)</span> <span class="n">a</span><span class="p">,</span> <span class="n">generate_series</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">100000</span><span class="p">)</span><span class="n">b</span><span class="p">;</span>
<span class="k">CREATE</span> <span class="k">INDEX</span> <span class="k">ON</span> <span class="n">aggtest</span><span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="n">b</span><span class="p">);</span>
<span class="k">VACUUM</span> <span class="k">FREEZE</span> <span class="k">ANALYZE</span> <span class="n">aggtest</span><span class="p">;</span>
<span class="k">EXPLAIN</span> <span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span> <span class="n">COSTS</span> <span class="k">OFF</span><span class="p">,</span> <span class="n">TIMING</span> <span class="k">OFF</span><span class="p">,</span> <span class="n">BUFFERS</span><span class="p">)</span>
<span class="k">SELECT</span> <span class="n">a</span><span class="p">,</span><span class="k">COUNT</span><span class="p">(</span><span class="k">DISTINCT</span> <span class="n">b</span><span class="p">)</span> <span class="k">FROM</span> <span class="n">aggtest</span> <span class="k">GROUP</span> <span class="k">BY</span> <span class="n">a</span><span class="p">;</span>
</code></pre>
</div>
<h3>PG15 EXPLAIN output</h3>
<div class="highlight">
<pre class="highlight "><code> QUERY PLAN
---------------------------------------------------------------
GroupAggregate (actual rows=10 loops=1)
Group Key: a
Buffers: shared hit=892, temp read=4540 written=4560
-> Index Only Scan using aggtest_a_b_idx on aggtest (actual rows=1000000 loops=1)
Heap Fetches: 0
Buffers: shared hit=892
Planning Time: 0.122 ms
Execution Time: 302.693 ms
(8 rows)
</code></pre>
</div>
<h3>PG16 EXPLAIN output</h3>
<div class="highlight">
<pre class="highlight "><code> QUERY PLAN
---------------------------------------------------------------
GroupAggregate (actual rows=10 loops=1)
Group Key: a
Buffers: shared hit=892
-> Index Only Scan using aggtest_a_b_idx on aggtest (actual rows=1000000 loops=1)
Heap Fetches: 0
Buffers: shared hit=892
Planning Time: 0.061 ms
Execution Time: 115.534 ms
(8 rows)
</code></pre>
</div>
<p>Aside from PostgreSQL 16 executing the query over twice as fast as in PG15, the only indication of this change in the <code>EXPLAIN ANALYZE</code> output above is from the <code>temp read=4540 written=4560</code> that’s not present in the PostgreSQL 16 output. In PG15, this is caused by the implicit sort spilling to disk.</p>
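<p>The same presorted-input benefit applies to ordered aggregates, not only <code>DISTINCT</code> ones. A minimal sketch reusing the <code>aggtest</code> table and index from the setup above (the plan shape in the comments is indicative rather than verbatim output):</p>

```sql
-- Reuses the aggtest table and aggtest_a_b_idx index created above.
-- On PG16, the index can feed rows to the Aggregate node already sorted
-- by (a, b), so array_agg(b ORDER BY b) skips its own per-group sort.
EXPLAIN (COSTS OFF)
SELECT a, array_agg(b ORDER BY b)
FROM aggtest
GROUP BY a;
-- Typical PG16 plan shape:
-- GroupAggregate
--   Group Key: a
--   ->  Index Only Scan using aggtest_a_b_idx on aggtest
```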
<h2 id="union-all">3. <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9bfd2822b3201f6b0de1e87305b11ee3885b36d9">Allow memoize atop a UNION ALL (Richard Guo)</a></h2>
<p><code>Memoize</code> plan nodes were first introduced in PostgreSQL 14. The <code>Memoize</code> plan node acts as a cache layer between a parameterized <code>Nested Loop</code> and the Nested Loop’s inner side. When the same value needs to be looked up several times, Memoize can give a nice performance boost as it can skip executing its subnode when the required rows have been queried already and are cached.</p>
<p>The PostgreSQL 16 query planner will now consider using <code>Memoize</code> when a <code>UNION ALL</code> query appears on the inner side of a parameterized <code>Nested Loop</code>.</p>
<div class="highlight">
<pre class="highlight sql"><code><span class="c1">-- Setup</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">t1</span> <span class="p">(</span><span class="n">a</span> <span class="nb">INT</span> <span class="k">PRIMARY</span> <span class="k">KEY</span><span class="p">);</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">t2</span> <span class="p">(</span><span class="n">a</span> <span class="nb">INT</span> <span class="k">PRIMARY</span> <span class="k">KEY</span><span class="p">);</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">lookup</span> <span class="p">(</span><span class="n">a</span> <span class="nb">INT</span><span class="p">);</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">t1</span> <span class="k">SELECT</span> <span class="n">x</span> <span class="k">FROM</span> <span class="n">generate_Series</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">10000</span><span class="p">)</span> <span class="n">x</span><span class="p">;</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">t2</span> <span class="k">SELECT</span> <span class="n">x</span> <span class="k">FROM</span> <span class="n">generate_Series</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">10000</span><span class="p">)</span> <span class="n">x</span><span class="p">;</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">lookup</span> <span class="k">SELECT</span> <span class="n">x</span><span class="o">%</span><span class="mi">10</span><span class="o">+</span><span class="mi">1</span> <span class="k">FROM</span> <span class="n">generate_Series</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">1000000</span><span class="p">)</span><span class="n">x</span><span class="p">;</span>
<span class="k">ANALYZE</span> <span class="n">t1</span><span class="p">,</span><span class="n">t2</span><span class="p">,</span><span class="n">lookup</span><span class="p">;</span>
<span class="k">EXPLAIN</span> <span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span> <span class="n">COSTS</span> <span class="k">OFF</span><span class="p">,</span> <span class="n">TIMING</span> <span class="k">OFF</span><span class="p">)</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="p">(</span><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">t1</span> <span class="k">UNION</span> <span class="k">ALL</span> <span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">t2</span><span class="p">)</span> <span class="n">t</span>
<span class="k">INNER</span> <span class="k">JOIN</span> <span class="n">lookup</span> <span class="n">l</span> <span class="k">ON</span> <span class="n">l</span><span class="p">.</span><span class="n">a</span> <span class="o">=</span> <span class="n">t</span><span class="p">.</span><span class="n">a</span><span class="p">;</span>
</code></pre>
<button class="copy-button" data-clipboard-action="copy" data-clipboard-text="-- Setup
CREATE TABLE t1 (a INT PRIMARY KEY);
CREATE TABLE t2 (a INT PRIMARY KEY);
CREATE TABLE lookup (a INT);
INSERT INTO t1 SELECT x FROM generate_Series(1,10000) x;
INSERT INTO t2 SELECT x FROM generate_Series(1,10000) x;
INSERT INTO lookup SELECT x%10+1 FROM generate_Series(1,1000000)x;
ANALYZE t1,t2,lookup;
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT * FROM (SELECT * FROM t1 UNION ALL SELECT * FROM t2) t
INNER JOIN lookup l ON l.a = t.a;
">Copy</button>
</div>
<h3>PG15 EXPLAIN output</h3>
<div class="highlight">
<pre class="highlight "><code> QUERY PLAN
-------------------------------------------------------------------------------
Nested Loop (actual rows=2000000 loops=1)
-> Seq Scan on lookup l (actual rows=1000000 loops=1)
-> Append (actual rows=2 loops=1000000)
-> Index Only Scan using t1_pkey on t1 (actual rows=1 loops=1000000)
Index Cond: (a = l.a)
Heap Fetches: 1000000
-> Index Only Scan using t2_pkey on t2 (actual rows=1 loops=1000000)
Index Cond: (a = l.a)
Heap Fetches: 1000000
Planning Time: 0.223 ms
Execution Time: 1926.151 ms
(11 rows)
</code></pre>
<button class="copy-button" data-clipboard-action="copy" data-clipboard-text=" QUERY PLAN
-------------------------------------------------------------------------------
Nested Loop (actual rows=2000000 loops=1)
-> Seq Scan on lookup l (actual rows=1000000 loops=1)
-> Append (actual rows=2 loops=1000000)
-> Index Only Scan using t1_pkey on t1 (actual rows=1 loops=1000000)
Index Cond: (a = l.a)
Heap Fetches: 1000000
-> Index Only Scan using t2_pkey on t2 (actual rows=1 loops=1000000)
Index Cond: (a = l.a)
Heap Fetches: 1000000
Planning Time: 0.223 ms
Execution Time: 1926.151 ms
(11 rows)
">Copy</button>
</div>
<h3>PG16 EXPLAIN output</h3>
<div class="highlight">
<pre class="highlight "><code> QUERY PLAN
---------------------------------------------------------------------------------
Nested Loop (actual rows=2000000 loops=1)
-> Seq Scan on lookup l (actual rows=1000000 loops=1)
-> Memoize (actual rows=2 loops=1000000)
Cache Key: l.a
Cache Mode: logical
Hits: 999990 Misses: 10 Evictions: 0 Overflows: 0 Memory Usage: 2kB
-> Append (actual rows=2 loops=10)
-> Index Only Scan using t1_pkey on t1 (actual rows=1 loops=10)
Index Cond: (a = l.a)
Heap Fetches: 10
-> Index Only Scan using t2_pkey on t2 (actual rows=1 loops=10)
Index Cond: (a = l.a)
Heap Fetches: 10
Planning Time: 0.229 ms
Execution Time: 282.120 ms
(15 rows)
</code></pre>
<button class="copy-button" data-clipboard-action="copy" data-clipboard-text=" QUERY PLAN
---------------------------------------------------------------------------------
Nested Loop (actual rows=2000000 loops=1)
-> Seq Scan on lookup l (actual rows=1000000 loops=1)
-> Memoize (actual rows=2 loops=1000000)
Cache Key: l.a
Cache Mode: logical
Hits: 999990 Misses: 10 Evictions: 0 Overflows: 0 Memory Usage: 2kB
-> Append (actual rows=2 loops=10)
-> Index Only Scan using t1_pkey on t1 (actual rows=1 loops=10)
Index Cond: (a = l.a)
Heap Fetches: 10
-> Index Only Scan using t2_pkey on t2 (actual rows=1 loops=10)
Index Cond: (a = l.a)
Heap Fetches: 10
Planning Time: 0.229 ms
Execution Time: 282.120 ms
(15 rows)
">Copy</button>
</div>
<p>In the PostgreSQL 16 EXPLAIN output above, you can see the <code>Memoize</code> node is placed atop the <code>Append</code> node, which reduced the number of <code>loops</code> of the <code>Append</code> from 1 million in PG15 down to just 10 in PG16. Each time the <code>Memoize</code> node has a cache hit, there’s no need to execute the <code>Append</code> to fetch records. This results in the <strong>query running around 6 times faster on PostgreSQL 16</strong>.</p>
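<p>If you want to measure how much <code>Memoize</code> helps on your own workload, a quick experiment is to toggle the <code>enable_memoize</code> planner setting (available since PostgreSQL 14) for the session and re-run the query:</p>
<div class="highlight">
<pre class="highlight sql"><code>-- Compare plans with and without Memoize in the current session
SET enable_memoize = off;
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT * FROM (SELECT * FROM t1 UNION ALL SELECT * FROM t2) t
INNER JOIN lookup l ON l.a = t.a;
RESET enable_memoize;
</code></pre>
<button class="copy-button" data-clipboard-action="copy" data-clipboard-text="-- Compare plans with and without Memoize in the current session
SET enable_memoize = off;
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT * FROM (SELECT * FROM t1 UNION ALL SELECT * FROM t2) t
INNER JOIN lookup l ON l.a = t.a;
RESET enable_memoize;
">Copy</button>
</div>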
<h2 id="right-anti-join">4. <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=16dc2703c5413534d4989e08253e8f4fcb0e2aab">Allow anti-joins to be performed with the non-nullable input as the inner relation (Richard Guo)</a></h2>
<p>When performing a <code>Hash Join</code> for an <code>INNER JOIN</code>, PostgreSQL prefers to build the hash table on the smaller of the two tables. A smaller hash table is less work to build, and it’s also more cache-friendly for the CPU, making it less likely that the CPU will stall while waiting for data to arrive from main memory.</p>
<p>In PostgreSQL versions before 16, an <code>Anti Join</code>—as you might see if you use <code>NOT EXISTS</code> in your queries—would always put the table mentioned in the <code>NOT EXISTS</code> part on the inner side of the join. This meant there was no flexibility to hash the smaller of the two tables, resulting in possibly having to build a hash table on the larger table.</p>
<p>The PostgreSQL 16 query planner can now choose to hash the smaller of the two tables. This can now be done because PostgreSQL 16 supports <code>Right Anti Join</code>.</p>
<div class="highlight">
<pre class="highlight sql"><code><span class="c1">-- Setup</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">small</span><span class="p">(</span><span class="n">a</span> <span class="nb">int</span><span class="p">);</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="k">large</span><span class="p">(</span><span class="n">a</span> <span class="nb">int</span><span class="p">);</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">small</span>
<span class="k">SELECT</span> <span class="n">a</span> <span class="k">FROM</span> <span class="n">generate_series</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">100</span><span class="p">)</span> <span class="n">a</span><span class="p">;</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="k">large</span>
<span class="k">SELECT</span> <span class="n">a</span> <span class="k">FROM</span> <span class="n">generate_series</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">1000000</span><span class="p">)</span> <span class="n">a</span><span class="p">;</span>
<span class="k">VACUUM</span> <span class="k">ANALYZE</span> <span class="n">small</span><span class="p">,</span><span class="k">large</span><span class="p">;</span>
<span class="k">EXPLAIN</span> <span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span> <span class="n">COSTS</span> <span class="k">OFF</span><span class="p">,</span> <span class="n">TIMING</span> <span class="k">OFF</span><span class="p">)</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">small</span> <span class="n">s</span>
<span class="k">WHERE</span> <span class="k">NOT</span> <span class="k">EXISTS</span><span class="p">(</span><span class="k">SELECT</span> <span class="mi">1</span> <span class="k">FROM</span> <span class="k">large</span> <span class="n">l</span> <span class="k">WHERE</span> <span class="n">s</span><span class="p">.</span><span class="n">a</span> <span class="o">=</span> <span class="n">l</span><span class="p">.</span><span class="n">a</span><span class="p">);</span>
</code></pre>
<button class="copy-button" data-clipboard-action="copy" data-clipboard-text="-- Setup
CREATE TABLE small(a int);
CREATE TABLE large(a int);
INSERT INTO small
SELECT a FROM generate_series(1,100) a;
INSERT INTO large
SELECT a FROM generate_series(1,1000000) a;
VACUUM ANALYZE small,large;
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT * FROM small s
WHERE NOT EXISTS(SELECT 1 FROM large l WHERE s.a = l.a);
">Copy</button>
</div>
<h3>PG15 EXPLAIN output</h3>
<div class="highlight">
<pre class="highlight "><code> QUERY PLAN
---------------------------------------------------------------
Hash Anti Join (actual rows=0 loops=1)
Hash Cond: (s.a = l.a)
-> Seq Scan on small s (actual rows=100 loops=1)
-> Hash (actual rows=1000000 loops=1)
Buckets: 262144 Batches: 8 Memory Usage: 6446kB
-> Seq Scan on large l (actual rows=1000000 loops=1)
Planning Time: 0.103 ms
Execution Time: 139.023 ms
(8 rows)
</code></pre>
<button class="copy-button" data-clipboard-action="copy" data-clipboard-text=" QUERY PLAN
---------------------------------------------------------------
Hash Anti Join (actual rows=0 loops=1)
Hash Cond: (s.a = l.a)
-> Seq Scan on small s (actual rows=100 loops=1)
-> Hash (actual rows=1000000 loops=1)
Buckets: 262144 Batches: 8 Memory Usage: 6446kB
-> Seq Scan on large l (actual rows=1000000 loops=1)
Planning Time: 0.103 ms
Execution Time: 139.023 ms
(8 rows)
">Copy</button>
</div>
<h3>PG16 EXPLAIN output</h3>
<div class="highlight">
<pre class="highlight "><code> QUERY PLAN
-----------------------------------------------------------
Hash Right Anti Join (actual rows=0 loops=1)
Hash Cond: (l.a = s.a)
-> Seq Scan on large l (actual rows=1000000 loops=1)
-> Hash (actual rows=100 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 12kB
-> Seq Scan on small s (actual rows=100 loops=1)
Planning Time: 0.094 ms
Execution Time: 77.076 ms
(8 rows)
</code></pre>
<button class="copy-button" data-clipboard-action="copy" data-clipboard-text=" QUERY PLAN
-----------------------------------------------------------
Hash Right Anti Join (actual rows=0 loops=1)
Hash Cond: (l.a = s.a)
-> Seq Scan on large l (actual rows=1000000 loops=1)
-> Hash (actual rows=100 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 12kB
-> Seq Scan on small s (actual rows=100 loops=1)
Planning Time: 0.094 ms
Execution Time: 77.076 ms
(8 rows)
">Copy</button>
</div>
<p>You can see from the <code>EXPLAIN ANALYZE</code> output above that due to PG16’s planner opting to use a <code>Hash Right Anti Join</code>, the <code>Memory Usage</code> in PostgreSQL 16 was much less than in PostgreSQL 15 and the <code>Execution Time</code> was almost halved.</p>
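<p>Note the <code>Batches: 8</code> in the PG15 plan: the hash table built on the large table didn’t fit within <code>work_mem</code>, so the join was split into batches that spill to disk. If you can’t upgrade yet, raising <code>work_mem</code> for the session is one way to let the hash fit in a single batch (a sketch only; pick a value appropriate for your server’s memory):</p>
<div class="highlight">
<pre class="highlight sql"><code>-- Give the hash table more memory to avoid batching (session-local)
SET work_mem = '64MB';
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT * FROM small s
WHERE NOT EXISTS(SELECT 1 FROM large l WHERE s.a = l.a);
RESET work_mem;
</code></pre>
<button class="copy-button" data-clipboard-action="copy" data-clipboard-text="-- Give the hash table more memory to avoid batching (session-local)
SET work_mem = '64MB';
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT * FROM small s
WHERE NOT EXISTS(SELECT 1 FROM large l WHERE s.a = l.a);
RESET work_mem;
">Copy</button>
</div>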
<h2 id="parallel-hashfull-rightjoin">5. <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=11c2d6fdf5af1aacec9ca2005543f1b0fc4cc364">Allow parallelization of FULL and internal right OUTER hash joins (Melanie Plageman, Thomas Munro)</a></h2>
<p>PostgreSQL 11 saw the introduction of <code>Parallel Hash Join</code>. This allows multiple parallel workers in a parallel query to assist in the building of a single hash table. In versions prior to 11, each worker would have built its own identical hash table, resulting in additional memory overheads.</p>
<p>In PostgreSQL 16, <code>Parallel Hash Join</code> has been improved to support <code>FULL</code> and <code>RIGHT</code> join types. This allows queries containing a <code>FULL OUTER JOIN</code> to be executed in parallel, and also allows <code>Right Join</code> plans to execute in parallel.</p>
<div class="highlight">
<pre class="highlight sql"><code><span class="c1">-- Setup</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">odd</span> <span class="p">(</span><span class="n">a</span> <span class="nb">INT</span><span class="p">);</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">even</span> <span class="p">(</span><span class="n">a</span> <span class="nb">INT</span><span class="p">);</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">odd</span>
<span class="k">SELECT</span> <span class="n">a</span> <span class="k">FROM</span> <span class="n">generate_series</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">1000000</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span> <span class="n">a</span><span class="p">;</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">even</span>
<span class="k">SELECT</span> <span class="n">a</span> <span class="k">FROM</span> <span class="n">generate_series</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="mi">1000000</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span> <span class="n">a</span><span class="p">;</span>
<span class="k">VACUUM</span> <span class="k">ANALYZE</span> <span class="n">odd</span><span class="p">,</span> <span class="n">even</span><span class="p">;</span>
<span class="k">EXPLAIN</span> <span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span> <span class="n">COSTS</span> <span class="k">OFF</span><span class="p">,</span> <span class="n">TIMING</span> <span class="k">OFF</span><span class="p">)</span>
<span class="k">SELECT</span> <span class="k">COUNT</span><span class="p">(</span><span class="n">o</span><span class="p">.</span><span class="n">a</span><span class="p">),</span><span class="k">COUNT</span><span class="p">(</span><span class="n">e</span><span class="p">.</span><span class="n">a</span><span class="p">)</span> <span class="k">FROM</span> <span class="n">odd</span> <span class="n">o</span> <span class="k">FULL</span> <span class="k">JOIN</span> <span class="n">even</span> <span class="n">e</span> <span class="k">ON</span> <span class="n">o</span><span class="p">.</span><span class="n">a</span> <span class="o">=</span> <span class="n">e</span><span class="p">.</span><span class="n">a</span><span class="p">;</span>
</code></pre>
<button class="copy-button" data-clipboard-action="copy" data-clipboard-text="-- Setup
CREATE TABLE odd (a INT);
CREATE TABLE even (a INT);
INSERT INTO odd
SELECT a FROM generate_series(1,1000000,2) a;
INSERT INTO even
SELECT a FROM generate_series(2,1000000,2) a;
VACUUM ANALYZE odd, even;
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT COUNT(o.a),COUNT(e.a) FROM odd o FULL JOIN even e ON o.a = e.a;
">Copy</button>
</div>
<h3>PG15 EXPLAIN output</h3>
<div class="highlight">
<pre class="highlight "><code> QUERY PLAN
-------------------------------------------------------------------
Aggregate (actual rows=1 loops=1)
-> Hash Full Join (actual rows=1000000 loops=1)
Hash Cond: (o.a = e.a)
-> Seq Scan on odd o (actual rows=500000 loops=1)
-> Hash (actual rows=500000 loops=1)
Buckets: 262144 Batches: 4 Memory Usage: 6439kB
-> Seq Scan on even e (actual rows=500000 loops=1)
Planning Time: 0.079 ms
Execution Time: 220.677 ms
(9 rows)
</code></pre>
<button class="copy-button" data-clipboard-action="copy" data-clipboard-text=" QUERY PLAN
-------------------------------------------------------------------
Aggregate (actual rows=1 loops=1)
-> Hash Full Join (actual rows=1000000 loops=1)
Hash Cond: (o.a = e.a)
-> Seq Scan on odd o (actual rows=500000 loops=1)
-> Hash (actual rows=500000 loops=1)
Buckets: 262144 Batches: 4 Memory Usage: 6439kB
-> Seq Scan on even e (actual rows=500000 loops=1)
Planning Time: 0.079 ms
Execution Time: 220.677 ms
(9 rows)
">Copy</button>
</div>
<h3>PG16 EXPLAIN output</h3>
<div class="highlight">
<pre class="highlight "><code> QUERY PLAN
--------------------------------------------------------------------------------
Finalize Aggregate (actual rows=1 loops=1)
-> Gather (actual rows=2 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Partial Aggregate (actual rows=1 loops=2)
-> Parallel Hash Full Join (actual rows=500000 loops=2)
Hash Cond: (o.a = e.a)
-> Parallel Seq Scan on odd o (actual rows=250000 loops=2)
-> Parallel Hash (actual rows=250000 loops=2)
Buckets: 262144 Batches: 4 Memory Usage: 6976kB
-> Parallel Seq Scan on even e (actual rows=250000 loops=2)
Planning Time: 0.161 ms
Execution Time: 129.769 ms
(13 rows)
</code></pre>
<button class="copy-button" data-clipboard-action="copy" data-clipboard-text=" QUERY PLAN
--------------------------------------------------------------------------------
Finalize Aggregate (actual rows=1 loops=1)
-> Gather (actual rows=2 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Partial Aggregate (actual rows=1 loops=2)
-> Parallel Hash Full Join (actual rows=500000 loops=2)
Hash Cond: (o.a = e.a)
-> Parallel Seq Scan on odd o (actual rows=250000 loops=2)
-> Parallel Hash (actual rows=250000 loops=2)
Buckets: 262144 Batches: 4 Memory Usage: 6976kB
-> Parallel Seq Scan on even e (actual rows=250000 loops=2)
Planning Time: 0.161 ms
Execution Time: 129.769 ms
(13 rows)
">Copy</button>
</div>
<p>The <code>EXPLAIN</code> output shows that PostgreSQL 16 was able to perform the join in parallel and this resulted in a significant reduction in the query’s <code>Execution Time</code>.</p>
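<p>Whether a parallel plan is chosen at all depends on your parallel query settings. If you try this example and don’t see a <code>Parallel Hash Full Join</code>, these are worth checking (a sketch; tune the values with care):</p>
<div class="highlight">
<pre class="highlight sql"><code>-- Settings that influence whether the planner picks a parallel plan
SHOW max_parallel_workers_per_gather;      -- 0 disables parallel query
SET max_parallel_workers_per_gather = 2;
-- For small test tables, lowering this can encourage parallel scans:
SET min_parallel_table_scan_size = 0;
</code></pre>
<button class="copy-button" data-clipboard-action="copy" data-clipboard-text="-- Settings that influence whether the planner picks a parallel plan
SHOW max_parallel_workers_per_gather;      -- 0 disables parallel query
SET max_parallel_workers_per_gather = 2;
-- For small test tables, lowering this can encourage parallel scans:
SET min_parallel_table_scan_size = 0;
">Copy</button>
</div>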
<h2 id="frame-clauses">6. <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ed1a88ddaccfe883e4cf74d30319accfeae6cfe5">Allow window functions to use faster ROWS mode when RANGE mode active but unnecessary (David Rowley)</a></h2>
<p>When a query contains a window function such as <code>row_number()</code>, <code>rank()</code>, <code>dense_rank()</code>, <code>percent_rank()</code>, <code>cume_dist()</code> or <code>ntile()</code>, if the <a href="https://www.postgresql.org/docs/16/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS">window clause</a> did not specify the <code>ROWS</code> option, then PostgreSQL would always use the default <code>RANGE</code> option. The <code>RANGE</code> option causes the executor to look ahead until it finds the first “non-peer” row. A peer row is a row in the window frame which compares equally according to the window clause’s <code>ORDER BY</code> clause; when there is no <code>ORDER BY</code> clause, all rows within the window frame are peers. When many rows sort equally according to the window clause’s <code>ORDER BY</code> clause, the additional processing to identify these peer rows can be costly.</p>
<p>The window functions mentioned above don’t behave any differently whether <code>ROWS</code> or <code>RANGE</code> is specified in the query’s window clause. However, the executor in PostgreSQL versions prior to 16 didn’t know that, and because <strong>some</strong> window functions do care about the <code>ROWS</code>/<code>RANGE</code> option, the executor had to perform checks for peer rows in all cases.</p>
<p>The PostgreSQL 16 query planner knows which window functions care about the <code>ROWS</code>/<code>RANGE</code> option and it passes this information along to the executor so that it can skip the needless additional processing.</p>
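<p>As a side note, on PostgreSQL 15 and earlier you could often get the cheaper <code>ROWS</code> behavior by spelling out an explicit frame in the window clause yourself. The functions listed above ignore the frame, so the results don’t change (a sketch using a hypothetical <code>t</code> table, not one from the examples):</p>
<div class="highlight">
<pre class="highlight sql"><code>-- Explicitly requesting ROWS mode avoids the peer-row lookahead
SELECT id,
       row_number() OVER (ORDER BY score
                          ROWS BETWEEN UNBOUNDED PRECEDING
                               AND CURRENT ROW) AS rn
FROM t;
</code></pre>
<button class="copy-button" data-clipboard-action="copy" data-clipboard-text="-- Explicitly requesting ROWS mode avoids the peer-row lookahead
SELECT id,
       row_number() OVER (ORDER BY score
                          ROWS BETWEEN UNBOUNDED PRECEDING
                               AND CURRENT ROW) AS rn
FROM t;
">Copy</button>
</div>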
<p>This optimization works particularly well when <code>row_number()</code> is being used to limit the number of results in the query as shown in the example below.</p>
<div class="highlight">
<pre class="highlight sql"><code><span class="c1">-- Setup</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">scores</span> <span class="p">(</span><span class="n">id</span> <span class="nb">INT</span> <span class="k">PRIMARY</span> <span class="k">KEY</span><span class="p">,</span> <span class="n">score</span> <span class="nb">INT</span><span class="p">);</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">scores</span> <span class="k">SELECT</span> <span class="n">s</span><span class="p">,</span><span class="n">random</span><span class="p">()</span><span class="o">*</span><span class="mi">10</span> <span class="k">FROM</span> <span class="n">generate_series</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">1000000</span><span class="p">)</span><span class="n">s</span><span class="p">;</span>
<span class="k">CREATE</span> <span class="k">INDEX</span> <span class="k">ON</span> <span class="n">scores</span><span class="p">(</span><span class="n">score</span><span class="p">);</span>
<span class="k">VACUUM</span> <span class="k">ANALYZE</span> <span class="n">scores</span><span class="p">;</span>
<span class="k">EXPLAIN</span> <span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span> <span class="n">COSTS</span> <span class="k">OFF</span><span class="p">,</span> <span class="n">TIMING</span> <span class="k">OFF</span><span class="p">)</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="p">(</span>
<span class="k">SELECT</span> <span class="n">id</span><span class="p">,</span><span class="n">ROW_NUMBER</span><span class="p">()</span> <span class="n">OVER</span> <span class="p">(</span><span class="k">ORDER</span> <span class="k">BY</span> <span class="n">score</span><span class="p">)</span> <span class="n">rn</span><span class="p">,</span><span class="n">score</span>
<span class="k">FROM</span> <span class="n">scores</span>
<span class="p">)</span> <span class="n">m</span> <span class="k">WHERE</span> <span class="n">rn</span> <span class="o"><=</span> <span class="mi">10</span><span class="p">;</span>
</code></pre>
<button class="copy-button" data-clipboard-action="copy" data-clipboard-text="-- Setup
CREATE TABLE scores (id INT PRIMARY KEY, score INT);
INSERT INTO scores SELECT s,random()*10 FROM generate_series(1,1000000)s;
CREATE INDEX ON scores(score);
VACUUM ANALYZE scores;
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT * FROM (
SELECT id,ROW_NUMBER() OVER (ORDER BY score) rn,score
FROM scores
) m WHERE rn <= 10;
">Copy</button>
</div>
<h3>PG15 EXPLAIN output</h3>
<div class="highlight">
<pre class="highlight "><code> QUERY PLAN
-------------------------------------------------------------------------------
WindowAgg (actual rows=10 loops=1)
Run Condition: (row_number() OVER (?) <= 10)
-> Index Scan using scores_score_idx on scores (actual rows=50410 loops=1)
Planning Time: 0.096 ms
Execution Time: 29.775 ms
(5 rows)
</code></pre>
<button class="copy-button" data-clipboard-action="copy" data-clipboard-text=" QUERY PLAN
-------------------------------------------------------------------------------
WindowAgg (actual rows=10 loops=1)
Run Condition: (row_number() OVER (?) <= 10)
-> Index Scan using scores_score_idx on scores (actual rows=50410 loops=1)
Planning Time: 0.096 ms
Execution Time: 29.775 ms
(5 rows)
">Copy</button>
</div>
<h3>PG16 EXPLAIN output</h3>
<div class="highlight">
<pre class="highlight "><code> QUERY PLAN
----------------------------------------------------------------------------
WindowAgg (actual rows=10 loops=1)
Run Condition: (row_number() OVER (?) <= 10)
-> Index Scan using scores_score_idx on scores (actual rows=11 loops=1)
Planning Time: 0.191 ms
Execution Time: 0.058 ms
(5 rows)
</code></pre>
<button class="copy-button" data-clipboard-action="copy" data-clipboard-text=" QUERY PLAN
----------------------------------------------------------------------------
WindowAgg (actual rows=10 loops=1)
Run Condition: (row_number() OVER (?) <= 10)
-> Index Scan using scores_score_idx on scores (actual rows=11 loops=1)
Planning Time: 0.191 ms
Execution Time: 0.058 ms
(5 rows)
">Copy</button>
</div>
<p>The <code>Index Scan</code> node in the PG15 <code>EXPLAIN</code> output above shows that 50410 rows had to be read from the <code>scores_score_idx</code> index before execution stopped, while in PostgreSQL 16 only 11 rows were read, as the executor realized that once <code>row_number()</code> reached 11, no more rows could match the <code><= 10</code> condition. This, combined with the executor using the faster <code>ROWS</code> mode, led to this query running over 500 times faster on PostgreSQL 16.</p>
<h2 id="window-functions">7. <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=456fa635a909ee36f73ca84d340521bd730f265f">Optimize always-increasing window functions ntile(), cume_dist() and percent_rank() (David Rowley)</a></h2>
<p>This change expands on work done in PostgreSQL 15. In PG15 the query planner was modified to allow the executor to stop processing <code>WindowAgg</code> executor nodes early. This can be done when an item in the <code>WHERE</code> clause filters on a window function in such a way that once the condition becomes false, it can never become true again.</p>
<p><code>row_number()</code> is an example of a function which can offer such guarantees as it’s a monotonically increasing function, i.e. subsequent rows in the same partition will never have a row_number lower than the previous row.</p>
<p>The PostgreSQL 16 query planner expands the coverage of this optimization to also cover <code>ntile()</code>, <code>cume_dist()</code> and <code>percent_rank()</code>. In PostgreSQL 15, this only worked for <code>row_number()</code>, <code>rank()</code>, <code>dense_rank()</code>, <code>count()</code> and <code>count(*)</code>.</p>
<div class="highlight">
<pre class="highlight sql"><code><span class="c1">-- Setup</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">marathon</span> <span class="p">(</span><span class="n">id</span> <span class="nb">INT</span> <span class="k">PRIMARY</span> <span class="k">KEY</span><span class="p">,</span> <span class="nb">time</span> <span class="n">INTERVAL</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">);</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">marathon</span>
<span class="k">SELECT</span> <span class="n">id</span><span class="p">,</span><span class="s1">'03:00:00'</span><span class="p">::</span><span class="n">interval</span> <span class="o">+</span> <span class="p">(</span><span class="k">CAST</span><span class="p">(</span><span class="n">RANDOM</span><span class="p">()</span> <span class="o">*</span> <span class="mi">3600</span> <span class="k">AS</span> <span class="nb">INT</span><span class="p">)</span> <span class="o">||</span> <span class="s1">'secs'</span><span class="p">)::</span><span class="n">INTERVAL</span> <span class="o">-</span> <span class="p">(</span><span class="k">CAST</span><span class="p">(</span><span class="n">RANDOM</span><span class="p">()</span> <span class="o">*</span> <span class="mi">3600</span> <span class="k">AS</span> <span class="nb">INT</span><span class="p">)</span> <span class="o">||</span> <span class="s1">' secs'</span><span class="p">)::</span><span class="n">INTERVAL</span>
<span class="k">FROM</span> <span class="n">generate_series</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">50000</span><span class="p">)</span> <span class="n">id</span><span class="p">;</span>
<span class="k">CREATE</span> <span class="k">INDEX</span> <span class="k">ON</span> <span class="n">marathon</span> <span class="p">(</span><span class="nb">time</span><span class="p">);</span>
<span class="k">VACUUM</span> <span class="k">ANALYZE</span> <span class="n">marathon</span><span class="p">;</span>
<span class="k">EXPLAIN</span> <span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span> <span class="n">COSTS</span> <span class="k">OFF</span><span class="p">,</span> <span class="n">TIMING</span> <span class="k">OFF</span><span class="p">)</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="p">(</span><span class="k">SELECT</span> <span class="o">*</span><span class="p">,</span><span class="n">percent_rank</span><span class="p">()</span> <span class="n">OVER</span> <span class="p">(</span><span class="k">ORDER</span> <span class="k">BY</span> <span class="nb">time</span><span class="p">)</span> <span class="n">pr</span>
<span class="k">FROM</span> <span class="n">marathon</span><span class="p">)</span> <span class="n">m</span> <span class="k">WHERE</span> <span class="n">pr</span> <span class="o"><=</span> <span class="mi">0</span><span class="p">.</span><span class="mi">01</span><span class="p">;</span>
</code></pre>
<button class="copy-button" data-clipboard-action="copy" data-clipboard-text="-- Setup
CREATE TABLE marathon (id INT PRIMARY KEY, time INTERVAL NOT NULL);
INSERT INTO marathon
SELECT id,'03:00:00'::interval + (CAST(RANDOM() * 3600 AS INT) || 'secs')::INTERVAL - (CAST(RANDOM() * 3600 AS INT) || ' secs')::INTERVAL
FROM generate_series(1,50000) id;
CREATE INDEX ON marathon (time);
VACUUM ANALYZE marathon;
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT * FROM (SELECT *,percent_rank() OVER (ORDER BY time) pr
FROM marathon) m WHERE pr <= 0.01;
">Copy</button>
</div>
<h3>PG15 EXPLAIN output</h3>
<div class="highlight">
<pre class="highlight "><code> QUERY PLAN
-----------------------------------------------------------------------
Subquery Scan on m (actual rows=500 loops=1)
Filter: (m.pr <= '0.01'::double precision)
Rows Removed by Filter: 49500
-> WindowAgg (actual rows=50000 loops=1)
-> Index Scan using marathon_time_idx on marathon (actual rows=50000 loops=1)
Planning Time: 0.108 ms
Execution Time: 84.358 ms
(7 rows)
</code></pre>
<button class="copy-button" data-clipboard-action="copy" data-clipboard-text=" QUERY PLAN
-----------------------------------------------------------------------
Subquery Scan on m (actual rows=500 loops=1)
Filter: (m.pr <= '0.01'::double precision)
Rows Removed by Filter: 49500
-> WindowAgg (actual rows=50000 loops=1)
-> Index Scan using marathon_time_idx on marathon (actual rows=50000 loops=1)
Planning Time: 0.108 ms
Execution Time: 84.358 ms
(7 rows)
">Copy</button>
</div>
<h3>PG16 EXPLAIN output</h3>
<div class="highlight">
<pre class="highlight "><code> QUERY PLAN
-----------------------------------------------------------------------
WindowAgg (actual rows=500 loops=1)
Run Condition: (percent_rank() OVER (?) <= '0.01'::double precision)
-> Index Scan using marathon_time_idx on marathon (actual rows=50000 loops=1)
Planning Time: 0.180 ms
Execution Time: 19.454 ms
(5 rows)
</code></pre>
<button class="copy-button" data-clipboard-action="copy" data-clipboard-text=" QUERY PLAN
-----------------------------------------------------------------------
WindowAgg (actual rows=500 loops=1)
Run Condition: (percent_rank() OVER (?) <= '0.01'::double precision)
-> Index Scan using marathon_time_idx on marathon (actual rows=50000 loops=1)
Planning Time: 0.180 ms
Execution Time: 19.454 ms
(5 rows)
">Copy</button>
</div>
<p>From the PostgreSQL 16 <code>EXPLAIN</code> output above, you can see that the planner was able to use the <code>pr <= 0.01</code> condition as a <code>Run Condition</code>, whereas in PostgreSQL 15 this clause appeared as a <code>Filter</code> on the subquery. In PG16, the run condition was used to stop execution of the <code>WindowAgg</code> node early, resulting in the <code>Execution Time</code> in PG16 being more than 4 times faster than in PG15.</p>
<h2 id="join-removals">8. <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3c569049b7b502bb4952483d19ce622ff0af5fd6">Allow left join removals and unique joins on partitioned tables (Arne Roland)</a></h2>
<p>For a long time now, PostgreSQL has been able to remove a <code>LEFT JOIN</code> where no column from the left joined table was required in the query and the join could not possibly duplicate any rows.</p>
<p>However, in versions prior to PostgreSQL 16, left join removal was not supported for partitioned tables. Why? Because the proofs the planner uses to determine whether an inner-side row could duplicate an outer-side row were not available for partitioned tables.</p>
<p>The PostgreSQL 16 query planner now allows the <code>LEFT JOIN</code> removal optimization with partitioned tables.</p>
<p>This join elimination optimization is most likely to be useful with views, as queries against a view commonly don’t reference every column the view provides.</p>
<div class="highlight">
<pre class="highlight sql"><code><span class="c1">-- Setup</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">part_tab</span> <span class="p">(</span><span class="n">id</span> <span class="nb">BIGINT</span> <span class="k">PRIMARY</span> <span class="k">KEY</span><span class="p">,</span> <span class="n">payload</span> <span class="nb">TEXT</span><span class="p">)</span> <span class="n">PARTITION</span> <span class="k">BY</span> <span class="n">HASH</span><span class="p">(</span><span class="n">id</span><span class="p">);</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">part_tab_p0</span> <span class="n">PARTITION</span> <span class="k">OF</span> <span class="n">part_tab</span> <span class="k">FOR</span> <span class="k">VALUES</span> <span class="k">WITH</span> <span class="p">(</span><span class="n">modulus</span> <span class="mi">2</span><span class="p">,</span> <span class="n">remainder</span> <span class="mi">0</span><span class="p">);</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">part_tab_p1</span> <span class="n">PARTITION</span> <span class="k">OF</span> <span class="n">part_tab</span> <span class="k">FOR</span> <span class="k">VALUES</span> <span class="k">WITH</span> <span class="p">(</span><span class="n">modulus</span> <span class="mi">2</span><span class="p">,</span> <span class="n">remainder</span> <span class="mi">1</span><span class="p">);</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">normal_table</span> <span class="p">(</span><span class="n">id</span> <span class="nb">INT</span><span class="p">,</span> <span class="n">part_tab_id</span> <span class="nb">BIGINT</span><span class="p">);</span>
<span class="k">EXPLAIN</span> <span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span> <span class="n">COSTS</span> <span class="k">OFF</span><span class="p">,</span> <span class="n">TIMING</span> <span class="k">OFF</span><span class="p">)</span>
<span class="k">SELECT</span> <span class="n">nt</span><span class="p">.</span><span class="o">*</span> <span class="k">FROM</span> <span class="n">normal_table</span> <span class="n">nt</span> <span class="k">LEFT</span> <span class="k">JOIN</span> <span class="n">part_tab</span> <span class="n">pt</span> <span class="k">ON</span> <span class="n">nt</span><span class="p">.</span><span class="n">part_tab_id</span> <span class="o">=</span> <span class="n">pt</span><span class="p">.</span><span class="n">id</span><span class="p">;</span>
</code></pre>
<button class="copy-button" data-clipboard-action="copy" data-clipboard-text="-- Setup
CREATE TABLE part_tab (id BIGINT PRIMARY KEY, payload TEXT) PARTITION BY HASH(id);
CREATE TABLE part_tab_p0 PARTITION OF part_tab FOR VALUES WITH (modulus 2, remainder 0);
CREATE TABLE part_tab_p1 PARTITION OF part_tab FOR VALUES WITH (modulus 2, remainder 1);
CREATE TABLE normal_table (id INT, part_tab_id BIGINT);
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT nt.* FROM normal_table nt LEFT JOIN part_tab pt ON nt.part_tab_id = pt.id;
">Copy</button>
</div>
<h3>PG15 EXPLAIN output</h3>
<div class="highlight">
<pre class="highlight "><code> QUERY PLAN
-------------------------------------------------------------------
Merge Right Join (actual rows=0 loops=1)
Merge Cond: (pt.id = nt.part_tab_id)
-> Merge Append (actual rows=0 loops=1)
Sort Key: pt.id
-> Index Only Scan using part_tab_p0_pkey on part_tab_p0 pt_1 (actual rows=0 loops=1)
Heap Fetches: 0
-> Index Only Scan using part_tab_p1_pkey on part_tab_p1 pt_2 (actual rows=0 loops=1)
Heap Fetches: 0
-> Sort (actual rows=0 loops=1)
Sort Key: nt.part_tab_id
Sort Method: quicksort Memory: 25kB
-> Seq Scan on normal_table nt (actual rows=0 loops=1)
Planning Time: 0.325 ms
Execution Time: 0.037 ms
(14 rows)
</code></pre>
<button class="copy-button" data-clipboard-action="copy" data-clipboard-text=" QUERY PLAN
-------------------------------------------------------------------
Merge Right Join (actual rows=0 loops=1)
Merge Cond: (pt.id = nt.part_tab_id)
-> Merge Append (actual rows=0 loops=1)
Sort Key: pt.id
-> Index Only Scan using part_tab_p0_pkey on part_tab_p0 pt_1 (actual rows=0 loops=1)
Heap Fetches: 0
-> Index Only Scan using part_tab_p1_pkey on part_tab_p1 pt_2 (actual rows=0 loops=1)
Heap Fetches: 0
-> Sort (actual rows=0 loops=1)
Sort Key: nt.part_tab_id
Sort Method: quicksort Memory: 25kB
-> Seq Scan on normal_table nt (actual rows=0 loops=1)
Planning Time: 0.325 ms
Execution Time: 0.037 ms
(14 rows)
">Copy</button>
</div>
<h3>PG16 EXPLAIN output</h3>
<div class="highlight">
<pre class="highlight "><code> QUERY PLAN
-----------------------------------------------------
Seq Scan on normal_table nt (actual rows=0 loops=1)
Planning Time: 0.244 ms
Execution Time: 0.015 ms
(3 rows)
</code></pre>
<button class="copy-button" data-clipboard-action="copy" data-clipboard-text=" QUERY PLAN
-----------------------------------------------------
Seq Scan on normal_table nt (actual rows=0 loops=1)
Planning Time: 0.244 ms
Execution Time: 0.015 ms
(3 rows)
">Copy</button>
</div>
<p>The important thing to notice here is that the PostgreSQL 16 plan does not include a join to <code>part_tab</code>, meaning all that’s left to do is scan <code>normal_table</code>.</p>
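<p>Note that the join can only be removed because the query references no columns from <code>part_tab</code>. A variant that does use a column from the left-joined table forces the planner to keep the join:</p>
<div class="highlight">
<pre class="highlight sql"><code>-- Referencing pt.payload means the LEFT JOIN can no longer be removed,
-- so the plan must still scan the part_tab partitions.
EXPLAIN (COSTS OFF)
SELECT nt.*, pt.payload
FROM normal_table nt LEFT JOIN part_tab pt ON nt.part_tab_id = pt.id;
</code></pre>
</div>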
<h2 id="short-circuit">9. <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5543677ec90a15c73dab5ed4f0902b3b920f0b87">Use Limit instead of Unique to implement DISTINCT, when possible (David Rowley)</a></h2>
<p>The PostgreSQL query planner is able to avoid including plan nodes to de-duplicate the results when it can detect that all output rows must contain the same values. Detecting this is trivial, and when the optimization can be applied, it can result in huge performance gains.</p>
<div class="highlight">
<pre class="highlight sql"><code><span class="c1">-- Setup</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">abc</span> <span class="p">(</span><span class="n">a</span> <span class="nb">int</span><span class="p">,</span> <span class="n">b</span> <span class="nb">int</span><span class="p">,</span> <span class="k">c</span> <span class="nb">int</span><span class="p">);</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">abc</span> <span class="k">SELECT</span> <span class="n">a</span><span class="o">%</span><span class="mi">10</span><span class="p">,</span><span class="n">a</span><span class="o">%</span><span class="mi">10</span><span class="p">,</span><span class="n">a</span><span class="o">%</span><span class="mi">10</span> <span class="k">FROM</span> <span class="n">generate_series</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">1000000</span><span class="p">)</span><span class="n">a</span><span class="p">;</span>
<span class="k">VACUUM</span> <span class="k">ANALYZE</span> <span class="n">abc</span><span class="p">;</span>
<span class="k">EXPLAIN</span> <span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span> <span class="n">COSTS</span> <span class="k">OFF</span><span class="p">,</span> <span class="n">TIMING</span> <span class="k">OFF</span><span class="p">)</span>
<span class="k">SELECT</span> <span class="k">DISTINCT</span> <span class="n">a</span><span class="p">,</span><span class="n">b</span><span class="p">,</span><span class="k">c</span> <span class="k">FROM</span> <span class="n">abc</span> <span class="k">WHERE</span> <span class="n">a</span> <span class="o">=</span> <span class="mi">5</span> <span class="k">AND</span> <span class="n">b</span> <span class="o">=</span> <span class="mi">5</span> <span class="k">AND</span> <span class="k">c</span> <span class="o">=</span> <span class="mi">5</span><span class="p">;</span>
</code></pre>
<button class="copy-button" data-clipboard-action="copy" data-clipboard-text="-- Setup
CREATE TABLE abc (a int, b int, c int);
INSERT INTO abc SELECT a%10,a%10,a%10 FROM generate_series(1,1000000)a;
VACUUM ANALYZE abc;
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT DISTINCT a,b,c FROM abc WHERE a = 5 AND b = 5 AND c = 5;
">Copy</button>
</div>
<h3>PG15 EXPLAIN output</h3>
<div class="highlight">
<pre class="highlight "><code> QUERY PLAN
------------------------------------------------------------------------
Unique (actual rows=1 loops=1)
-> Gather (actual rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Unique (actual rows=1 loops=3)
-> Parallel Seq Scan on abc (actual rows=33333 loops=3)
Filter: ((a = 5) AND (b = 5) AND (c = 5))
Rows Removed by Filter: 300000
Planning Time: 0.114 ms
Execution Time: 30.381 ms
(10 rows)
</code></pre>
<button class="copy-button" data-clipboard-action="copy" data-clipboard-text=" QUERY PLAN
------------------------------------------------------------------------
Unique (actual rows=1 loops=1)
-> Gather (actual rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Unique (actual rows=1 loops=3)
-> Parallel Seq Scan on abc (actual rows=33333 loops=3)
Filter: ((a = 5) AND (b = 5) AND (c = 5))
Rows Removed by Filter: 300000
Planning Time: 0.114 ms
Execution Time: 30.381 ms
(10 rows)
">Copy</button>
</div>
<h3>PG16 EXPLAIN output</h3>
<div class="highlight">
<pre class="highlight "><code> QUERY PLAN
---------------------------------------------------
Limit (actual rows=1 loops=1)
-> Seq Scan on abc (actual rows=1 loops=1)
Filter: ((a = 5) AND (b = 5) AND (c = 5))
Rows Removed by Filter: 4
Planning Time: 0.109 ms
Execution Time: 0.025 ms
(6 rows)
</code></pre>
<button class="copy-button" data-clipboard-action="copy" data-clipboard-text=" QUERY PLAN
---------------------------------------------------
Limit (actual rows=1 loops=1)
-> Seq Scan on abc (actual rows=1 loops=1)
Filter: ((a = 5) AND (b = 5) AND (c = 5))
Rows Removed by Filter: 4
Planning Time: 0.109 ms
Execution Time: 0.025 ms
(6 rows)
">Copy</button>
</div>
<p>If you look carefully at the SQL query, you’ll notice that each column in the <code>DISTINCT</code> clause also has an equality condition in the <code>WHERE</code> clause. This means that every output row of the query must have the same value in each column. The PostgreSQL 16 query planner is able to take advantage of this knowledge and simply apply a <code>LIMIT</code> of 1 row to the results. PostgreSQL 15 produced the same query result by scanning the entire table and using a <code>Unique</code> node to reduce all the matching rows down to a single row. As a result, PostgreSQL 16 executed this query more than 1200 times faster than PostgreSQL 15.</p>
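<p>It’s worth noting that this shortcut is only possible when every column in the <code>DISTINCT</code> clause is pinned to a constant. A variant of the query above that leaves one column unconstrained cannot use it:</p>
<div class="highlight">
<pre class="highlight sql"><code>-- Without the c = 5 condition the planner can't prove that all output
-- rows are identical, so it must still de-duplicate with a Unique or
-- HashAggregate node instead of a simple Limit.
EXPLAIN (COSTS OFF)
SELECT DISTINCT a,b,c FROM abc WHERE a = 5 AND b = 5;
</code></pre>
</div>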
<h2 id="merge-joins">10. <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b592422095655a64d638f541df784b19b8ecf8ad">Relax overly strict rules in select_<wbr>outer_<wbr>pathkeys_<wbr>for_<wbr>merge() (David Rowley)</a></h2>
<p>Before PostgreSQL 16, when the query planner considered performing a <code>Merge Join</code>, it would check whether the sort order of the merge suited any upper-level plan operation (such as <code>DISTINCT</code>, <code>GROUP BY</code> or <code>ORDER BY</code>) and would only make use of that order if it matched the upper-level requirements exactly. This rule had become outdated, as an <code>Incremental Sort</code> can now serve those upper-level operations, and incremental sorts can take advantage of results that are presorted by only some of the leading columns of the required sort order.</p>
<p>The PostgreSQL 16 query planner relaxes the rule used when considering <code>Merge Join</code> orders from "the order of the rows must match exactly" to "at least 1 leading column must be correctly ordered". This allows the planner to use an <code>Incremental Sort</code> to get the rows into the correct order for the upper-level operation. As covered earlier in this post, incremental sorts, when they’re possible, require less work than a complete sort: they exploit partially sorted input and sort in smaller batches, resulting in less memory consumption and fewer sort comparisons overall.</p>
<div class="highlight">
<pre class="highlight sql"><code><span class="c1">-- Setup</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">a</span> <span class="p">(</span><span class="n">a</span> <span class="nb">INT</span><span class="p">,</span> <span class="n">b</span> <span class="nb">INT</span><span class="p">);</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">b</span> <span class="p">(</span><span class="n">x</span> <span class="nb">INT</span><span class="p">,</span> <span class="n">y</span> <span class="nb">INT</span><span class="p">);</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">a</span> <span class="k">SELECT</span> <span class="n">a</span><span class="p">,</span><span class="n">a</span> <span class="k">FROM</span> <span class="n">generate_series</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">1000000</span><span class="p">)</span> <span class="n">a</span><span class="p">;</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">b</span> <span class="k">SELECT</span> <span class="n">a</span><span class="p">,</span><span class="n">a</span> <span class="k">FROM</span> <span class="n">generate_series</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">1000000</span><span class="p">)</span> <span class="n">a</span><span class="p">;</span>
<span class="k">VACUUM</span> <span class="k">ANALYZE</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">;</span>
<span class="k">SET</span> <span class="n">enable_hashjoin</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span>
<span class="k">SET</span> <span class="n">max_parallel_workers_per_gather</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span>
<span class="k">EXPLAIN</span> <span class="p">(</span><span class="k">ANALYZE</span><span class="p">,</span> <span class="n">COSTS</span> <span class="k">OFF</span><span class="p">,</span> <span class="n">TIMING</span> <span class="k">OFF</span><span class="p">)</span>
<span class="k">SELECT</span> <span class="n">a</span><span class="p">,</span><span class="n">b</span><span class="p">,</span><span class="k">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="k">FROM</span> <span class="n">a</span> <span class="k">INNER</span> <span class="k">JOIN</span> <span class="n">b</span> <span class="k">ON</span> <span class="n">a</span><span class="p">.</span><span class="n">a</span> <span class="o">=</span> <span class="n">b</span><span class="p">.</span><span class="n">x</span> <span class="k">GROUP</span> <span class="k">BY</span> <span class="n">a</span><span class="p">,</span><span class="n">b</span> <span class="k">ORDER</span> <span class="k">BY</span> <span class="n">a</span> <span class="k">DESC</span><span class="p">,</span> <span class="n">b</span><span class="p">;</span>
</code></pre>
<button class="copy-button" data-clipboard-action="copy" data-clipboard-text="-- Setup
CREATE TABLE a (a INT, b INT);
CREATE TABLE b (x INT, y INT);
INSERT INTO a SELECT a,a FROM generate_series(1,1000000) a;
INSERT INTO b SELECT a,a FROM generate_series(1,1000000) a;
VACUUM ANALYZE a, b;
SET enable_hashjoin=0;
SET max_parallel_workers_per_gather=0;
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT a,b,count(*) FROM a INNER JOIN b ON a.a = b.x GROUP BY a,b ORDER BY a DESC, b;
">Copy</button>
</div>
<h3>PG15 EXPLAIN output</h3>
<div class="highlight">
<pre class="highlight "><code> QUERY PLAN
---------------------------------------------------------------------------
GroupAggregate (actual rows=1000000 loops=1)
Group Key: a.a, a.b
-> Sort (actual rows=1000000 loops=1)
Sort Key: a.a DESC, a.b
Sort Method: external merge Disk: 17664kB
-> Merge Join (actual rows=1000000 loops=1)
Merge Cond: (a.a = b.x)
-> Sort (actual rows=1000000 loops=1)
Sort Key: a.a
Sort Method: external merge Disk: 17664kB
-> Seq Scan on a (actual rows=1000000 loops=1)
-> Materialize (actual rows=1000000 loops=1)
-> Sort (actual rows=1000000 loops=1)
Sort Key: b.x
Sort Method: external merge Disk: 11768kB
-> Seq Scan on b (actual rows=1000000 loops=1)
Planning Time: 0.175 ms
Execution Time: 1010.738 ms
(18 rows)
</code></pre>
<button class="copy-button" data-clipboard-action="copy" data-clipboard-text=" QUERY PLAN
---------------------------------------------------------------------------
GroupAggregate (actual rows=1000000 loops=1)
Group Key: a.a, a.b
-> Sort (actual rows=1000000 loops=1)
Sort Key: a.a DESC, a.b
Sort Method: external merge Disk: 17664kB
-> Merge Join (actual rows=1000000 loops=1)
Merge Cond: (a.a = b.x)
-> Sort (actual rows=1000000 loops=1)
Sort Key: a.a
Sort Method: external merge Disk: 17664kB
-> Seq Scan on a (actual rows=1000000 loops=1)
-> Materialize (actual rows=1000000 loops=1)
-> Sort (actual rows=1000000 loops=1)
Sort Key: b.x
Sort Method: external merge Disk: 11768kB
-> Seq Scan on b (actual rows=1000000 loops=1)
Planning Time: 0.175 ms
Execution Time: 1010.738 ms
(18 rows)
">Copy</button>
</div>
<h3>PG16 EXPLAIN output</h3>
<div class="highlight">
<pre class="highlight "><code> QUERY PLAN
---------------------------------------------------------------------------
GroupAggregate (actual rows=1000000 loops=1)
Group Key: a.a, a.b
-> Incremental Sort (actual rows=1000000 loops=1)
Sort Key: a.a DESC, a.b
Presorted Key: a.a
Full-sort Groups: 31250 Sort Method: quicksort Average Memory: 26kB Peak Memory: 26kB
-> Merge Join (actual rows=1000000 loops=1)
Merge Cond: (a.a = b.x)
-> Sort (actual rows=1000000 loops=1)
Sort Key: a.a DESC
Sort Method: external merge Disk: 17672kB
-> Seq Scan on a (actual rows=1000000 loops=1)
-> Materialize (actual rows=1000000 loops=1)
-> Sort (actual rows=1000000 loops=1)
Sort Key: b.x DESC
Sort Method: external merge Disk: 11768kB
-> Seq Scan on b (actual rows=1000000 loops=1)
Planning Time: 0.140 ms
Execution Time: 915.589 ms
(19 rows)
</code></pre>
<button class="copy-button" data-clipboard-action="copy" data-clipboard-text=" QUERY PLAN
---------------------------------------------------------------------------
GroupAggregate (actual rows=1000000 loops=1)
Group Key: a.a, a.b
-> Incremental Sort (actual rows=1000000 loops=1)
Sort Key: a.a DESC, a.b
Presorted Key: a.a
Full-sort Groups: 31250 Sort Method: quicksort Average Memory: 26kB Peak Memory: 26kB
-> Merge Join (actual rows=1000000 loops=1)
Merge Cond: (a.a = b.x)
-> Sort (actual rows=1000000 loops=1)
Sort Key: a.a DESC
Sort Method: external merge Disk: 17672kB
-> Seq Scan on a (actual rows=1000000 loops=1)
-> Materialize (actual rows=1000000 loops=1)
-> Sort (actual rows=1000000 loops=1)
Sort Key: b.x DESC
Sort Method: external merge Disk: 11768kB
-> Seq Scan on b (actual rows=1000000 loops=1)
Planning Time: 0.140 ms
Execution Time: 915.589 ms
(19 rows)
">Copy</button>
</div>
<p>In the PG16 <code>EXPLAIN</code> output above, you can see that an <code>Incremental Sort</code> was used (where PG15 used a full <code>Sort</code>), which resulted in a small reduction in the query’s <code>Execution Time</code> on PG16 and a large reduction in the memory used to perform the final sort.</p>
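<p>If you want to compare the two sort strategies on PostgreSQL 16 yourself, the planner’s use of incremental sorts can be switched off for the current session:</p>
<div class="highlight">
<pre class="highlight sql"><code>-- With incremental sort disabled, PG16 falls back to a full Sort of the
-- join result, similar to the PG15 plan above. Remember to re-enable it
-- (or RESET it) afterwards.
SET enable_incremental_sort = off;
</code></pre>
</div>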
<h2>Conclusion</h2>
<p>A lot of engineering work went into improving the query planner in PostgreSQL 16, done by many engineers from all around the world. I’d like to thank all the people who helped by reviewing the pieces that I worked on and all the people who gave feedback on the changes.</p>
<p>Each of the 10 improvements to the PostgreSQL 16 planner above is enabled by default—and is either applied in all cases where the optimization is possible, or is selectively applied by the query planner when it thinks the optimization will help.</p>
<p>If you’re running an older version of PostgreSQL, I encourage you to try your workload on PostgreSQL 16 to see which of your queries are faster. And as always, feedback about real-world usage of PostgreSQL in the wild is welcome on the <a href="mailto:pgsql-general@postgresql.org?subject=PG16%20query%20planner%20feedback">pgsql-general@postgresql.org</a> mailing list—you don’t have to just file issues; you can always share the positive experiences too. So please drop us a line about your experience with the PostgreSQL 16 planner.</p>
<p><em>This article was originally published on <a href='https://www.citusdata.com/blog/2024/02/08/whats-new-in-postgres-16-query-planner-optimizer/'>citusdata.com</a>.</em></p>Podcast highlights on benchmarking Postgres performance, from Path To Citus Con Episode 11https://www.citusdata.com/blog/2024/02/02/podcast-highlights-on-benchmarking-postgres-performance/2024-02-02T17:41:00+00:002024-02-02T17:41:00+00:00Aaron Wislang<p>Episode 11 of <a href="/podcast/path-to-citus-con/">Path To Citus Con</a>—the monthly podcast for developers who love Postgres—is now out. This episode featured guests <a href="https://twitter.com/JelteF">Jelte Fennema-Nio</a> and <a href="https://twitter.com/marcoslot">Marco Slot</a> who joined us (along with co-hosts <a href="https://hachyderm.io/@clairegiordano">Claire Giordano</a> and <a href="https://www.linkedin.com/in/pinodecandia">Pino de Candia</a>) to talk about performance benchmarking, specifically benchmarking databases and Postgres.</p>
<p>The official title of Episode 11 with Jelte and Marco is <a href="https://pathtocituscon.transistor.fm/episodes/my-journey-into-performance-benchmarking-with-jelte-fennema-nio-marco-slot">“My Journey into Performance Benchmarking”</a>.</p>
<p>I love hearing how people find themselves doing what they do, so it was fascinating to hear how they got started. One of the most interesting insights was the impact something like benchmarking can have on your career—even if you don’t particularly enjoy it! </p>
<p>This episode was also packed with benchmarking tools and techniques you can pick up and use today—take a peek at some of the resources in the <a href="https://pathtocituscon.transistor.fm/episodes/my-journey-into-performance-benchmarking-with-jelte-fennema-nio-marco-slot">show notes for this episode</a>—along with helpful insights resulting from Jelte’s and Marco’s years of combined experience in the space.</p>
<p>And once you have the data, how do you answer the right questions, or tell the story effectively? Perhaps, like Marco, you’re also a fan of <a href="https://www.brendangregg.com/flamegraphs.html">flame graphs</a>. Or you haven’t used flame graphs yet but are about to discover a new favorite thing. Measuring things can be fun! The tooling continues to span tried-and-true Linux tools to fast-moving technology such as eBPF.</p>
<p>Not a database benchmarking fan, yet? I’m willing to bet you’ll come away with a fresh perspective on how impactful it can be, as well as some ideas on how you can apply benchmarking to your own workflows at the right moment.</p>
<p>These insights are broadly applicable and extend beyond Postgres, too. I found myself reflecting on the time I’ve spent hacking on similar efforts in the Linux & Cloud Native world, and how I can import some of these ideas into application workloads.</p>
<p>This was Marco’s second time on the podcast, after kicking off our very first episode alongside Simon Willison with <a href="https://pathtocituscon.transistor.fm/episodes/working-in-public-on-open-source">Working in public on open source</a>. And Jelte was a speaker at Citus Con: An Event for Postgres last year discussing <a href="https://www.youtube.com/watch?v=g8lzx0BABf0">Postgres without SQL: Natural language queries using GPT-3 & Rust</a>. After this excellent deep-dive into benchmarking Postgres, I’m hoping Jelte and Marco will join us again on the podcast in the future!</p>
<h2>Where to find Path To Citus Con podcast episodes (and transcripts)</h2>
<p>You can listen to this podcast episode on almost every podcast platform! You can also find the show notes online for <a href="https://pathtocituscon.transistor.fm/episodes/my-journey-into-performance-benchmarking-with-jelte-fennema-nio-marco-slot">My Journey into Postgres Benchmarking</a> with Jelte and Marco—plus also the <a href="https://pathtocituscon.transistor.fm/episodes/my-journey-into-performance-benchmarking-with-jelte-fennema-nio-marco-slot/transcript">written transcript</a>.</p>
<p>Our next podcast episode (Ep12!) will be recorded live:</p>
<ul>
<li>Where: Microsoft Open Source Discord </li>
<li>When: <strong>Wed Feb 07 @ 10am PST | 1pm EST | 6pm UTC</strong> </li>
<li>Guest: the amazing Derk van Veen</li>
<li>Topic: <strong>from developer to PostgreSQL specialist</strong></li>
<li><strong>Calendar invite</strong>: <a href="https://aka.ms/PathToCitusCon-Ep12-cal">Mark your calendar</a> to participate in the live text chat that happens in parallel to the live recording (we’re huge fans of social audio on Discord). All the instructions for joining the live show are in the cal invite.</li>
</ul>
<p>You may have seen Derk talk about his Postgres work at Adyen on the PG conference circuit—he’s a brilliant speaker, and this promises to be a challenging conversation!</p>
<p>You can find all the past episodes for the Path To Citus Con podcast on:</p>
<ul>
<li><a href="https://podcasts.apple.com/us/podcast/path-to-citus-con/id1695014346">Apple Podcasts</a></li>
<li><a href="https://open.spotify.com/show/6XtjTEc4KGMK9fIGvmPLn7">Spotify</a></li>
<li><a href="https://aka.ms/PathToCitusCon-playlist">YouTube</a></li>
<li><a href="https://pathtocituscon.transistor.fm/subscribe">And many more podcast platforms</a></li>
</ul>
<p>More useful links:</p>
<ul>
<li><a href="/podcast/path-to-citus-con/">Index of all past Path To Citus Con episodes</a></li>
<li><a href="https://aka.ms/PathToCitusCon-cal">Subscribe to the calendar</a> to know when all the future shows are coming up.</li>
</ul>
<p>We hope you enjoy the episodes of the Path To Citus Con podcast as much as we enjoy creating them. If you do, please share with friends and the database community, and perhaps join us live for our next episode. You might even be our next guest! Rating and reviewing us on your favorite podcasting platform will also help others discover it.</p>
<figure>
<picture>
<source srcset="https://cituscdn.azureedge.net/images/blog/PathToCitusCon-Ep11-My-Journey-into-Performance-Benchmarking-youtube-1000x563.webp" type="image/webp">
<img src="https://cituscdn.azureedge.net/images/blog/PathToCitusCon-Ep11-My-Journey-into-Performance-Benchmarking-youtube-1000x563.jpg" width="850" height="479" alt="YouTube thumbnail of Path to Citus Con episode 11" loading="lazy" />
</picture>
<figcaption><strong>Figure 1:</strong> YouTube thumbnail for Path to Citus Con episode 11, My Journey Into Performance Benchmarking, with profile photos (clockwise from top left) of Jelte Fennema-Nio, Claire Giordano, Marco Slot, and Pino de Candia.</figcaption>
</figure>
<p><em>This article was originally published on <a href='https://www.citusdata.com/blog/2024/02/02/podcast-highlights-on-benchmarking-postgres-performance/'>citusdata.com</a>.</em></p>You should submit a Postgres talk to the CFP for PGConf.devhttps://www.citusdata.com/blog/2024/01/11/you-should-submit-a-postgres-talk-to-cfp-for-pgconf-dev/2024-01-11T17:15:00+00:002024-01-11T17:15:00+00:00Melanie Plageman<p><strong>The PGConf.dev CFP closes on Monday, January 15 at 11:59pm PST</strong>, so if you want to speak at the inaugural PGConf.dev, submit a proposal!</p>
<p><a href="https://2024.pgconf.dev/">PGConf.dev</a> is the new PostgreSQL Development Conference, the successor to PGCon, a Postgres contribution-focused conference that took place every year in Ottawa. Pronounced "Pee-gee-conf-dot-dev", the inaugural year of PGConf.dev will take place in beautiful Vancouver, Canada, on May 28-31, 2024—with many of the same conference features that made PGCon so great:</p>
<ul>
<li>sessions covering Postgres hacking and contribution</li>
<li>full-day Unconference geared toward collaboration and impromptu creativity</li>
<li>opportunities to brainstorm with others interested in Postgres development in the “hallway track”</li>
</ul>
<p><strong>What type of talks is PGConf.dev looking for?</strong> The <a href="https://2024.pgconf.dev/cfp/">CFP page for PGConf.dev</a> has more details, but, in short: session proposals on all topics related to contributing to Postgres and how to ensure Postgres continues to be the best open source relational database on the planet. </p>
<h2>What’s new and different at PGConf.dev?</h2>
<p>Part of our vision for PGConf.dev is that it becomes an explicit destination for learning how to become a Postgres contributor.</p>
<p>In addition to the advanced PostgreSQL developer topics you might expect—the hope is for PGConf.dev to include multiple introductory sessions, beginner-friendly hallway track conversations, and hands-on workshops that will help you begin (or continue) your Postgres contributor journey. </p>
<p>If this sounds like something you can help with, well, then <a href="https://2024.pgconf.dev/cfp/">you should submit a talk</a>.</p>
<p>And there will be "workshops" as well as regular-length talks. The CFP solicits both:</p>
<ul>
<li>40-50 minute regular session proposals, and</li>
<li>workshops which are twice the length of a regular session</li>
</ul>
<p>From the <a href="https://2024.pgconf.dev/cfp/">PGConf.dev CFP page</a>:</p>
<div class="normal-quote" aria-hidden="true"></div>
<blockquote>
<p>Workshop sessions will run for 110 minutes, which is double the time of a regular session. Workshops are meant to facilitate educational content that is more in-depth or interactive than the 50 minute format allows.</p>
</blockquote>
<h2>My first PGCon conference talk submission</h2>
<p>In 2018, I attended my very first PGCon. I had limited experience with Postgres development and had only been a professional software engineer for a year. I was interested in any opportunity to learn how to become a Postgres hacker, and I heard that PGCon was the best place for this.</p>
<p>Most of what I remember about PGCon 2018 was struggling to understand the sessions I attended and the hallway track conversations I lurked on. Everything was fascinating… but also far beyond my comprehension at that point.</p>
<p>Eight months later, the CFP opened up for PGCon 2019. I had learned so much in the intervening time, I decided to submit my first conference talk. I assumed it wouldn't be accepted. After all, I was hardly an expert. I submitted a talk titled <a href="https://github.com/melanieplageman/debugging_planner">An Intro to Hacking on the Postgres Planner</a>. I knew there were other people who could speak with more authority on hacking on the Postgres planner, but I felt well-positioned to speak on <em>learning</em> to hack on the Postgres planner. To my surprise, my talk was accepted.</p>
<p>I spent months developing the content for my presentation. Developing this talk was the first time I had really tried to explain query optimization or the Postgres planner to anyone. And as you might imagine, the process was a fantastic learning experience.</p>
<p>That year, my PGCon experience was different. I understood more of what people were talking about. I specifically sought out sessions with an introductory focus and understood the content. Being a speaker, people approached me to ask questions about my talk and get my opinion on their own Postgres planner ideas. I realized you don’t have to be an expert to contribute to the Postgres community.</p>
<h2>Who is PGConf.dev for? Who is the audience?</h2>
<ul>
<li><p><strong><a href="https://2024.pgconf.dev/">PGConf.dev</a> is not just for Postgres hackers.</strong> If you are a conference volunteer, a PUG meetup organizer, a podcast host, a blog author, or a Postgres community volunteer—please, <a href="https://2024.pgconf.dev/cfp/">you should submit a talk</a>.</p>
<p>You can think of PGConf.dev as a conference for learning about and improving Postgres contributions of all kinds (not just code!).</p></li>
<li><p><strong>PGConf.dev's development track is not just about Postgres core code development.</strong> There are so many projects and products which make Postgres work for thousands of unique use cases. If you maintain an extension, a client, a driver, a library, or another Postgres ecosystem project, we need your perspective. <a href="https://2024.pgconf.dev/cfp/">You should submit a talk</a>. :)</p></li>
<li><p><strong>PGConf.dev is also a place for new ideas and perspectives.</strong> Are you a database internals academic? We would love to hear about the latest research and how it could apply to Postgres. Are you a member of another open source community? We would love to hear about tough problems you solved. <a href="https://2024.pgconf.dev/cfp/">You should submit a talk</a>. :)</p></li>
</ul>
<p><strong>If you have questions about the CFP and whether or not your talk proposal is a fit</strong>, feel free to reach out to me at mplageman at microsoft dot com. Or you can read Claire’s post on <a href="/blog/2022/01/11/why-give-a-conference-talk/">why give a conference talk</a>. I hope to read your proposal soon.</p>
<p>📅 <strong>The deadline for the <a href="https://2024.pgconf.dev/cfp/">PGConf.dev CFP</a> is Monday, January 15th @ 11:59pm PST.</strong> So, now is your chance. If you have ideas, expertise, and experiences to share with the Postgres community, you should submit a talk.</p>
<hr>
<figure>
<picture>
<source srcset="https://cituscdn.azureedge.net/images/blog/pgconfdev_screenshot.webp" type="image/webp">
<img src="https://cituscdn.azureedge.net/images/pgconfdev_screenshot.jpg" width="850" height="622" alt="screenshot of PGConf.dev event page" loading="lazy" />
</picture>
<figcaption><strong>Figure 1:</strong> Screenshot of the PGConf.dev conference home page, with beautiful Vancouver imagery in the background. If you’re interested in sharing your perspectives on Postgres, you should submit a talk!</figcaption>
</figure>
<p><em>This article was originally published on <a href='https://www.citusdata.com/blog/2024/01/11/you-should-submit-a-postgres-talk-to-cfp-for-pgconf-dev/'>citusdata.com</a>.</em></p>
<h1>Highlights from podcast episode about Postgres monitoring with Lukas Fittl and Rob Treat</h1>
<p><em>By Ari Padilla · December 15, 2023 · originally published at <a href="https://www.citusdata.com/blog/2023/12/15/highlights-from-podcast-episode-about-postgres-w-lukas-fittl-and-rob-treat/">citusdata.com</a></em></p>
<p>The latest episode of <a href="/podcast/path-to-citus-con/">Path To Citus Con</a>—the monthly podcast for developers who love Postgres—is now out. This episode featured guests <a href="https://mastodon.social/@lukas@hachyderm.io">Lukas Fittl</a> (founder of <a href="https://pganalyze.com/">pganalyze</a>) and <a href="https://twitter.com/robtreat2">Rob Treat</a> (an early Circonus developer) on the topic <a href="https://pathtocituscon.transistor.fm/episodes/my-journey-into-postgres-monitoring-with-lukas-fittl-rob-treat">“My Journey into Postgres Monitoring”</a> along with co-hosts <a href="https://hachyderm.io/@clairegiordano">Claire Giordano</a> and <a href="https://www.linkedin.com/in/pinodecandia">Pino de Candia</a>.</p>
<p>Have you ever asked yourself: “Why is my query so slow?” Or had to figure out which query is slowing things down? Or why your database server is at 90% CPU? According to Lukas, you might find these and many more answers by reviewing your error logs.</p>
<p>If you’re running Postgres on a managed service, what kinds of things do you need to monitor and optimize for, versus what will your cloud service provider handle for you? The episode digs into this, along with a segue into monitoring vs. observability: what’s the difference?</p>
<h2>My Journey into Postgres Monitoring</h2>
<p>Lukas Fittl and Rob Treat, joined by co-hosts Claire Giordano and Pino de Candia, had a broad conversation about all things monitoring: ways or tools to monitor Postgres (pganalyze, pgMustard, pgBadger, pgDash, your cloud provider’s Query Performance Insights, pg_stat_statements, pg_stat_io, & more), access to log files, pain points that people are trying to solve, and the role that AI might play in monitoring databases of the future.</p>
<p>Let’s dive into some interesting bits from the episode…</p>
<figure>
<picture>
<source srcset="https://cituscdn.azureedge.net/images/blog/PathToCitusCon-Ep10-My-Journey-into-Postgres-Monitoring-youtube-1200x675.webp" type="image/webp">
<img src="https://img.youtube.com/vi/BPqFMLeTINQ/maxresdefault.jpg" alt="YouTube thumbnail for Path to Citus Con Ep. 10" loading="lazy" width="850" height="478" />
</picture>
<figcaption><strong>Figure 1:</strong> YouTube thumbnail for episode 10 of the Path To Citus Con podcast for developers who love Postgres, with (starting in the top left, listed clockwise) Pino de Candia, Claire Giordano, Rob Treat, and Lukas Fittl. The topic = “My Journey into Postgres Monitoring.”</figcaption>
</figure>
<h2>Highlights from the podcast episode with Lukas & Rob</h2>
<div class="normal-quote" aria-hidden="true"></div>
<blockquote>
<p><strong>“I think [the biggest pain point] is definitely slow queries, right? Because that's when you get yelled at as a DBA. Is when the queries are slow. And then I think there's that next level, which is like CPU and memory and disk IO… But the thing that causes somebody to come to your desk (as if we still did that) and say, Hey, there's a problem, right? That's slow queries.”</strong> – Rob Treat</p>
</blockquote>
<div class="response" aria-hidden="true"></div>
<p>Queries are used to get data from a database. When you are online shopping and using filters while searching for products, an application is running a query to get you the results. If a filter takes longer than expected to return results, you might get annoyed and go to a different website. Nobody likes slow query responses.</p>
<div class="normal-quote" aria-hidden="true"></div>
<blockquote>
<p><strong>“The thing that still surprises me is most people don't look at their logs for Postgres or at the error logs. And there's so much useful information in there. And if you talk to the typical hacker, they would say, yeah, sure, I'll put a log message right when this happens, but most people just don't look.”</strong> – Lukas Fittl</p>
</blockquote>
<div class="response" aria-hidden="true"></div>
<p>It’s often useful to be reminded of the obvious, because obvious things can be overlooked. In this episode, Lukas reminds us to look at the logs when investigating issues with the database: when you’re troubleshooting a problem, the error logs might already contain the answer to your questions.</p>
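<p>As a toy illustration of Lukas’s advice (this sketch is ours, not from the episode), here is a short Python snippet that scans Postgres-style log lines and tallies them by severity, so the noisiest problems surface first. The sample lines and the regex for the log format are assumptions; adapt them to your own <code>log_line_prefix</code> setting.</p>

```python
import re
from collections import Counter

# Matches the severity tag in typical Postgres stderr log lines, e.g.
# "2023-12-15 16:31:00 UTC [1234] ERROR:  relation "users" does not exist"
SEVERITY = re.compile(r"\b(ERROR|FATAL|PANIC|WARNING)\b:")

def tally_log_lines(lines):
    """Count log lines by severity so the most frequent problems stand out."""
    counts = Counter()
    for line in lines:
        match = SEVERITY.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Usage with a few hypothetical lines (in practice, read your server log file):
sample = [
    '2023-12-15 16:31:00 UTC [1234] ERROR:  relation "users" does not exist',
    '2023-12-15 16:31:05 UTC [1234] WARNING:  there is no transaction in progress',
    '2023-12-15 16:31:09 UTC [1234] LOG:  checkpoint starting: time',
    '2023-12-15 16:32:00 UTC [1235] ERROR:  deadlock detected',
]
print(tally_log_lines(sample))  # Counter({'ERROR': 2, 'WARNING': 1})
```

<p>Even a crude tally like this can tell you whether that “slow query” complaint coincides with a burst of deadlocks or connection errors.</p>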
<div class="normal-quote" aria-hidden="true"></div>
<blockquote>
<p><strong>“You generally are going to need external systems in order to trend data over time. So be aware that whatever your journey is going to look like, probably is going to involve external tools. Whatever those might be because you're going to need some way to trend that data over time in a way to make it easy to understand and analyze.”</strong> – Rob Treat</p>
</blockquote>
<div class="response" aria-hidden="true"></div>
<p>Since I started working on Postgres, one of the things I quickly learned was how rich the ecosystem is, not only in extensions but also in tooling. Because Postgres is an open-source project that is easy to get started with, and because developers love building for it, a lot of useful tools have been created, including several cool tools that will help you visualize the performance of your database over time.</p>
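<p>To make Rob’s point about “trending data over time” concrete, here is a tiny Python sketch (our illustration, not from the episode) of the kind of smoothing a monitoring tool might apply before charting a metric: a trailing moving average over periodic latency samples. The sample values and window size are made up.</p>

```python
def moving_average(samples, window=3):
    """Smooth a series of metric samples (e.g. query latency in ms)
    with a simple trailing moving average, as a monitoring chart might."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out = []
    for i in range(len(samples)):
        # Average the current sample with up to window-1 preceding samples.
        chunk = samples[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

# Hypothetical latency samples (ms), collected once a minute:
latencies = [12.0, 14.0, 40.0, 13.0, 12.0, 90.0]
print(moving_average(latencies))
```

<p>A single snapshot can’t tell you whether 40 ms is a blip or a trend; keeping a smoothed series over time is exactly what external tools are for.</p>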
<div class="normal-quote" aria-hidden="true"></div>
<blockquote>
<p><strong>“So that kind of sparked that initial idea that what if, we just made a dashboard that showed all the queries that have run on your database in, like the last hour, the last 24 hours, just giving you a really clear overview of the database's view of the world, because as an application engineer, you're oftentimes not really seeing that, right?”</strong> – Lukas Fittl</p>
</blockquote>
<div class="response" aria-hidden="true"></div>
<p>Some of you have told us that you like the origin stories we cover in the Path To Citus Con podcast. For example, there are a couple of episodes about how people got started as a developer (and in Postgres). In the case of Lukas and pganalyze, it was interesting to hear what motivated Lukas to start this bootstrapped monitoring company in order to solve his own use case.</p>
<h2>Where to find Path To Citus Con podcast episodes (and transcripts)</h2>
<p>You can listen to and find the show notes online for this <a href="https://pathtocituscon.transistor.fm/episodes/my-journey-into-postgres-monitoring-with-lukas-fittl-rob-treat">My Journey into Postgres Monitoring</a> episode with Lukas Fittl and Rob Treat—and the <a href="https://pathtocituscon.transistor.fm/episodes/my-journey-into-postgres-monitoring-with-lukas-fittl-rob-treat/transcript">transcript</a> is online too.</p>
<p>Our next podcast episode will be recorded live on Discord on <strong>Wed Jan 10 @ 10am PST | 1pm EST | 6pm UTC</strong>.</p>
<ul>
<li><strong>Calendar invite</strong>: To participate in the live text chat that happens in parallel to the live recording (it’s fun), <a href="https://aka.ms/PathToCitusCon-Ep11-cal">mark your calendar</a>. All the instructions for joining the live show on Discord are in the calendar invite.</li>
</ul>
<p>You can find all the past episodes for the Path To Citus Con podcast on:</p>
<ul>
<li><a href="https://podcasts.apple.com/us/podcast/path-to-citus-con/id1695014346">Apple Podcasts</a></li>
<li><a href="https://podcasts.google.com/feed/aHR0cHM6Ly9mZWVkcy50cmFuc2lzdG9yLmZtL3BhdGgtdG8tY2l0dXMtY29u">Google Podcasts</a></li>
<li><a href="https://open.spotify.com/show/6XtjTEc4KGMK9fIGvmPLn7">Spotify</a></li>
<li><a href="https://aka.ms/PathToCitusCon-playlist">YouTube</a></li>
<li><a href="https://pathtocituscon.transistor.fm/subscribe">And many more podcast platforms</a></li>
</ul>
<p>More useful links:</p>
<ul>
<li><a href="/podcast/path-to-citus-con/">Index of all past Path To Citus Con episodes</a></li>
<li><a href="https://aka.ms/PathToCitusCon-cal">Subscribe to the calendar</a> to know when future shows are coming up.</li>
</ul>
<p>Thanks for listening! And if you enjoy the episodes of the Path To Citus Con podcast for developers who love Postgres, please tell your teammates. Also, we really appreciate ratings and reviews, so more people will discover the podcast, and hopefully be as delighted as you.</p>
<p><em>This article was originally published on <a href='https://www.citusdata.com/blog/2023/12/15/highlights-from-podcast-episode-about-postgres-w-lukas-fittl-and-rob-treat/'>citusdata.com</a>.</em></p>