Citus Blog

Thoughts on scaling out PostgreSQL, sharding, multi-tenant apps, real-time analytics, and distributed databases.

Samay Sharma

Debugging Postgres autovacuum problems: 13 tips

Written by Samay Sharma | July 28, 2022

If you’ve been running PostgreSQL for a while, you’ve heard about autovacuum. Yes, autovacuum: the thing everybody tells you not to turn off, the thing that is supposed to keep your database clean and reduce bloat automatically.

And yet—imagine this: one fine day, you see that your database size is larger than you expect, the I/O load on your database has increased, and things have slowed down without much change in workload. You begin looking into what might have happened. You run the excellent Postgres bloat query and you notice you have a lot of bloat. So you run the VACUUM command manually to clear the bloat in your Postgres database. Good!

But then you have to address the elephant in the room: why didn’t Postgres autovacuum clean up the bloat in the first place…? Does the above story sound familiar? Well, you are not alone. 😊

Keep reading

We released Citus 11 a few weeks ago, and it is packed. Citus went fully open source, so previously enterprise-only features, such as the non-blocking shard rebalancer and multi-user support, are now open source for everyone to enjoy. Another huge change in Citus 11 is that you can now query your distributed Postgres tables from any Citus node, by default.

When using Citus to distribute Postgres before Citus 11, the coordinator node was your application’s only point of contact: your application needed to connect to the coordinator to query your distributed Postgres tables. The coordinator node can handle high query throughput, about 100K queries per second, but your application might need even more processing power. Thanks to our work in Citus 11, you can now query from any node in the Citus database cluster. Citus 11 syncs the metadata to all nodes by default, so you can connect to any node and run queries on your tables.

Running queries from any node is awesome, but you also need to be able to monitor and manage your queries from any node. Before, when you only connected to the coordinator, Postgres’ monitoring tools were enough, but that is no longer the case. So in Citus 11 we added some ways to observe your queries, similar to what you would do in a single Postgres instance.

Keep reading

Citus 11.0 is here! Citus is a PostgreSQL extension that adds distributed database superpowers to PostgreSQL. With Citus, you can create tables that are transparently distributed or replicated across a cluster of PostgreSQL nodes. Citus 11.0 is a new major release, which means that it comes with some very exciting new features that enable new levels of scalability.

The biggest enhancement in Citus 11.0 is that you can now always run distributed queries from any node in the cluster, because the schema and metadata are automatically synchronized. We already shared some of the details in the Citus 11.0 beta blog post, but we also have a big surprise for those of you who use Citus open source, one that was not part of the initial beta.

When we do a new Citus release, we usually release two versions: the open source version and the enterprise release, which includes a few extra features. However, there will be only one version of Citus 11.0, because everything in the Citus extension is now fully open source!

That means that you can now rebalance shards without blocking writes, manage roles across the cluster, isolate tenants to their own shards, and more. All this comes on top of the already massive enhancement in Citus 11.0: You can query your Citus cluster from any node, creating a truly distributed PostgreSQL experience.

Keep reading
David Rowley

Speeding up sort performance in Postgres 15

Written by David Rowley | May 19, 2022

In recent years, PostgreSQL has seen several improvements which make sorting faster. In the PostgreSQL 15 development cycle—which ended in April 2022—Ronan Dunklau, Thomas Munro, Heikki Linnakangas, and I contributed some changes to PostgreSQL to make sorts go even faster.

All of these sort improvements should be available when PostgreSQL 15 is released in late 2022.

Why care about sort performance? When you run your application on PostgreSQL, there are several scenarios where PostgreSQL needs to sort records (aka rows) on your behalf. The main one is for ORDER BY queries. Sorting can also be used in:

  • Aggregate functions with an ORDER BY clause
  • GROUP BY queries
  • Queries with a plan containing a Merge Join
  • UNION queries
  • DISTINCT queries
  • Queries with window functions with a PARTITION BY and/or ORDER BY clause

If PostgreSQL is able to sort records faster, then queries using sort will run more quickly.

Keep reading
Claire Giordano

Ultimate Guide to Citus Con: An Event for Postgres

Written by Claire Giordano | March 29, 2022

One of the good things about a virtual event like Citus Con is that you have a lot of flexibility about where and when to watch the talks: from your home office, a café, the beach, or even the car while you wait to pick up your kids. As long as you have an internet connection, you’re in.

But you still need to figure out which talks and livestreams you want to watch when the event goes live on Tuesday, April 12. To help you out, we’ve created this guide to Citus Con: An Event for Postgres. And just for kicks, I’m calling it the “Ultimate Guide” to Citus Con. (Ha! Since this is a first-time event, maybe it will be the only guide to Citus Con, and therefore definitely “ultimate”.)

In working on this event—I’m a co-chair along with Teresa Giacomini, also head of the talk selection team—I realized I had “tagged and categorized” each and every talk both in my head and on a spreadsheet. So that’s what this blog post will give you… a framework for knowing which talks are in which categories.

Of course, if you want to see the abstracts for all the talks, just pop over to the Schedule & Sessions page for Citus Con.

Keep reading
Marco Slot

Test drive the Citus 11.0 beta for Postgres

Written by Marco Slot | March 26, 2022

Today we released Citus 11.0 beta, which is our first-ever beta release of the Citus open source extension to Postgres. We are releasing a beta version of 11.0 because we are introducing a few fundamentally new capabilities, and we would like to get feedback from those of you who use Citus before we release Citus 11.0 to the world.

The biggest change in Citus 11.0 beta is that the schema and Citus metadata are now automatically synchronized throughout the database cluster. That means you can always query distributed tables from any node in a Citus cluster!

The easiest way to use Citus is to connect to the coordinator node and use it for both schema changes and distributed queries. For very demanding applications, however, you now have the option to load balance distributed queries across the worker nodes in (parts of) your application by using a different connection string and factoring in a few limitations.
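To make the “different connection string” idea concrete, here is a minimal sketch, assuming a hypothetical list of worker connection strings, of how an application might keep schema changes on the coordinator while rotating distributed queries across workers round-robin. The `NodeRouter` class and the hostnames are illustrative only and not part of Citus; the sketch only picks which connection string to use and leaves actual connection handling to your Postgres driver.

```python
from itertools import cycle


class NodeRouter:
    """Hypothetical helper (not a Citus API): route schema changes to the
    coordinator and spread distributed queries across workers round-robin."""

    def __init__(self, coordinator: str, workers: list[str]) -> None:
        if not workers:
            raise ValueError("need at least one worker connection string")
        self.coordinator = coordinator
        self._workers = cycle(workers)  # endless round-robin iterator

    def for_schema_change(self) -> str:
        # In this sketch, DDL keeps going through the coordinator.
        return self.coordinator

    def for_query(self) -> str:
        # Each call returns the next worker connection string in turn.
        return next(self._workers)


# Hypothetical connection strings; substitute your own cluster's.
router = NodeRouter(
    coordinator="host=coordinator dbname=app",
    workers=["host=worker-1 dbname=app", "host=worker-2 dbname=app"],
)
print([router.for_query() for _ in range(3)])
```

Round-robin is just the simplest policy to illustrate; a real deployment might instead rely on a load balancer or DNS, and would still need to account for the limitations mentioned above.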

Keep reading

My main advice when running performance benchmarks for Postgres is: “Automate it!”

If you’re measuring database performance, you are likely going to have to run the same benchmark over and over again. Maybe you want to try a slightly different configuration, or you realized you used the wrong settings, or there is some other reason. If you automate the way you run performance benchmarks, you won’t be too annoyed when this happens, because re-running the benchmarks will cost very little effort (only some time).

However, building this automation for the database benchmarks can be very time-consuming, too. So, in this post I’ll share the tools I built to make it easy to run benchmarks against Postgres—specifically against the Citus extension to Postgres running in a managed database service on Azure called Hyperscale (Citus) in Azure Database for PostgreSQL.

Here’s your map for reading this post: each anchor link takes you to a different section. The first sections explore the different types of application workloads and their characteristics, plus the off-the-shelf benchmarks that are commonly used for each. After that you can dive into the “how to” aspects of using HammerDB with Citus and Postgres on Azure. And yes, you’ll see some sample benchmarking results, too.

Keep reading
Claire Giordano

Call for speakers for Citus Con: An Event for Postgres

Written by Claire Giordano | January 31, 2022

When you find yourself answering the same questions again and again, it’s a good idea to blog about it. That is why this post about Citus Con: An Event for Postgres exists: to answer your questions and to share the news about this inaugural event.

Citus Con: An Event for Postgres is a free and virtual developer event happening in April 2022, organized by the Postgres and Citus team here at Microsoft. Speakers will come from different parts of the Postgres ecosystem, including Postgres users, Citus open source users, Azure Database for PostgreSQL customers, and developers/experts in PostgreSQL and Postgres extensions, like Citus.

The Call for Proposals (CFP) for Citus Con is open until Feb 6th. Whether this will be your 1000th conference talk or your very 1st, we’d love to see what Postgres experiences you have to share.

Keep reading
Gurkan Indibay

Tips for installing Citus and Postgres packages

Written by Gürkan İndibay | January 22, 2022

Citus is a great extension for scaling out Postgres databases horizontally. You can use Citus in the cloud on Azure, or you can download Citus open source and install it wherever you like. In this blog post, we will focus on Citus open source packaging and installation.

When you go to the Citus download page to download the Citus packages—or you visit the Citus open source docs—many of you jump straight to the install instructions and the particular OS you’re looking for. That way, you can get straight to sharding Postgres with Citus.

But what if you want to see which operating systems the Citus packages support? Or what if you want to install Citus with an older version of Postgres?

This post will answer these types of nitty-gritty questions about Citus packages and their usages. Specifically, this post will cover these questions:

Keep reading

If you’ve never done it before, you might be daunted by the idea of giving a conference talk. You know: the work involved, the butterflies, how to make it a good talk and not a boring one, the people who might judge you… And perhaps the hardest bit: choosing a topic others will find interesting.

For the last few months I’ve been working on a new event. It’s a free and virtual developer event happening on 12-13 Apr 2022 called Citus Con: An Event for Postgres. Organized by the Postgres and Citus team here at Microsoft Azure, Citus Con is geared toward Postgres users, Azure Database for PostgreSQL customers, and those who use the Citus extension to Postgres (or other PG extensions).

Keep reading
