Citus Blog

Articles tagged: Postgres

The Postgres community released a new feature in Postgres 15.0, MERGE, that modifies rows in a target table using data from a source. MERGE provides a single SQL statement that can conditionally INSERT, UPDATE, or DELETE rows, a task that would otherwise require multiple procedural language statements or workarounds such as INSERT with an ON CONFLICT clause.

In this blog post, you will get a high-level overview of how Postgres MERGE works, see some practical use cases, and then learn about the different strategies Citus uses to handle MERGE in a distributed environment.
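To make the conditional logic concrete, here is a minimal in-memory sketch in Python of what a single MERGE statement does to a target table. The table layout (an `id` key, a `qty` column, and a `deleted` flag on source rows) is hypothetical, and the function only mirrors the SQL in its docstring; it is not how Postgres executes MERGE internally.

```python
def merge(target: dict[int, dict], source: list[dict]) -> dict[int, dict]:
    """Return a new target table, emulating (hypothetical tables):

    MERGE INTO target t USING source s ON t.id = s.id
    WHEN MATCHED AND s.deleted THEN DELETE
    WHEN MATCHED THEN UPDATE SET qty = s.qty
    WHEN NOT MATCHED AND NOT s.deleted THEN INSERT (id, qty) VALUES (s.id, s.qty);
    """
    result = {k: dict(v) for k, v in target.items()}  # copy; input stays untouched
    seen: set[int] = set()
    for s in source:
        k = s["id"]
        # Simplification: Postgres raises an error if one target row would be
        # modified twice; this sketch rejects any duplicate source key instead.
        if k in seen:
            raise ValueError(f"source has duplicate key {k!r}")
        seen.add(k)
        if k in result:                          # WHEN MATCHED ...
            if s["deleted"]:
                del result[k]                    # ... AND s.deleted THEN DELETE
            else:
                result[k]["qty"] = s["qty"]      # ... THEN UPDATE
        elif not s["deleted"]:                   # WHEN NOT MATCHED AND NOT s.deleted
            result[k] = {"id": k, "qty": s["qty"]}  # THEN INSERT
    return result


inventory = {1: {"id": 1, "qty": 5}, 2: {"id": 2, "qty": 7}}
changes = [
    {"id": 1, "qty": 10, "deleted": False},  # matched: update
    {"id": 2, "qty": 0, "deleted": True},    # matched and flagged: delete
    {"id": 3, "qty": 4, "deleted": False},   # not matched: insert
]
print(merge(inventory, changes))
# {1: {'id': 1, 'qty': 10}, 3: {'id': 3, 'qty': 4}}
```

One statement covers all three branches, which is exactly the work that would otherwise be split across several statements.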

Keep reading
Marco Slot

Citus 12: Schema-based sharding for PostgreSQL

Written by Marco Slot | July 18, 2023

What if you could automatically shard your PostgreSQL database across any number of servers and get industry-leading performance at scale without any special data modelling steps?

Our latest Citus open source release, Citus 12, adds a new and easy way to scale out your Postgres database: schema-based sharding, where the database is transparently sharded by schema name.

Schema-based sharding gives an easy path for scaling out several important classes of applications that can divide their data across schemas:

  • Multi-tenant SaaS applications
  • Microservices that use the same database
  • Vertical partitioning by groups of tables

Each of these scenarios can now be enabled on Citus using regular CREATE SCHEMA commands. That way, many existing applications and libraries (e.g. django-tenants) can scale out without any changes, and developing new applications can be much easier. Moreover, you keep all the other benefits of Citus, including distributed transactions, reference tables, rebalancing, and more.
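As a rough mental model of schema-based sharding, the toy Python class below keeps every table of a schema on a single node, so queries and joins within one schema (for example, one tenant) touch just one node. The class, the node names, and the "fewest schemas wins" placement rule are assumptions made for illustration, not Citus internals.

```python
class SchemaShardedCluster:
    """Toy model: all tables in a schema live together on one node."""

    def __init__(self, nodes: list[str]) -> None:
        self.nodes = list(nodes)
        self.schema_node: dict[str, str] = {}  # schema name -> node holding it

    def create_schema(self, schema: str) -> str:
        if schema in self.schema_node:
            raise ValueError(f"schema {schema!r} already exists")
        # Assumed placement rule: the node with the fewest schemas gets the new
        # one (ties go to the earliest node in the list).
        load = {n: 0 for n in self.nodes}
        for n in self.schema_node.values():
            load[n] += 1
        node = min(self.nodes, key=lambda n: load[n])
        self.schema_node[schema] = node
        return node

    def route(self, qualified_table: str) -> str:
        """Send a query on 'schema.table' to the node that owns the schema."""
        schema, sep, _table = qualified_table.partition(".")
        if not sep:
            raise ValueError("expected a schema-qualified name like 'tenant_a.orders'")
        return self.schema_node[schema]


cluster = SchemaShardedCluster(["worker1", "worker2"])
for name in ["tenant_a", "tenant_b", "tenant_c"]:
    cluster.create_schema(name)

print(cluster.route("tenant_b.orders"))  # worker2
print(cluster.route("tenant_b.users"))   # worker2: same schema, same node
```

Because placement is decided per schema rather than per row, an application that already separates its data by schema needs no new distribution column.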

Keep reading

Introducing Path To Citus Con, a podcast for developers who love Postgres. Why? Because sometimes, something you build gets bigger than you thought it would. The monthly podcast Path To Citus Con was originally meant to be a “pre-event” to build excitement and give a hands-on experience to people who would be attending Citus Con: An Event for Postgres. The audience would get a chance to talk to conference speakers and hear a deep-dive conversation.

It’s now its own monthly podcast with guests from around the world. Guests include people who work deep in the world of databases and the Citus database extension to Postgres, as well as people from the broader Postgres community and the technology world more generally. It’s the human side of open source, PostgreSQL, and the many PG extensions (including Citus).

In this blog post, you’ll learn what Path To Citus Con is, how you can participate in, listen to, and read each episode, and about episodes like “Working in public on open source,” “Why giving talks at Postgres conferences matters,” and more (details below).

Keep reading

Distributed PostgreSQL has become a hot topic. Several distributed database vendors have added support for the PostgreSQL protocol as a convenient way to gain access to the PostgreSQL ecosystem. Others (like us) have built a distributed database on top of PostgreSQL itself.

For the Citus database team, distributed PostgreSQL is primarily about achieving high performance at scale. The unique thing about Citus, the technology powering Azure Cosmos DB for PostgreSQL, is that it is fully implemented as an open-source extension to PostgreSQL. It also leans entirely on PostgreSQL for storage, indexing, low-level query planning and execution, and various performance features. As such, Citus inherits the performance characteristics of a single PostgreSQL server but applies them at scale.

That all sounds good in theory, but to see whether it holds up in practice, you need benchmark numbers. We therefore asked GigaOM to run performance benchmarks comparing Azure Cosmos DB for PostgreSQL to other distributed implementations. GigaOM compared the transaction performance and price-performance of these popular managed distributed PostgreSQL services, using the HammerDB benchmark software:

Keep reading

One of the most important improvements in Citus 11.3 is that Citus offers more reliable metadata sync. Before 11.3, when a Citus cluster had thousands of distributed objects (such as distributed tables), Citus occasionally experienced memory problems while running metadata sync. Due to these memory errors, some users with very large numbers of tables were sometimes unable to add new nodes or upgrade beyond Citus 11.0.

To address the memory issues, we added an alternative "non-transactional" mode to the current metadata sync in Citus 11.3.

The default mode for metadata sync is still the original single transaction mode that we introduced in Citus 11.0. But now in 11.3 or later, if you have a very large number of tables and you run into the memory error, you can choose to optionally switch to the non-transactional mode, which syncs the metadata via many transactions. While most of you who use Citus will not need to enable this alternative metadata sync mode, this is how to do it:

Keep reading

If you're building a software application that serves multiple tenants, you may have already encountered the challenges of managing and isolating tenant-specific data. That's where the django-multitenant library comes in. This library, actively used since 2017 and now downloaded more than 10K times per month, offers a simple and flexible solution for building multi-tenant Django applications.

In this blog post, we'll dive deeper into the concept of multi-tenancy and explore how django-multitenant can help you build scalable, secure, and maintainable multi-tenant applications on top of PostgreSQL and the Citus database extension. We'll also walk through a practical example of using django-multitenant in a real-world scenario. So, if you're looking to simplify your multi-tenant development process, keep reading.
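The core idea behind tenant scoping can be sketched in plain Python: keep the current tenant in a context variable and make every read filter on, and every write stamp, the tenant column. The helper names below (`activate_tenant`, `TenantScopedTable`) are hypothetical and are not the django-multitenant API. The sketch only shows why consistently filtering on the tenant column matters: on Citus, when a table is distributed by that column, such a filter lets a query be routed to a single shard.

```python
import contextvars

# Hypothetical helpers for illustration; NOT the django-multitenant API.
_current_tenant = contextvars.ContextVar("current_tenant", default=None)


def activate_tenant(tenant_id) -> None:
    """Mark which tenant the current request or task is acting for."""
    _current_tenant.set(tenant_id)


class TenantScopedTable:
    """In-memory table whose reads and writes are always scoped to one tenant."""

    def __init__(self, tenant_column: str = "tenant_id") -> None:
        self.tenant_column = tenant_column
        self.rows: list[dict] = []

    def _tenant(self):
        tenant = _current_tenant.get()
        if tenant is None:
            raise RuntimeError("no tenant is active")
        return tenant

    def insert(self, **values) -> dict:
        # Stamp the active tenant so no row can be written without one.
        row = {**values, self.tenant_column: self._tenant()}
        self.rows.append(row)
        return row

    def all(self) -> list[dict]:
        # Every read is implicitly filtered on the tenant column.
        tenant = self._tenant()
        return [r for r in self.rows if r[self.tenant_column] == tenant]


orders = TenantScopedTable()
activate_tenant(1)
orders.insert(item="widget")
activate_tenant(2)
orders.insert(item="gadget")
activate_tenant(1)
print([r["item"] for r in orders.all()])  # ['widget']
```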

Keep reading

A developer friend of mine prefers to read about what to expect at upcoming events in the narrative form of a blog, rather than having to click in and out of different abstracts on a schedule page.

So this ultimate guide post is my gift to those of you who want to know more about the 37 talks that will be presented at this year’s 2nd annual Citus Con: An Event for Postgres 2023—and who want to read about it in blog post form.

And yes, Citus Con is virtual again this year! This means you can watch all the livestream & on-demand talks from the comfort of your very own desk—and chit-chat in the virtual hallway track on the #cituscon channel on Discord.

[Update in May 2023]: It's a wrap! The categories in this ultimate guide will help you find the talks which are most useful to you and your work/interests. Or you can jump straight to the playlist of all 37 Citus Con 2023 talks on YouTube.

So what’s on the schedule at Citus Con: An Event for Postgres 2023, exactly? Be sure to check out both tabs on the Schedule page, Live Sessions and On-Demand Sessions, to learn about the:

Keep reading

Citus is a PostgreSQL extension that makes PostgreSQL scalable by transparently distributing and/or replicating tables across one or more PostgreSQL nodes. You can use Citus on the Azure cloud, or, since the Citus database extension is fully open source, download and install it anywhere you like.

A typical Citus cluster consists of a special node called the coordinator and a few worker nodes. Applications usually send their queries to the Citus coordinator, which relays them to the worker nodes and combines the results. (Unless, of course, you’re using the Citus “query from any node” feature, an optional feature introduced in Citus 11, in which case queries can be routed to any node in the cluster.)
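The coordinator’s role can be pictured with a toy Python model: rows are hash-distributed into shards spread across workers, a query that filters on the distribution column is relayed to a single shard, and an aggregate without such a filter fans out to every shard before the coordinator combines the partial results. The class name, the CRC32 hash, and the round-robin shard placement are stand-ins chosen for illustration; Citus itself uses PostgreSQL hash functions and its own shard metadata.

```python
import zlib


class ToyCoordinator:
    """Toy model: rows are hash-distributed into shards placed on workers."""

    def __init__(self, workers: list[str], shard_count: int = 4) -> None:
        self.shard_count = shard_count
        self.shards: dict[int, list[tuple[int, int]]] = {i: [] for i in range(shard_count)}
        # Round-robin shard placement across workers (illustrative only).
        self.placement = {i: workers[i % len(workers)] for i in range(shard_count)}

    def shard_for(self, key: int) -> int:
        # Stand-in hash on the distribution column value.
        return zlib.crc32(str(key).encode()) % self.shard_count

    def insert(self, key: int, amount: int) -> str:
        sid = self.shard_for(key)
        self.shards[sid].append((key, amount))
        return self.placement[sid]  # worker that stored the row

    def sum_for_key(self, key: int) -> int:
        # Filter on the distribution column: relay to exactly one shard.
        sid = self.shard_for(key)
        return sum(a for k, a in self.shards[sid] if k == key)

    def sum_all(self) -> int:
        # No such filter: fan out, collect one partial sum per shard, then combine.
        partials = [sum(a for _, a in rows) for rows in self.shards.values()]
        return sum(partials)


coord = ToyCoordinator(["worker1", "worker2"])
for key, amount in [(1, 10), (2, 20), (1, 5), (3, 7)]:
    coord.insert(key, amount)

print(coord.sum_for_key(1))  # 15
print(coord.sum_all())       # 42
```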

Anyway, one of the most frequently asked questions is: “How does Citus handle failures of the coordinator or worker nodes? What’s the HA story?”

Unless you were running Citus in a managed service in the cloud, the answer so far was not great: use PostgreSQL streaming replication to run the coordinator and workers with HA, and handling a failover is up to you.

In this blog post, you’ll learn how Patroni 3.0+ can be used to deploy a highly available Citus database cluster—just by adding a few lines to the Patroni configuration file.

Keep reading

Our goal for the Citus extension is for you to be able to use all PostgreSQL features at any scale, with a seamless scaling experience. Distributed tables (or more generally “Citus tables”) are a powerful tool to get high performance at any scale. There are only a few remaining limitations when distributing a PostgreSQL table, but we are determined to solve them all. The Citus 11.2 release checks off another five SQL & DDL features that now work seamlessly on Citus tables. We also improved progress tracking for the shard rebalancer, so you know exactly what’s going on when rebalancing your cluster.

We also want PostgreSQL tools to work out-of-the-box even if you have a distributed PostgreSQL cluster. One of the most frequent questions we get on the Citus Slack from our open source users is how to set up high availability. Alexander Kukushkin, who is the primary maintainer of Patroni and recently joined the Citus database engine team, therefore developed a new version of Patroni which includes support for Citus!

Before we dive in, you can find detailed release notes for Citus 11.2 by the engineering team on our Updates page.

Keep reading
Nazir Bilal Yavuz

Debugging PostgreSQL CI failures faster: 4 tips

Written by Nazir Bilal Yavuz | January 18, 2023

Postgres is one of the most widely used databases and supports a number of operating systems. When you are writing code for PostgreSQL, it's easy to test your changes locally, but it can be cumbersome to test them on every operating system. You may often encounter failures on some platforms, and it can be confusing to figure out how to move forward while debugging. To make the dev/test process easier, you can use the Postgres CI.

When you test your changes on CI and see them fail, how do you proceed to debug from there? As part of our work in the open source Postgres team at Microsoft, we often run into CI failures, and more often than not, the bug is not obvious and requires further digging.

In this blog post, you'll learn about techniques you can use to debug PostgreSQL CI failures faster. We'll be discussing these 4 tips in detail:

Keep reading

Page 3 of 15