Citus Blog

Articles tagged: open source

The topic of this month's PGSQL Phriday #011 community blogging event is partitioning vs. sharding in PostgreSQL. It seemed right to share a perspective on the question of "partitioning vs. sharding" from someone in the Citus open source team, since we eat, sleep, and breathe sharding for Postgres.

Postgres built-in "native" partitioning—and sharding via PG extensions like Citus—are both tools to grow your Postgres database, scale your application, and improve your application's performance.

What is partitioning and what is sharding? In Postgres, database partitioning and sharding are techniques for splitting collections of data into smaller sets, so the database only needs to process smaller chunks of data at a time. And as you might imagine, work gets done faster when you're processing less data.

In this post, you'll learn what partitioning and sharding are, why they matter, and when to use them. The table of contents:

Keep reading
Onur Tirtir

Schema-based sharding comes to PostgreSQL with Citus

Written byBy Onur Tirtir | July 31, 2023Jul 31, 2023

Citus, a database scaling extension for PostgreSQL, is known for its ability to shard data tables and efficiently distribute workloads across multiple nodes. With Citus 12.0, Citus introduces a very exciting feature called schema-based sharding. The new schema-based sharding feature gives you a choice of how to distribute your data across a cluster, and for some data models (think: multi-tenant apps, microservices, etc.) this schema-based sharding approach may be significantly easier!

In this blog post, we will take a deep dive into the new schema-based sharding feature, and you will learn:

Keep reading

Postgres community released a new feature, in Postgres 15.0, that performs actions to modify rows in the target table, using the data from a source. MERGE provides a single SQL statement that can conditionally INSERT, UPDATE or DELETE rows, a task that would otherwise require multiple procedural language statements, using INSERT with ON CONFLICT clause etc.

In this blog post, you will learn a high-level overview of the functioning of Postgres MERGE. It will delve into some of the practical use-cases, and subsequently elaborate on the different strategies employed by Citus for handling MERGE in a distributed environment.

Keep reading
Marco Slot

Citus 12: Schema-based sharding for PostgreSQL

Written byBy Marco Slot | July 18, 2023Jul 18, 2023

What if you could automatically shard your PostgreSQL database across any number of servers and get industry-leading performance at scale without any special data modelling steps?

Our latest Citus open source release, Citus 12, adds a new and easy way to transparently scale your Postgres database: Schema-based sharding, where the database is transparently sharded by schema name.

Schema-based sharding gives an easy path for scaling out several important classes of applications that can divide their data across schemas:

  • Multi-tenant SaaS applications
  • Microservices that use the same database
  • Vertical partitioning by groups of tables

Each of these scenarios can now be enabled on Citus using regular CREATE SCHEMA commands. That way, many existing applications and libraries (e.g. django-tenants) can scale out without any changes, and developing new applications can be much easier. Moreover, you keep all the other benefits of Citus, including distributed transactions, reference tables, rebalancing, and more.

Keep reading

Introducing Path To Citus Con, a podcast for developers who love Postgres. Why? Because sometimes, something you build gets bigger than you thought it would. The monthly podcast Path To Citus Con as originally meant to be a “pre-event” to build excitement and give a hands-on experience for people who would be attending Citus Con: An Event for Postgres. The audience would get a chance to talk to speakers for the conference and hear a deep dive conversation.

It’s now its own monthly podcast with guests from around the world. Guests have been deep in the world of databases and the Citus database extension to Postgres, and also people in the Postgres community and technology more generally. It’s the human side of open source, PostgreSQL, and the many PG extensions (including Citus.)

In this blog post, you’ll learn about what Path To Citus Con is, how you can participate, listen, and read each episode, and about episodes like “Working in public on open source,” “Why giving talks at Postgres conferences matters,” and more (details below.)

Keep reading

One of the most important improvements in Citus 11.3 is that Citus offers more reliable metadata sync. Before 11.3, when a Citus cluster had thousands of distributed objects (such as distributed tables), Citus occasionally experienced memory problems while running metadata sync. Due to these memory errors, some users with very large numbers of tables were sometimes unable to add new nodes or upgrade beyond Citus 11.0.

To address the memory issues, we added an alternative "non-transactional" mode to the current metadata sync in Citus 11.3.

The default mode for metadata sync is still the original single transaction mode that we introduced in Citus 11.0. But now in 11.3 or later, if you have a very large number of tables and you run into the memory error, you can choose to optionally switch to the non-transactional mode, which syncs the metadata via many transactions. While most of you who use Citus will not need to enable this alternative metadata sync mode, this is how to do it:

Keep reading

If you have ever used a database like Postgres, you know how important optimization is. Some minor changes in how the database is setup make all the difference between long waiting times and satisfied customers. And one crucial thing you need before doing the optimization is to monitor and understand how your database is being used.

Citus is an extension to Postgres that improves scalability and parallelization by distributing your Postgres database across nodes in a cluster. The Citus database extension is available as open source and as a managed service on the cloud, as Azure Cosmos DB for PostgreSQL. You can track your Citus nodes and the Postgres tables, but Citus 11.3 takes it one step further and introduces a new way to gather insight on your Citus database with tenant monitoring.

The new Citus 11.3 release, among many other features, introduces a new citus_stat_tenants view to track your most active tenants, for those with multi-tenant SaaS applications.

Keep reading

If you're building a software application that serves multiple tenants, you may have already encountered the challenges of managing and isolating tenant-specific data. That's where the django-multitenant library comes in. This library, actively used since 2017 and now downloaded more than 10K times per month, offers a simple and flexible solution for building multi-tenant Django applications.

In this blog post, we'll dive deeper into the concept of multi-tenancy and explore how Django-multitenant can help you build scalable, secure, and maintainable multi-tenant applications on top of PostgreSQL and the Citus database extension. We'll also provide a practical example of how to use Django-multitenant in a real-world scenario. So, if you're looking to simplify your multi-tenant development process, keep reading.

Keep reading

Citus enables several different PostgreSQL use cases, but one of the most popular ones is to build scalable multi-tenant software as a service (SaaS) applications. The most common way to build a multi-tenant application on Citus is to distribute all your Postgres tables by a “tenant ID” column. That way rows are (hash-)distributed across nodes, while rows with the same tenant ID value are co-located on the same node for fast local joins, transactions, and foreign keys.

For those of you who build SaaS apps, one question many of you have is how active your tenants are. More specifically: What are your busiest tenants? How many queries is your application doing on behalf of your tenants, and how much CPU do those queries use?

The new 11.3 release to the open source Citus database extension gives you tenant monitoring—with instant visibility into your top tenants using the new citus_stat_tenants feature, which shows query counts and CPU usage over a configurable time period.

Keep reading

Citus is a PostgreSQL extension that makes PostgreSQL scalable by transparently distributing and/or replicating tables across one or more PostgreSQL nodes. Citus could be used either on Azure cloud, or since the Citus database extension is fully open source, you can download and install Citus anywhere you like.

A typical Citus cluster consists of a special node called coordinator and a few worker nodes. Applications usually send their queries to the Citus coordinator node, which relays them to worker nodes and accumulates the results. (Unless of course you’re using the Citus query from any node feature, an optional feature introduced in Citus 11, in which case the queries can be routed to any of the nodes in the cluster.)

Anyway, one of the most frequently asked questions is: “How does Citus handle failures of the coordinator or worker nodes? What’s the HA story?”

And with the exception of when you’re running Citus in a managed service in the cloud, the answer so far was not great—just use PostgreSQL streaming to run coordinator and workers with HA and it is up to you how to handle a failover.

In this blog post, you’ll learn how Patroni 3.0+ can be used to deploy a highly available Citus database cluster—just by adding a few lines to the Patroni configuration file.

Keep reading

Page 3 of 7