Last week I wrote about locking behaviour in Postgres, which commands block each other, and how you can diagnose blocked commands. Of course, after the diagnosis you may also want a cure. With Postgres it is possible to shoot yourself in the foot, but Postgres also offers you a way to stay on target. These are some of the important do’s and don’ts that we’ve seen as helpful when working with users to migrate from their single node Postgres database to Citus or when building new real-time analytics apps on Citus.Continue reading
The Citus distributed database scales out PostgreSQL through sharding, replication, and query parallelization. For replication, our database as a service (by default) leverages the streaming replication logic built into Postgres.
When we talk to Citus users, we often hear questions about setting up Postgres high availability (HA) clusters and managing backups. How do you handle replication and machine failures? What challenges do you run into when setting up Postgres HA?
The PostgreSQL database follows a straightforward replication model. In this model, all writes go to a primary node. The primary node then locally applies those changes and propagates them to secondary nodes.
In the context of Postgres, the built-in replication (known as “streaming replication”) comes with several challenges:
- Postgres replication doesn’t come with built-in monitoring and failover. When the primary node fails, you need to promote a secondary to be the new primary. This promotion needs to happen in a way where clients write to only one primary node, and they don’t observe data inconsistencies.
- Many Postgres clients (written in different programming languages) talk to a single endpoint. When the primary node fails, these clients will keep retrying the same IP or DNS name. This makes failover visible to the application.
- Postgres replicates its entire state. When you need to construct a new secondary node, the secondary needs to replay the entire history of state change from the primary node. This process is resource intensive—and makes it expensive to kill nodes in the head and bring up new ones.
The first two challenges are well understood. Since the last challenge isn’t as widely recognized, we’ll examine it in this blog post.Continue reading
At Citus Data, we engineers take an active role in helping our customers scale out their Postgres database, be it for migrating an existing application or building a new application from scratch. This means we help you with distributing your relational data model—and also with getting the most out of Postgres.
One problem I often see users struggle with when it comes to Postgres is locks. While Postgres is amazing at running multiple operations at the same time, there are a few cases in which Postgres needs to block an operation using a lock. You therefore have to be careful about which locks your transactions take, but with the high-level abstractions that PostgreSQL provides, it can be difficult to know exactly what will happen. This post aims to demystify the locking behaviors in Postgres, and to give advice on how to avoid common problems.Continue reading
If you’re building a Java app, there’s a good chance you’re using Hibernate. The Hibernate ORM is a nearly ubiquitous choice for Java developers who need to interact with a relational database. It’s mature, widely supported, and feature rich—as demonstrated by its support for multi tenant applications.
Hibernate officially supports two different multi-tenancy mechanisms: separate database and separate schema. Unfortunately, both of these mechanisms come with some downsides in terms of scaling. A third Hibernate multi-tenancy mechanism, a tenant discriminator, also exists, and it’s usable—but it’s still considered a work-in-progress by some. Unlike the separate database and separate schema approaches, which require distinct database connections for each tenant, Hibernate’s tenant discriminator model stores tenant data in a single database and partitions records with either a simple column value or a complex SQL formula.
But fear not, despite the unfinished state of Hibernate’s built-in support for a tenant discriminator (or in simple terms
tenant_id), it’s possible to implement your own discriminator using standard Spring, Hibernate, and AspectJ mechanisms that work quite well. The Hibernate tenant discriminator model works well as you start small on a single-node Postgres database, and even better, tenant discriminator can continue to scale as your data grows by leveraging the Citus extension to Postgres.
Over the last two years, our engineering team at Citus Data has shortened release cycles from 12 months all the way down to 8 weeks. The most recent 7.2 release of the Citus database took 8 weeks exactly, start to finish.
These shortened release cycles have been chock full of new capabilities for our users, including distributed deadlock detection in Citus 7.0, multi-shard updates and deletes in Citus 7.1, and support for CTE’s (common table expressions) and complex Postgres subqueries in Citus 7.2.
On the Citus Cloud side (that’s our fully-managed database as a service that runs on AWS), we’ve recently added fork, followers, fully-online “warp” migration from existing PostgreSQL installations, and point-in-time-recovery (PITR), just to name a few.
When I step back to think about how we got here (as a co-founder of Citus Data, I’ve been here since the beginning), it’s no surprise that I attribute much of what we’ve accomplished to our team. But here’s the point about all these accomplishments that I think is so interesting: our engineering team is distributed across 5 countries and 6 different cities.Continue reading
In both Citus Cloud 2 and in the enterprise edition of Citus 7.1 there was a pretty big update to one of our flagship features—the shard rebalancer. No, I’m not talking about our shard rebalancer visualization that reminds me of the Windows ‘95 disk defrag. (Side-node: At one point I tried to persuade my engineering team to play tetris music in the background while the shard rebalancer UI in Citus Cloud was running. The good news for all of you is that I was overwhelmingly veto'ed by my team. Whew.) The interesting new capability in the Citus database is the online nature of our shard rebalancer.Continue reading
Today, we’re excited to announce our latest release of our distributed database—Citus 7.2. With this release, we’re making Citus more of a drop-in replacement for your single-node Postgres database, so you don’t need to adapt your SQL for a distributed system.
For multi-tenant applications where the single-tenant queries were scoped to a single machine, Citus already provided full SQL support. . The improvements in Citus 7.2 take our support for distributed SQL one big step further. With Citus database version 7.2, we now extend our distributed SQL support to queries that run on data spread across a cluster of machines. This becomes particularly important for real-time analytics workloads, where even the most complex SELECT queries need to be parallelized across machines.
If you’re into bulleted lists, here’s the quick overview of what’s new in Citus database version 7.2 for distributed queries that span across machines. For an overview of other recent Citus features check out these blogs about distributed transactions and Citus 7.1.Continue reading
Years ago Citus used to have multiple methods for distributing data across many nodes (we actually still support both today), there was both hash-based partitioning and time-based partitioning. Over time we found big benefits in further enhancing the features around hash-based partitioning which enabled us to add richer SQL support, transactions, foreign keys, and more. Thus in recent years, we put less energy into time-based partitioning. But… no one stopped asking us about time partitioning, especially for fast data expiration. All that time we were listening. We just thought it best to align our product with the path of core Postgres as opposed to branching away from it.
Postgres has had some form of time-based partitioning for years. Though for many years it was a bit kludgy and wasn’t part of core Postgres. With Postgres 10 came native time partitioning, and because Citus is an extension to Postgres that means anyone using Citus gets to take advantage of time-based partitioning as well. You can now create tables that are distributed across nodes by ID and partitioned by time on disk.
We have found a few postgres extensions that make partitioning much easier to use. The best in class for improving time partitioning is pg_partman and today we’ll dig into getting time partitioning set up with your Citus database cluster using pg_partman.Continue reading
Today we’re excited to announce that you can now use our fully-managed database as a service, Citus Cloud, to manage protected health information (PHI) and to build HIPAA-compliant applications on top of Postgres. For those of you building apps in healthcare environments regulated by the Health Insurance Portability and Accountability Act (HIPAA, you can feel safer knowing you now have a scalable Postgres database that meets your healthcare compliance requirements. .
If you’re building an application on top of Postgres and you need a combination of horizontal scale as well as HIPAA compliance, reach out to us if you want more information about getting a Business Associate Agreement (BAA) with Citus Data in place.Continue reading
When it comes to building large-scale, multi-tenant applications, Microsoft’s ASP.NET platform is a strong choice. Like other popular web frameworks such as Express and Django, ASP.NET is used to build web applications and APIs. It’s been around for a while, but don’t let that fool you: ASP.NET packs some serious muscle. After all, it powers one of the biggest Q&A networks on the web: Stack Exchange!
In the past, ASP.NET apps could only run on Windows servers. That’s changed with the latest version, ASP.NET Core, which is fully open source and cross-platform. ASP.NET Core runs anywhere you need it to (Windows, Mac, Linux, Docker) and features a modern middleware pipeline, a rich package ecosystem, and blazing-fast performance.
My experience working on multi-tenant enterprise apps has taught me that it’s never too early to design for scale. How you architect your code matters, as does how you architect your data. In the past, the apps I worked on were designed around a database-per-tenant model—unfortunately, the database-per-tenant model didn’t scale and caused problems once our app reached thousands of customers (aka tenants). In this post, I’ll show you a different approach to scale the underlying database with ASP.NET: sharding. With sharding you can leave behind the drawbacks of the database-per-tenant model and can scale infinitely.
In this blog post, I’ll show you how to build your multi-tenant app with scale in mind. You’ll learn how to use ASP.NET Core’s middleware pipeline plus the sharding features of Postgres and Citus to build a scalable multi-tenant application on ASP.NET Core. Along the way we’ll start to build the MVP of our very own StackExchange. Let’s get started!Continue reading