Join over 12K people who already subscribe to our monthly Citus technical newsletter. Our goal is to be useful, informative, and not-boring.

Citus Data Blog

Thoughts on scaling out PostgreSQL, sharding, multi-tenant apps, real-time analytics, and distributed databases.

Ozgun Erdogan
By Ozgun Erdogan
March 3, 2020

Microsoft Azure Welcomes PostgreSQL Committers

Interview with the Postgres committers who have joined the Postgres team at Microsoft by Sudhakar Sannakkayala (Partner Director, Azure Data) and Ozgun Erdogan (Principal, Azure Data)—cross-posted from the Azure Postgres blog.

In recent years, the data landscape has seen strong innovation as a result of the onset of open source technologies. At the forefront, PostgreSQL has shown that it’s the open source database built for every type of developer. By staying true to its principles of being standards-compliant, highly programmable, and extensible, PostgreSQL has solidified its position as the “most loved database” of developers across the board—ranging from scenarios for OLTP, analytics, and business intelligence to processing various formats of geometric data using the PostGIS extension.

Continue reading
Marco Slot
By Marco Slot
March 2, 2020

Citus 9.2 speeds up large scale HTAP workloads on Postgres

Some of you have been asking, “what’s happening with the Citus open source extension to Postgres?” The short answer is: a lot. More and more users have adopted the Citus extension in order to scale out Postgres, to increase performance and enable growth. And you’re probably not surprised to learn that since Microsoft acquired Citus Data last year, our engineering team has grown quite a bit—and we’ve been continuing to evolve and innovate on the Citus open source extension.

Our newest release is Citus 9.2. We’ve updated the installation instructions on our Download page and in our Citus documentation, and now it’s time to take a walk through what’s new.

Continue reading
Claire Giordano
By Claire Giordano
January 16, 2020

Tips on how to get your conference talk SELECTED

As I get ready for the PgDay San Francisco event that is happening next Tue 21 January—a one-day, single-track Postgres community event at the awesome Swedish American Hall in SF—I’m reflecting a bit on how important the speakers are to developer events. Let’s face it, without speakers, there would be no conference.

And because I was on the PgDaySF talk selection committee, I’ve had some good conversations these last few months about CFPs, conference talks, how talk selection committees work, and how you can improve your chances at getting your proposals accepted. So I thought it would be useful to walk through the tips I’ve accumulated on how to get your conference talk accepted—at a Postgres conference, or at any developer conference.

These tips are premised on the notion that a good conference talk requires these 4 things:

  1. interestingness: a topic people will care about—and learn from
  2. knowledgeable speaker who knows their subject & can communicate effectively with an audience—so people can follow, understand, and learn
  3. a hook: a compelling title & abstract that will hook people and entice them to attend
  4. fits holistically into the rest of the lineup: a talk that complements the rest of the talks at the event, that adds something unique, and doesn’t overlap the other talks in a significant way
Continue reading
Claire Giordano
By Claire Giordano
December 7, 2019

Architecting petabyte-scale analytics by scaling out Postgres on Azure with Citus

How do you know if the next update to your software is ready for hundreds of millions of customers? It starts with data. And when it comes to Windows, we’re talking lots of data. The Windows team measures the quality of new software builds by scrutinizing 20,000 diagnostic metrics based on data flowing in from 800 million Windows devices. At the same time, the team evaluates feedback from Microsoft engineers who are using pre-release versions of Windows updates.

At Microsoft, the Windows diagnostic metrics are displayed on a real-time analytics dashboard called “Release Quality View” (RQV), which helps the internal “ship-room” team assess the quality of the customer experience before each new Windows update is released. Given the importance of Windows for Microsoft’s customers, the RQV analytics dashboard is a critical tool for Windows engineers, program managers, and execs.

Continue reading
Claire Giordano
By Claire Giordano
September 26, 2019

What DjangoCon has to do with Postgres and Crocodiles. An interview with Louise Grandjonc from Microsoft

When Django developer and Azure Postgres* engineer Louise Grandjonc confirmed that she could sit down with me for an interview in the days leading up to DjangoCon 2019, I jumped at the chance. Those of you who were in the room for Louise’s talk this week probably understand why. Louise explains technical topics in a way that makes sense—and she often uses unusual (and fun) examples, from crocodiles to owls, from Harry Potter to Taylor Swift.

And since I experience a bit of FOMO whenever I miss a fun developer conference like DjangoCon, I especially wanted to learn more about Louise’s DjangoCon talk: Postgres Index Types and where to find them.

Here’s an edited transcript of my interview with Louise Grandjonc of Microsoft (@louisemeta on Twitter.)

Continue reading
Craig Kerstiens
By Craig Kerstiens
July 17, 2019

Postgres tips for the average and power user

Personally I’m a big fan of email, just like blogging. To me a good email thread can be like a good novel where you’re following along always curious for what comes next. And no, I don’t mean the ones where there is an email to [email protected] and someone replies all, to only receive reply-all’s to not reply-all. I mean ones like started last week internally among the Azure Postgres team.

The first email was titled: Random Citus development and psql tips, and from there it piled on to be more and more tips and power user suggestions for Postgres. Some of these tips are relevant if you’re working directly on the Citus codebase, others relevant as anyone that works with Postgres, and some useful for debugging Postgres internals. While the thread is still ongoing here is just a few of the great tips:

Continue reading
Louise Grandjonc
By Louise Grandjonc
July 5, 2019

Testing your Django app with Citus

Recently, I started working on the django-multitenant application. The main reason we created it was to to help django developers use citus in their app. While I was working on it, I wrote unit tests. And to be able to reproduce a customer’s production environment, I wanted the tests to use citus and not a single node postgres. If you are using citus as your production database, we encourage you to have it running in your development environment as well as your staging environments to be able to minimise the gap between dev and production. To understand better the importance of dev/prod parity, I recommend reading the Twelve-Factor app that will give you ideas to lower the chances of having last minute surprising when deploying on prod.

Continue reading
Dimitri Fontaine
By Dimitri Fontaine
May 30, 2019

Introducing pg_auto_failover: A high availability and automated failover Postgres extension

As part of the Citus team (Citus scales out Postgres horizontally, but that’s not all we work on), I’ve been working on pg_auto_failover for quite some time now and I’m excited that we have now introduced pgautofailover as Open Source, to give you automated failover and high availability!

When designing pg_auto_failover, our goal was this: to provide an easy to set up Business Continuity solution for Postgres that implements fault tolerance of any one node in the system. The documentation chapter about the pg_auto_failover architecture includes the following:

It is important to understand that pgautofailover is optimized for Business Continuity. In the event of losing a single node, then pgautofailover is capable of continuing the PostgreSQL service, and prevents any data loss when doing so, thanks to PostgreSQL Synchronous Replication.

Introduction to pg_auto_failover

The pg_auto_failover solution for Postgres is meant to provide an easy to setup and reliable automated failover solution. This solution includes software driven decision making for when to implement failover in production.

Continue reading
Lukas Fittl
By Lukas Fittl
May 23, 2019

Managing multiple databases in Rails 6

If you’ve worked with Ruby on Rails you likely have some understanding of how your database works with Rails, traditionally that has always meant specifying a single database per environment in your config/database.yml, possibly together with an environment setting like DATABASE_URL. Based on that configuration all reads and writes will access the database.

With Rails 6 this is about to change, thanks to the work of Eileen M. Uchitelle together with contributors from GitHub, Basecamp and Shopify. In the upcoming Rails 6 (currently in RC1), you will be able to easily change which database server you are connecting to, to support a variety of scenarios such as using read replicas and splitting your database into dedicated components.

The most interesting part, which we wanted to detail in this post, is related to configuring automatic queries against a read replicas, or follower database.

Continue reading
Craig Kerstiens
By Craig Kerstiens
May 6, 2019

Introducing Hyperscale (Citus) on Azure Database for PostgreSQL

For roughly ten years now, I’ve had the pleasure of running and managing databases for people. In the early stages of building an application you move quickly, adding new tables and columns to your Postgres database to support new functionality. You move quickly, but you don’t worry too much because things are fast and responsive–largely because your data is small. Over time your application grows and matures. Your data model stabilizes, and you start to spend more time tuning and tweaking to ensure performance and stability stay where they need to. Eventually you get to the point where you miss the days of maintaining a small database, because life was easier then. Indexes were created quickly, joins were fast, count(*) didn’t bring your database to a screeching halt, and vacuum was not a regular part of your lunchtime conversation. As you continue to tweak and optimize the system, you know you need a plan for the future and know how you’re going to continue to scale.

Now in Preview: Introducing Hyperscale (Citus) on Azure Database for PostgreSQL

With Hyperscale (Citus) on Azure Database for PostgreSQL, we help many of those worries fade away. I am super excited to announce that Citus is now available on Microsoft Azure, as a new deployment option on the Azure Database for PostgreSQL called Hyperscale (Citus).

Hyperscale (Citus) scales out your data across multiple physical nodes, with the underlying data being sharded into much smaller bits. The same database sharding principles that work for Facebook and Google are baked right into the database. But, unlike traditional sharded systems, your application doesn’t have to learn how to shard the data. With Azure Database on PostgreSQL, Hyperscale (Citus) takes Postgres, the open source relational database, and extends it with low level internal hooks.

Continue reading

Page 1 of 23