The Citus Blog | Citus Data

Architecting petabyte-scale analytics by scaling out Postgres on Azure with Citus

Written by Claire Giordano | December 7, 2019

How do you know if the next update to your software is ready for hundreds of millions of customers? It starts with data. And when it comes to Windows, we’re talking lots of data. The Windows team measures the quality of new software builds by scrutinizing 20,000 diagnostic metrics based on data flowing in from 1.2 billion Windows devices. At the same time, the team evaluates feedback from Microsoft engineers who are using pre-release versions of Windows updates.

At Microsoft, the Windows diagnostic metrics are displayed on a real-time analytics dashboard called “Release Quality View” (RQV), which helps the internal “ship-room” team assess the quality of the customer experience before each new Windows update is released. Given the importance of Windows for Microsoft’s customers, the RQV analytics dashboard is a critical tool for Windows engineers, program managers, and execs.

Keep reading

What DjangoCon has to do with Postgres and Crocodiles. An interview with Louise Grandjonc from Microsoft

Written by Claire Giordano | September 26, 2019

When Django developer and Azure Postgres* engineer Louise Grandjonc confirmed that she could sit down with me for an interview in the days leading up to DjangoCon 2019, I jumped at the chance. Those of you who were in the room for Louise’s talk this week probably understand why. Louise explains technical topics in a way that makes sense—and she often uses unusual (and fun) examples, from crocodiles to owls, from Harry Potter to Taylor Swift.

And since I experience a bit of FOMO whenever I miss a fun developer conference like DjangoCon, I especially wanted to learn more about Louise’s DjangoCon talk: Postgres Index Types and where to find them.

Here’s an edited transcript of my interview with Louise Grandjonc of Microsoft (@louisemeta on Twitter).

Keep reading

Postgres tips for the average and power user

Written by Craig Kerstiens | July 17, 2019

Personally I'm a big fan of email, just like blogging. To me a good email thread can be like a good novel where you're following along, always curious about what comes next. And no, I don't mean the ones where there is an email to all-employees@company.com and someone replies all, only to receive reply-alls asking people not to reply all. I mean ones like the thread that started last week internally among the Azure Postgres team.

The first email was titled: Random Citus development and psql tips, and from there it piled on with more and more tips and power user suggestions for Postgres. Some of these tips are relevant if you're working directly on the Citus open source code, others are relevant to anyone who works with Postgres, and some are useful for debugging Postgres internals. While the thread is still ongoing, here are just a few of the great tips:

Keep reading

Testing your Django app with Citus

Written by Louise Grandjonc | July 5, 2019

Recently, I started working on the django-multitenant application. The main reason we created it was to help Django developers use Citus in their app. While I was working on it, I wrote unit tests. And to be able to reproduce a customer’s production environment, I wanted the tests to use Citus and not a single-node Postgres. If you are using Citus as your production database, we encourage you to have it running in your development environment as well as your staging environments, to minimise the gap between dev and production. To better understand the importance of dev/prod parity, I recommend reading the Twelve-Factor App, which will give you ideas for lowering the chances of last-minute surprises when deploying to prod.
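One way to keep dev, staging, and prod pointed at the same kind of database is to build the Django database settings from environment variables, in the Twelve-Factor style. The sketch below is an illustration only: the `CITUS_*` variable names, the defaults, and the helper name are assumptions made for this example, not part of django-multitenant. It relies on the fact that a Citus coordinator accepts ordinary Postgres connections, so the stock Django Postgres backend is used.

```python
import os


def citus_database_settings(env=None):
    """Build a Django DATABASES dict from environment variables.

    The CITUS_* variable names and the defaults are hypothetical,
    chosen for this sketch. The coordinator node is addressed like
    any other Postgres server.
    """
    env = os.environ if env is None else env
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": env.get("CITUS_DB_NAME", "postgres"),
            "USER": env.get("CITUS_DB_USER", "postgres"),
            "PASSWORD": env.get("CITUS_DB_PASSWORD", ""),
            "HOST": env.get("CITUS_COORDINATOR_HOST", "localhost"),
            "PORT": env.get("CITUS_COORDINATOR_PORT", "5432"),
        }
    }
```

In a `settings.py`, this would be used as `DATABASES = citus_database_settings()`, so that the test run, the staging deploy, and production differ only in their environment variables.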

Keep reading

Introducing pg_auto_failover: A high availability and automated failover Postgres extension

Written by Dimitri Fontaine | May 30, 2019

As part of the Citus team (Citus scales out Postgres horizontally, but that’s not all we work on), I've been working on pg_auto_failover for quite some time now and I'm excited that we have now introduced pg_auto_failover as Open Source, to give you automated failover and high availability!

When designing pg_auto_failover, our goal was this: to provide an easy to set up Business Continuity solution for Postgres that implements fault tolerance of any one node in the system. The documentation chapter about the pg_auto_failover architecture includes the following:

It is important to understand that pg_auto_failover is optimized for Business Continuity. In the event of losing a single node, then pg_auto_failover is capable of continuing the PostgreSQL service, and prevents any data loss when doing so, thanks to PostgreSQL Synchronous Replication.

Introduction to pg_auto_failover

The pg_auto_failover solution for Postgres is meant to provide an automated failover solution that is reliable and easy to set up. This solution includes software-driven decision making for when to implement failover in production.

Keep reading

Managing multiple databases in Rails 6

Written by Lukas Fittl | May 23, 2019

If you’ve worked with Ruby on Rails you likely have some understanding of how your database works with Rails. Traditionally that has always meant specifying a single database per environment in your config/database.yml, possibly together with an environment setting like DATABASE_URL. Based on that configuration, all reads and writes will access that database.

With Rails 6 this is about to change, thanks to the work of Eileen M. Uchitelle together with contributors from GitHub, Basecamp and Shopify. In the upcoming Rails 6 (currently in RC1), you will be able to easily change which database server you are connecting to, to support a variety of scenarios such as using read replicas and splitting your database into dedicated components.

The most interesting part, which we wanted to detail in this post, is related to configuring automatic queries against a read replica, or follower database.
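To make the idea of automatically sending queries to a read replica concrete, here is a minimal, language-agnostic sketch written in Python. It is not the Rails API: the class name, the connection names, and the two-second window are all assumptions for illustration. Writes go to the primary, and reads go to the replica unless the same client wrote very recently, so that a client can still read its own writes while the replica catches up.

```python
import time


class ReplicaRouter:
    """Pick a connection name ("primary" or "replica") for each query.

    Hypothetical helper for illustration. Only statements that start
    with SELECT are treated as reads; everything else counts as a write.
    """

    def __init__(self, delay_seconds=2.0, clock=time.monotonic):
        self.delay_seconds = delay_seconds  # assumed replication-lag window
        self.clock = clock                  # injectable so tests can fake time
        self.last_write_at = None

    def connection_for(self, sql):
        words = sql.split()
        is_read = bool(words) and words[0].upper() == "SELECT"
        now = self.clock()
        if not is_read:
            # Remember when this client last wrote.
            self.last_write_at = now
            return "primary"
        recently_wrote = (
            self.last_write_at is not None
            and now - self.last_write_at < self.delay_seconds
        )
        # Stay on the primary right after a write to avoid stale reads.
        return "primary" if recently_wrote else "replica"
```

A real router would need a more careful definition of a read (for example, `SELECT ... FOR UPDATE` takes locks and belongs on the primary), which is one reason to prefer the framework's built-in support over a hand-rolled version.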

Keep reading

Introducing Hyperscale (Citus) on Azure Database for PostgreSQL

Written by Craig Kerstiens | May 6, 2019

For roughly ten years now, I've had the pleasure of running and managing databases for people. In the early stages of building an application you move quickly, adding new tables and columns to your Postgres database to support new functionality. You move quickly, but you don't worry too much because things are fast and responsive–largely because your data is small. Over time your application grows and matures. Your data model stabilizes, and you start to spend more time tuning and tweaking to ensure performance and stability stay where they need to. Eventually you get to the point where you miss the days of maintaining a small database, because life was easier then. Indexes were created quickly, joins were fast, count(*) didn't bring your database to a screeching halt, and vacuum was not a regular part of your lunchtime conversation. As you continue to tweak and optimize the system, you know you need a plan for the future and know how you’re going to continue to scale.

Now in GA: Introducing Hyperscale (Citus) on Azure Database for PostgreSQL

With Hyperscale (Citus) on Azure Database for PostgreSQL, we help many of those worries fade away. I am super excited to announce that Citus is now available on Microsoft Azure, as a new built-in deployment option on the Azure Database for PostgreSQL called Hyperscale (Citus).

Hyperscale (Citus) scales out your data across multiple physical nodes, with the underlying data being sharded into much smaller bits. The same database sharding principles that work for Facebook and Google are baked right into the database. But, unlike traditional sharded systems, your application doesn't have to learn how to shard the data. With Azure Database for PostgreSQL, Hyperscale (Citus) takes Postgres, the open source relational database, and extends it with low level internal hooks.
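As a rough illustration of what sharding data across nodes means, the Python sketch below hashes a distribution-column value to a shard and places shards on worker nodes. Everything in it is an assumption made for the example: the function names, the `crc32` hash, the shard count of 32, and the round-robin placement are stand-ins, not what Citus does internally. The point is only that the routing decision can live below the application, which keeps issuing ordinary queries.

```python
import zlib


def shard_for(tenant_id, shard_count=32):
    """Map a distribution-column value to a shard index.

    crc32 is a stand-in hash for this sketch; it is not the hash
    function Citus uses internally.
    """
    return zlib.crc32(str(tenant_id).encode("utf-8")) % shard_count


def node_for(shard_index, nodes):
    """Place shards on worker nodes round-robin (a simplifying assumption)."""
    return nodes[shard_index % len(nodes)]


def route(tenant_id, nodes, shard_count=32):
    """Return the (shard index, node name) that would hold this tenant's rows."""
    shard = shard_for(tenant_id, shard_count)
    return shard, node_for(shard, nodes)
```

For example, `route(42, ["worker-1", "worker-2", "worker-3"])` always returns the same shard and node for tenant 42, so all rows for that tenant land together and a query filtered on the tenant can be answered by a single node.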

Keep reading

Postgres and superuser access

Written by Craig Kerstiens | April 4, 2019

A few days ago a CVE was announced for Postgres. To say this CVE is a bit overblown is an understatement. The first thing to know is you're likely completely safe. If you run on a managed service provider you are not going to be affected by this, and if you're managing your own Postgres database, chances are you are equally safe. This CVE received a note from Tom Lane on the pgsql-announce mailing list in response to the broad awareness and attention it was getting.

But, we thought this might be a good time to talk about a few principles and concepts that underlie how Postgres works.

Keep reading

A health check playbook for your Postgres database

Written by Craig Kerstiens | March 29, 2019

I talk with a lot of folks that set their database up, start working with it, and then are surprised by issues that suddenly crop up out of nowhere. The reality is, so many don't want to have to be a DBA; instead you would rather build features and just have the database work. But the truth is that a database is a living, breathing thing. As the data itself changes, the right way to query and behave changes too. Making sure your database is healthy and performing at its maximum level doesn't require a giant overhaul constantly. In fact you can probably view it similarly to how you approach personal health. Regular check-ups allow you to make small but important adjustments without having to make dramatic, life-altering changes to keep you on the right path.

After years of running and managing literally millions of Postgres databases, here's my breakdown of what your regular Postgres health check should look like. Consider running this on a monthly basis to be able to make small tweaks and adjustments and avoid the drastic changes.

Keep reading

How to evaluate your next database

Written by Craig Kerstiens | March 20, 2019

Choosing a database isn't something you do every day. You generally choose it once for a project, then don't look back. If you experience years of success with your application, you may one day have to migrate to a new database, but that occurs years down the line. In choosing a database there are a few key things to consider. Here is your checklist, and spoiler alert, Postgres checks out strongly in each of these categories.

Keep reading