Citus 10 is out! New features include columnar storage & Citus on a single node—plus we’ve open-sourced the shard rebalancer. Read the Citus 10 blog.

Skip navigation

Citus Blog

Articles tagged: Citus on Azure

Sai Krishna Srirampur

Faster data migrations in Postgres

Written byBy Sai Srirampur | February 20, 2021Feb 20, 2021

In my day to day, I get to work with many customers migrating their data to Postgres. I work with customers migrating from homogenous sources (PostgreSQL) and also from heterogenous database sources such as Oracle and Redshift. Why do people pick Postgres? Because of the richness of PostgreSQL—and features like stored procedures, JSONB, PostGIS for geospatial workloads, and the many useful Postgres extensions, including my personal favorite: Citus.

A large chunk of the migrations that I help people with are homogenous Postgres-to-Postgres data migrations to the cloud. As Azure Database for PostgreSQL runs open source Postgres, in many cases the application migration can be drop-in and doesn’t require a ton effort. The majority of the effort usually goes into deciding on and implementing the right strategy for performing the data migration.

Keep reading

Once you start using the Citus extension to distribute your Postgres database, you may never want to go back. But what if you just want to experiment with Citus and want to have the comfort of knowing you can go back? Well, as of Citus 9.5, now there is a new undistribute_table() function to make it easy for you to, well, to revert a distributed table back to being a regular Postgres table.

If you are familiar with Citus, you know that Citus is an open source extension to Postgres that distributes your data (and queries) to multiple machines in a cluster—thereby parallelizing your workload and scaling your Postgres database horizontally. When you start using Citus—whether you’re using Citus open source or whether you’re using Citus as part of a managed service in the cloud—usually the first thing you need to do is distribute your Postgres tables across the cluster.

Keep reading
Claire Giordano

When to use Hyperscale (Citus) to scale out Postgres

Written byBy Claire Giordano | December 5, 2020Dec 5, 2020

If you’ve built your application on Postgres, you already know why so many people love Postgres.

And if you’re new to Postgres, the list of reasons people love Postgres is loooong—and includes things like: 3 decades of database reliability baked in; rich datatypes; support for custom types; myriad index types from B-tree to GIN to BRIN to GiST; support for JSON and JSONB from early days; constraints; foreign data wrappers; rollups; the geospatial capabilities of the PostGIS extension, and all the innovations that come from the many Postgres extensions.

But what to do if your Postgres database gets very large?

Keep reading

In my work as an engineer on the Postgres team at Microsoft, I get to meet all sorts of customers going through many challenging projects. One recent database migration project I worked on is a story that just needs to be told. The customer—in the retail space—was using Redshift as the data warehouse and Databricks as their ETL engine. Their setup was deployed on AWS and GCP, across different data centers in different regions. And they’d been running into performance bottlenecks and also was incurring unnecessary egress cost.

Specifically, the amount of data in our customer’s analytic store was growing faster than the compute required to process that data. AWS Redshift was not able to offer independent scaling of storage and compute—hence our customer was paying extra cost by being forced to scale up the Redshift nodes to account for growing data volumes. To address these issues, they decided to migrate their analytics landscape to Azure.

Keep reading

When working on the internals of Citus, an open source extension to Postgres that transforms Postgres into a distributed database, we often get to talk with customers that have interesting challenges you won’t find everywhere. Just a few months back, I encountered an analytics workload that was a really good fit for Citus.

But we had one problem: the percentile calculations on their data (over 300 TB of data) could not meet their SLA of 30 seconds.

To make things worse, the query performance was not even close to the target: the percentile calculations were taking about 6 minutes instead of the required 30 second SLA.

Keep reading

How do you know if the next update to your software is ready for hundreds of millions of customers? It starts with data. And when it comes to Windows, we’re talking lots of data. The Windows team measures the quality of new software builds by scrutinizing 20,000 diagnostic metrics based on data flowing in from 800 million Windows devices. At the same time, the team evaluates feedback from Microsoft engineers who are using pre-release versions of Windows updates.

At Microsoft, the Windows diagnostic metrics are displayed on a real-time analytics dashboard called “Release Quality View” (RQV), which helps the internal “ship-room” team assess the quality of the customer experience before each new Windows update is released. Given the importance of Windows for Microsoft’s customers, the RQV analytics dashboard is a critical tool for Windows engineers, program managers, and execs.

Keep reading

For roughly ten years now, I’ve had the pleasure of running and managing databases for people. In the early stages of building an application you move quickly, adding new tables and columns to your Postgres database to support new functionality. You move quickly, but you don’t worry too much because things are fast and responsive–largely because your data is small. Over time your application grows and matures. Your data model stabilizes, and you start to spend more time tuning and tweaking to ensure performance and stability stay where they need to. Eventually you get to the point where you miss the days of maintaining a small database, because life was easier then. Indexes were created quickly, joins were fast, count(*) didn’t bring your database to a screeching halt, and vacuum was not a regular part of your lunchtime conversation. As you continue to tweak and optimize the system, you know you need a plan for the future and know how you’re going to continue to scale.

Now in GA: Introducing Hyperscale (Citus) on Azure Database for PostgreSQL

With Hyperscale (Citus) on Azure Database for PostgreSQL, we help many of those worries fade away. I am super excited to announce that Citus is now available on Microsoft Azure, as a new built-in deployment option on the Azure Database for PostgreSQL called Hyperscale (Citus).

Hyperscale (Citus) scales out your data across multiple physical nodes, with the underlying data being sharded into much smaller bits. The same database sharding principles that work for Facebook and Google are baked right into the database. But, unlike traditional sharded systems, your application doesn’t have to learn how to shard the data. With Azure Database for PostgreSQL, Hyperscale (Citus) takes Postgres, the open source relational database, and extends it with low level internal hooks.

Keep reading

Page 1 of 1