Blog posts by Sai Srirampur on the Citus Blog

Faster data migrations in Postgres

Written by By Sai Srirampur | February 20, 2021 Feb 20, 2021

In my day to day, I get to work with many customers migrating their data to Postgres. I work with customers migrating from homogenous sources (PostgreSQL) and also from heterogenous database sources such as Oracle and Redshift. Why do people pick Postgres? Because of the richness of PostgreSQL—and features like stored procedures, JSONB, PostGIS for geospatial workloads, and the many useful Postgres extensions, including my personal favorite: Citus.

A large chunk of the migrations that I help people with are homogenous Postgres-to-Postgres data migrations to the cloud. As Azure Database for PostgreSQL runs open source Postgres, in many cases the application migration can be drop-in and doesn’t require a ton effort. The majority of the effort usually goes into deciding on and implementing the right strategy for performing the data migration.

Keep reading

Migrating interactive analytics apps from Redshift to Postgres, ft. Hyperscale (Citus)

Written by By Sai Srirampur | October 28, 2020 Oct 28, 2020

In my work as an engineer on the Postgres team at Microsoft, I get to meet all sorts of customers going through many challenging projects. One recent database migration project I worked on is a story that just needs to be told. The customer—in the retail space—was using Redshift as the data warehouse and Databricks as their ETL engine. Their setup was deployed on AWS and GCP, across different data centers in different regions. And they'd been running into performance bottlenecks and also was incurring unnecessary egress cost.

Specifically, the amount of data in our customer's analytic store was growing faster than the compute required to process that data. AWS Redshift was not able to offer independent scaling of storage and compute—hence our customer was paying extra cost by being forced to scale up the Redshift nodes to account for growing data volumes. To address these issues, they decided to migrate their analytics landscape to Azure.

Keep reading

Using search_path and views to hide columns for reporting with Postgres

Written by By Sai Srirampur | July 3, 2018 Jul 3, 2018

Data security and data privacy are important, no one disputes that. We all want to keep private things private and to keep our data secure. And yet, data needs to be shared, to enable insights, to help organizations observe patterns and have those “ah-ha” moments. None of us want the extreme where, in an effort to keep data secure, there is no access to data of any form within your organization, and the result is no business insights or analytics. With GDPR going into effect, you've likely been rethinking what security controls you have in place.

Here at Citus Data we collaborate with SaaS businesses and larger enterprises alike, generally to consult on Postgres data models and how to best scale out their database. (Our Citus extension to Postgres enables you to scale out Postgres horizontally. The benefit: performance.) In working with teams, one common thing we've seen companies do is to restrict who can see which bits of Personally Identifiable Information (PII) within your database. There are a number of approaches, including heavyweight ETL processes that mask PII bits. An ETL process tends to introduce a certain amount of latency from the time data is in your system until the time it can be analyzed.

Fortunately, Postgres provides a few primitives that can be used directly within your database to hide PII, while still enabling sophisticated analytics and exploration of data in real time.

Here we'll look at using Postgres schemas and views to provide access to data while keeping PII safe and hidden.

Keep reading

Fun with SQL: Relocating shards on a Citus database cluster

Written by By Sai Srirampur | February 28, 2018 Feb 28, 2018

The Citus extension to Postgres allows you to shard your Postgres database across multiple nodes without having to make major changes to your SaaS application. Citus then provides performance improvements (as compared to single-node Postgres) by transforming SQL queries and distributing queries across multiple nodes, thereby parallelizing the workload. This means that a 2 node, 4 core Citus database cluster could perform 4x faster than single node Postgres.

With the Citus shard rebalancer, you can easily scale your database cluster from 2 nodes to 3 nodes or 4 nodes, with no downtime. You simply run the move shard function on the co-location group you want move shards for, and Citus takes care of the rest. When Citus moves shards, it ensures tables that are co-located stay together. This means all of your joins, say, from orders to order_items still work, just as you'd expect.

Keep reading

Scaling out your Django Multi-tenant App

Written by By Sai Srirampur | November 14, 2017 Nov 14, 2017

There are a number of data architectures you could use when building a multi-tenant app. Some, such as using one database per customer or one schema per customer, have trade-offs when it comes to larger scale. The other option is to build the notion of tenancy directly into the logic of your SaaS application. With django-multitenant and Citus, built-in tenancy becomes much easier to put in place for your application without having to re-invent the wheel yourself.

Our django-multitenant Python library, enables easy scale out of applications that are built on top of Django and follow a multi tenant data model. This Python library has evolved from our experience working with SaaS customers, scaling out their multi-tenant apps.

Keep reading