Citus Blog

Articles tagged: popular

Seasons each have a different feel, a different rhythm. Temperature, weather, sunlight, and traditions all vary by season. For me, summer usually includes a beach vacation. And winter brings the smell of hot apple cider on the stove, days in the mountains hoping for the next good snowstorm, and New Year’s resolutions. Somehow January is the time to pause and reflect on the accomplishments of the past year, to take stock of what worked and what didn’t. And of course there are the TOP TEN LISTS.

Spoiler alert: yes, this is a Top 10 list. If you’re a regular on the Citus Data blog, you know our Citus database engineers love PostgreSQL. And one of the open source responsibilities we take seriously is sharing learnings, how-to’s, and expertise. One way we share learnings is by giving lots of conference talks (seems like I have to update our Events page every week with new events). And another way we share our learnings is with our blog.

So just in case you missed any of our best posts from last year, here is the TOP TEN list of the most popular Citus Data blogs published in 2018. Enjoy.

Keep reading
Craig Kerstiens

Configuring memory for Postgres

Written by Craig Kerstiens | June 12, 2018

work_mem is perhaps the most confusing setting within Postgres. It is a configuration setting that determines how much memory can be used during certain operations. On the surface, the work_mem setting seems simple: after all, work_mem just specifies the amount of memory available to internal sort operations and hash tables before they write data to disk. And yet, leaving work_mem unconfigured can bring on a host of issues. What is perhaps more troubling, though, is when you receive an out-of-memory error on your database and you jump in to tune work_mem, only for it to behave in an unintuitive manner.
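One reason tuning work_mem can surprise people is that the limit applies to each sort or hash operation, not to the server as a whole, so several operations across several connections can each claim that much memory at once. The arithmetic below is a rough sketch of that worst case; the function name and the numbers are made up for illustration, and it ignores everything else that uses memory on the server.

```python
def worst_case_work_mem_mb(work_mem_mb: int, connections: int, ops_per_query: int) -> int:
    """Rough upper bound, in MB, on memory claimed by sort/hash operations.

    Assumes every connection runs one query at a time and every sort or
    hash operation in that query uses its full work_mem allowance.
    """
    return work_mem_mb * connections * ops_per_query


# Hypothetical workload: work_mem of 64 MB, 100 busy connections,
# each running a query with 3 sort or hash operations.
print(worst_case_work_mem_mb(64, 100, 3))  # 19200
```

With these illustrative numbers, a setting that looks modest on paper allows for roughly 19 GB of sort and hash memory in the worst case.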

Keep reading

If you’ve done some performance tuning with Postgres, you might have used EXPLAIN. EXPLAIN shows you the execution plan that the PostgreSQL planner generates for the supplied statement. It shows how the table(s) referenced by the statement will be scanned (using a sequential scan, index scan, etc.), and what join algorithms will be used if multiple tables are involved. But how does Postgres come up with these plans?

Keep reading
Marco Slot

When Postgres blocks: 7 tips for dealing with locks

Written by Marco Slot | February 22, 2018

Last week I wrote about locking behaviour in Postgres, which commands block each other, and how you can diagnose blocked commands. Of course, after the diagnosis you may also want a cure. With Postgres it is possible to shoot yourself in the foot, but Postgres also offers you a way to stay on target. These are some of the important do’s and don’ts that we’ve seen as helpful when working with users to migrate from their single node Postgres database to Citus or when building new real-time analytics apps on Citus.

Keep reading
Marco Slot

PostgreSQL rocks, except when it blocks: Understanding locks

Written by Marco Slot | February 15, 2018

On the Citus open source team, we engineers take an active role in helping our users scale out their Postgres database, be it for migrating an existing application or building a new application from scratch. This means we help you with distributing your relational data model—and also with getting the most out of Postgres.

One problem I often see users struggle with when it comes to Postgres is locks. While Postgres is amazing at running multiple operations at the same time, there are a few cases in which Postgres needs to block an operation using a lock. You therefore have to be careful about which locks your transactions take, but with the high-level abstractions that PostgreSQL provides, it can be difficult to know exactly what will happen. This post aims to demystify the locking behaviors in Postgres, and to give advice on how to avoid common problems.

Keep reading
Joe Kutner

Using Hibernate and Spring to Build Multi-Tenant Java Apps

Written by Joe Kutner | February 13, 2018

If you're building a Java app, there's a good chance you're using Hibernate. The Hibernate ORM is a nearly ubiquitous choice for Java developers who need to interact with a relational database. It's mature, widely supported, and feature-rich, as demonstrated by its support for multi-tenant applications.

Hibernate officially supports two different multi-tenancy mechanisms: separate database and separate schema. Unfortunately, both of these mechanisms come with some downsides in terms of scaling. A third Hibernate multi-tenancy mechanism, a tenant discriminator, also exists, and it’s usable—but it’s still considered a work-in-progress by some. Unlike the separate database and separate schema approaches, which require distinct database connections for each tenant, Hibernate’s tenant discriminator model stores tenant data in a single database and partitions records with either a simple column value or a complex SQL formula.

But fear not: despite the unfinished state of Hibernate's built-in support for a tenant discriminator (or, in simple terms, tenant_id), it's possible to implement your own discriminator using standard Spring, Hibernate, and AspectJ mechanisms that work quite well. The tenant discriminator model works well as you start small on a single-node Postgres database and, even better, it can continue to scale as your data grows by leveraging the Citus extension to Postgres.
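To make the tenant discriminator idea concrete, here is a minimal sketch of the underlying pattern: all tenants share one table, and every query filters on a tenant_id column. It uses Python's built-in sqlite3 module rather than Hibernate, Spring, or Postgres, and the orders table and orders_for_tenant helper are hypothetical names chosen for illustration, not anything from the post.

```python
import sqlite3

# One shared database and table for all tenants; tenant_id is the discriminator.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, tenant_id INTEGER NOT NULL, item TEXT)"
)
conn.executemany(
    "INSERT INTO orders (tenant_id, item) VALUES (?, ?)",
    [(1, "widget"), (2, "gadget"), (1, "gizmo")],
)


def orders_for_tenant(conn: sqlite3.Connection, tenant_id: int) -> list:
    """Return item names for one tenant by filtering on the discriminator column."""
    rows = conn.execute(
        "SELECT item FROM orders WHERE tenant_id = ? ORDER BY id", (tenant_id,)
    )
    return [item for (item,) in rows]


print(orders_for_tenant(conn, 1))  # ['widget', 'gizmo']
```

In a real application the filter would be applied automatically by the framework layer instead of being written into each query by hand, which is the part the post builds with Spring, Hibernate, and AspectJ.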

Keep reading
Top 10 Most Popular Citus Data Blog Posts in 2017

What Postgres and distributed database topics got the most attention on our Citus Data blog in 2017? Out of the 47 new posts we published last year, it’s pretty clear that many of you were interested in sharding relational databases, whether it be Ozgun’s principles of sharding or Craig’s post on figuring out which sharding data model is right for you. Heck, the five sharding data models post was so popular that it even got re-published recently on HackerNoon.

Keep reading
Craig Kerstiens

Faster bulk loading in Postgres with copy

Written by Craig Kerstiens | November 8, 2017

If you've used a relational database, you understand basic INSERT statements. Even if you come from a NoSQL background, you likely grok inserts. Within the Postgres world, there is a utility that is useful for fast bulk ingestion: \copy. Postgres \copy is a mechanism for you to bulk load data in or out of Postgres.

First, let's pause. Do you even need to bulk load data, and what does it have to do with Citus? We see customers leverage Citus for a number of different uses. When looking at Citus for a transactional workload, say as the system of record for some kind of multi-tenant SaaS app, your app is mostly performing standard inserts, updates, and deletes.

But when you're leveraging Citus for real-time analytics, you may already have a separate ingest pipeline. In this case you might be looking at event data, which can be higher in volume than a standard web app workload. If you already have an ingest pipeline that reads off Apache Kafka or Kinesis, you could be a great candidate for bulk ingest.

Back to our feature presentation: Postgres \copy. Copy is interesting because you can achieve much higher throughput than with single row inserts.
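As a rough illustration of what a bulk load with \copy involves, the sketch below serializes a batch of rows to CSV and builds the psql meta-command that would load the file. The events table, its columns, and the events.csv file name are hypothetical; the code only produces the CSV text and the command string, and does not connect to a database.

```python
import csv
import io


def rows_to_csv(rows, header) -> str:
    """Serialize a batch of rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def copy_command(table: str, path: str) -> str:
    """Build the psql \\copy meta-command that loads a CSV file into a table."""
    return f"\\copy {table} FROM '{path}' WITH (FORMAT csv, HEADER true)"


# Hypothetical batch of event rows from an ingest pipeline.
events = [(1, "click"), (2, "view")]
csv_text = rows_to_csv(events, ["id", "kind"])
print(csv_text)
print(copy_command("events", "events.csv"))
```

Saving csv_text to events.csv and running the printed command inside psql would load the whole batch in one operation instead of issuing one INSERT per row.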

Keep reading
Craig Kerstiens

Five sharding data models and which is right

Written by Craig Kerstiens | August 28, 2017

When it comes to scaling your database, there are challenges but the good news is that you have options. The easiest option of course is to scale up your hardware. And when you hit the ceiling on scaling up, you have a few more choices: sharding, deleting swaths of data that you think you might not need in the future, or trying to shrink the problem with microservices.

Deleting portions of your data is simple, if you can afford to do it. With sharding, there are a number of approaches, and which one is right depends on a number of factors. Here we'll survey five sharding approaches and dig into what factors guide you to each one.

Keep reading
Ozgun Erdogan

Principles of Sharding for Relational Databases

Written by Ozgun Erdogan | August 9, 2017

When your database is small (tens of GB), it's easy to throw more hardware at the problem and scale up. As your tables grow, however, you need to think about other ways to scale your database.

In one way, sharding is the best way to scale. Sharding enables you to linearly scale your database’s CPU, memory, and disk resources by separating your database into smaller parts. In other ways, sharding is a controversial topic. The internet is full of advice on sharding, from "essential to scaling your database infrastructure" to "why you never want to shard". So the question is, whose advice should you take?
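For readers new to the idea, "separating your database into smaller parts" usually means choosing a sharding key and mapping each key value to one shard. The sketch below shows one common way to do that, hashing the key and taking the remainder. It is a generic illustration under assumed names (shard_for, SHARD_COUNT, the tenant names) and is not a description of how Citus places data.

```python
import zlib

SHARD_COUNT = 4  # hypothetical number of shards


def shard_for(key: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a sharding key to a shard index using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


# Group some made-up tenant names by the shard they hash to.
placement = {}
for tenant in ["acme", "globex", "initech", "umbrella"]:
    placement.setdefault(shard_for(tenant), []).append(tenant)
print(placement)
```

Because the hash is stable, the same key always lands on the same shard, which is what lets an application route each query to a single smaller part of the data.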

Keep reading
