Citus Data Blog

Thoughts on scaling out PostgreSQL, sharding, multi-tenant apps, real-time analytics, and distributed databases.

Craig Kerstiens
By Craig Kerstiens
May 15, 2018

Fun with SQL: Recursive CTEs in Postgres

Common Table Expressions (CTEs) are a powerful construct within SQL. In day to day conversation, you may hear CTEs referred to as WITH clauses. You can think of CTEs as similar to a view that is materialized only while that query is running and does not exist outside of that query. CTEs can be very useful building blocks for allowing your large SQL queries to be more readable. But, they can also be used recursively allowing you to create some very complex queries without having to drop down to a procedural language like plpgsql or plv8.

Continue reading
Craig Kerstiens
By Craig Kerstiens
May 8, 2018

It's the future (for databases)

Hi. I work as a data architect in San Francisco and I’m auditing Dr. Jones class to stay up to date on the latest technologies and she mentioned you might be able to help me before I get too deep into the design of a new system.

  • I would be happy to help. Can you give me an overview of where you’re at?

My default was just to use Postgres. I had a few questions on what schema designs might make most sense.

  • Well I’m working with more interesting data architectures, not sure I can help you much with Postgres. Really getting excited about what’s possible with neomodern data architectures, they make it so my app devs can build any feature their hearts desire.

I thought your expertise used to be relational databases?

  • It was, but neomodern data architectures are better. Neomodern data architectures allow it so app devs can build any feature they like without having to think about data models. Really, it’s the future of databases.
Continue reading
Craig Kerstiens
By Craig Kerstiens
April 25, 2018

Monitoring your Citus Cloud cluster with Datadog

At the heart of most applications is a database. Ensuring your database is performing well is key to ensuring your your customers receive a good experience when working with your app. It’s likely you’re already monitoring your systems today, and want to monitor your database using similar tooling. Today we’re excited to release turnkey integration for one of the more popular tools out there to monitor Citus Cloud clusters: Datadog.

Continue reading
Craig Kerstiens
By Craig Kerstiens
April 17, 2018

Custom configuring your Postgres settings on Citus Cloud

Postgres has long been a reliable database for keeping your data safe, and it is used in a variety of flexible ways. Because of the many flexible ways it can be used (ranging from embeded devices to data warehousing to large transactional system) it also comes with a lot of knobs to configure it. Part of our approach in providing a fully managed database as a service is configuring Postgres to be production ready from the moment you click a provision, which is what you get with Citus Cloud.

Over time though we have seen a need for more flexibility to tune and customize configurations to your specific needs. Part of this flexibility is in supporting the rich feature set of Postgres features such as JSONB, rich indexing, and more. Part is supporting a broad set of extensions such at HyperLogLog, pg_partman, TopN, PostGIS, and more. And today we’re excited to support custom configuration of your Citus clusters on Citus Cloud to enable even broader flexibility.

Continue reading
Ozgun Erdogan
By Ozgun Erdogan
April 5, 2018

Citus 7.3: Broader SQL coverage, Tableau Integration, TopN extension, and more

Today, we’re excited to announce Citus 7.3—the latest release of our distributed database that scales out Postgres. Citus 7.3 improves support for complex analytical queries, provides integration with Tableau and other BI tools, and integrates with the open source Postgres extension, TopN.

The features in this latest Citus database release are particularly important for real-time analytics workloads. In these workloads, users typically need to ingest data in real time and run analytical queries with sub-second response times. A good example is when you’re serving a dashboard to thousands of customers and your database needs to provide fast replies over billions of rows.

Here’s a quick overview of what’s new in Citus. For an overview of other recent Citus features, check out these blog entries about TopN for your Postgres database and Citus 7.2.

Continue reading
Craig Kerstiens
By Craig Kerstiens
April 4, 2018

Raw SQL access for users with row-level-security

We talk with a lot of SaaS companies that are encountering issues with their database. The most common issue we discuss relates to performance, either a need to keep scaling or at times just dealing with really intensive data needs of only a few customers and how to handle that.

And then as you continue to scale and capture more data you want to provide more value back to your customers.

At times you might even consider giving raw SQL access to your largest and most important customers. Typically controlling what data you give them, via dashboards and canned reports is ideal–this way you can control performance impact and other risks. But, if you have extra large/important customers that require you to give them raw access to the data… then PostgreSQL and thus Citus has your answer.

Pro-tip: Don’t grant access to *all** of your customers.*

Continue reading
Craig Kerstiens
By Craig Kerstiens
March 31, 2018

Contributing to Postgres via patch review

Citus is an open source extension to Postgres that transforms Postgres into a distributed database, scaling horizontally. The fact that Citus is built on top of Postgres is a huge benefit to our users: it means that when you choose Citus, you get all the great features that are available in Postgres. And Postgres itself is an awesome database. Awesome. As a team, we value the foundation we’re built on and regularly aim to contribute back to it. We have a number of developers that have contributed to Postgres over the years from features like watch, event triggers, and the PostgreSQL extension framework.

Recently a few more of our engineers expressed an interest in giving back to the PostgreSQL community. In fact it’s a common question, how can we better help the PostgreSQL project? And a common answer is reviewing patches. To help kick start that process we organized a session and carved out a few days just for patch review during the most recent commitfest.

Continue reading
Furkan Sahin
By Furkan Sahin
March 27, 2018

TopN for your Postgres database

People seem to love lists of the most popular things. I think this is true of many of us. Including developers. Did you get all excited like I did, and listen right away to every song on the list when Spotify released Your Top Songs 2017? (Here are mine) When the Academy Awards were announced, did you check in on the candidates and winners? Did you pay attention to the medalists and top scoring hockey teams in the Winter Olympics?

Sometimes this problem of finding the top on a list is referred to as the Top-K problem. Also the Top “N” problem. Whether it’s the top grossing sales reps or the most visited pages on your website, and whether you call it the Top K or the TopN, for most of us, there is usually something we want to know the top “N” of.

Finding the top “N” is not easy

To find the top occurring item you generally need to count through all the records. Counting the clicks in your web app, the number of times you’ve listened to song, or the number of downloads of your project. It is all about counting. Counting, sorting, and limiting the list in Postgres is straightforward, and this works great on smaller sets of data. What if there are thousands of events? Machines these days are pretty fast so this isn’t much of a problem. Millions is even acceptable. Billions? That may take a bit longer…

However, getting the counts of different items, sorting them and taking the top “N” of them out of your database—that can start to become much more challenging at larger scale.

Even further, what if you want to materialize your top N results for smaller sets in regular basis and run some combination queries to further analyze? The real problem starts then. Calculating the Top N can be a challenge. This is why my team at Citus Data (where we build the Citus extension to Postgres that scales out Postgres horizontally) is happy to announce the release of the open source TopN extension for PostgreSQL.

Inspiration for TopN came from a Citus Data customer who needed to use TopN-like functionality in concert with the Citus extension that scales out their Postgres database. When designing TopN, we decided to implement TopN as a Postgres extension. And we decided to write TopN in C. TopN outputs a JSONB object which you can flexibly use for different use cases. Aggregation functions which take JSONB input and union them together are also included.

TopN can be used to calculate the most frequently occurring values in a column, and is part of the class of probabilistic distinct algorithms called sketch algorithms. Let’s look further at how the TopN extension to Postgres actually works.

Continue reading
Craig Kerstiens
By Craig Kerstiens
March 26, 2018

Citus Data internal hackathon roundup

At Citus Data, we regularly get the team together, because even with an engineering team that is distributed around the globe, face-to-face time is valuable to connecting and collaborating. During our team offsites, we often organize engineering hackathons to proof out new ideas, learn new things, or just for fun. We recently completed one of our Citus hackathons and thought we’d share some of what we built.

The theme of our hackathon this time was on building the ultimate dashboard for our Citus extension to Postgres. For Postgres, there are lots of options out there for capturing and displaying insights into your database. You could use New Relic, Vivid Cortex, or something entirely open source like pghero. But we wanted to explore the question, what more could we provide?

Our two teams took two very different approaches, but each emerged with something interesting that we hope to continue to build on and productize in the future. In case you’re curious, here’s a look at each of the projects from our hackday:

Continue reading
Will Leinweber
By Will Leinweber
March 19, 2018

Database constraints in Postgres: The last line of defense

Data has a certain gravity and inertia. Once it’s stored it’s not likely to be actively moved or frequently modified. At least not for your one source of truth. Protecting that data and ensuring it’s both safely stored but also correct is worth the time investment because of the value it has.

Going further, your database schema and models are going to change far less than your application code. Because it changes less frequently the case can easily be made that spending some time to ensure correctness at the database level is a great return on time.

This post was the result of a recent talk I recently gave at PgDay Paris. The conference itself was a great local event in Paris, and while there we had a chance to meet with a few of our customers based in Paris as well. As it’s always great to get out in person and chat with people about Postgres and their experience in scaling their database, many remarked that the talk could be useful to others that weren’t there. So as I thought it would be worthwhile to write-up, and here you go:

Continue reading

Page 2 of 18