Citus Blog

Thoughts about the Citus database—as well as PostgreSQL, sharding, distributed databases, and other open source extensions to Postgres.

Monitoring your Citus Cloud cluster with Datadog

Written by Craig Kerstiens | April 25, 2018

At the heart of most applications is a database. Ensuring your database is performing well is key to ensuring your customers receive a good experience when working with your app. It's likely you're already monitoring your systems today, and want to monitor your database using similar tooling. Today we're excited to release turnkey integration for one of the more popular tools out there to monitor Citus Cloud clusters: Datadog.

Keep reading

Postgres has long been a reliable database for keeping your data safe, and it is used in a variety of flexible ways. Because of the many ways it can be used (ranging from embedded devices to data warehousing to large transactional systems), it also comes with a lot of knobs to configure. Part of our approach in providing a fully managed database as a service is configuring Postgres to be production ready from the moment you click provision, which is what you get with Citus Cloud.

Over time, though, we have seen a need for more flexibility to tune and customize configurations to your specific needs. Part of this flexibility is in supporting the rich feature set of Postgres, such as JSONB, rich indexing, and more. Part is supporting a broad set of extensions such as HyperLogLog, pg_partman, TopN, PostGIS, and more. And today we're excited to support custom configuration of your Citus clusters on Citus Cloud to enable even broader flexibility.

Keep reading

Today, we’re excited to announce Citus 7.3—the latest release of our distributed database that scales out Postgres. Citus 7.3 improves support for complex analytical queries, provides integration with Tableau and other BI tools, and integrates with the open source Postgres extension, TopN.

The features in this latest Citus database release are particularly important for real-time analytics workloads. In these workloads, users typically need to ingest data in real time and run analytical queries with sub-second response times. A good example is when you’re serving a dashboard to thousands of customers and your database needs to provide fast replies over billions of rows.

Here’s a quick overview of what’s new in Citus. For an overview of other recent Citus features, check out these blog entries about TopN for your Postgres database and Citus 7.2.

Keep reading

Raw SQL access for users with row-level-security

Written by Craig Kerstiens | April 4, 2018

We talk with a lot of SaaS companies that are encountering issues with their database. The most common issue we discuss relates to performance, either a need to keep scaling or at times just dealing with really intensive data needs of only a few customers and how to handle that.

And then as you continue to scale and capture more data you want to provide more value back to your customers.

At times you might even consider giving raw SQL access to your largest and most important customers. Typically, controlling what data you give them via dashboards and canned reports is ideal; this way you can control performance impact and other risks. But if you have extra large or important customers that require you to give them raw access to the data, then PostgreSQL, and thus Citus, has your answer.

Pro-tip: Don't grant access to all of your customers.

Keep reading

Contributing to Postgres via patch review

Written by Craig Kerstiens | March 31, 2018

Citus is an open source extension to Postgres that transforms Postgres into a distributed database, scaling horizontally. The fact that Citus is built on top of Postgres is a huge benefit to our users: it means that when you choose Citus, you get all the great features that are available in Postgres. And Postgres itself is an awesome database. Awesome. As a team, we value the foundation we're built on and regularly aim to contribute back to it. We have a number of developers who have contributed to Postgres over the years, with features like watch, event triggers, and the PostgreSQL extension framework.

Recently a few more of our engineers expressed an interest in giving back to the PostgreSQL community. In fact, it's a common question: how can we better help the PostgreSQL project? And a common answer is reviewing patches. To help kick-start that process we organized a session and carved out a few days just for patch review during the most recent commitfest.

Keep reading

TopN for your Postgres database

Written by Furkan Sahin | March 27, 2018

People seem to love lists of the most popular things. I think this is true of many of us. Including developers. Did you get all excited like I did, and listen right away to every song on the list when Spotify released Your Top Songs 2017? (Here are mine) When the Academy Awards were announced, did you check in on the candidates and winners? Did you pay attention to the medalists and top scoring hockey teams in the Winter Olympics?

Sometimes this problem of finding the top on a list is referred to as the Top-K problem. Also the Top "N" problem. Whether it’s the top grossing sales reps or the most visited pages on your website, and whether you call it the Top K or the TopN, for most of us, there is usually something we want to know the top "N" of.

Finding the top "N" is not easy

To find the top occurring item you generally need to count through all the records. Counting the clicks in your web app, the number of times you’ve listened to a song, or the number of downloads of your project. It is all about counting. Counting, sorting, and limiting the list in Postgres is straightforward, and this works great on smaller sets of data. What if there are thousands of events? Machines these days are pretty fast so this isn’t much of a problem. Millions is even acceptable. Billions? That may take a bit longer…

However, getting the counts of different items, sorting them and taking the top "N" of them out of your database—that can start to become much more challenging at larger scale.
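The straightforward count, sort, and limit approach described above can be sketched as follows. This is an illustration only, not the TopN extension: it runs the exact query against an in-memory SQLite database through Python's standard `sqlite3` module so that it is self-contained, and the `events` table, its `item` column, and the `top_n` helper are hypothetical names chosen for the example. A query of the same shape should also work in Postgres.

```python
import sqlite3


def top_n(conn, n):
    """Return the n most frequent items as (item, count) pairs.

    Exact approach: count every row, sort by count, keep the first n.
    Ties are broken alphabetically so the result is deterministic.
    """
    query = """
        SELECT item, count(*) AS occurrences
        FROM events
        GROUP BY item
        ORDER BY occurrences DESC, item
        LIMIT ?
    """
    return conn.execute(query, (n,)).fetchall()


# Hypothetical sample data: one row per page view.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (item TEXT)")
clicks = ["home", "pricing", "home", "docs", "home", "docs"]
conn.executemany("INSERT INTO events (item) VALUES (?)", [(c,) for c in clicks])

print(top_n(conn, 2))  # [('home', 3), ('docs', 2)]
```

Every call scans and groups the whole table, which is why the cost grows with the number of events rather than with "N".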

Even further, what if you want to materialize your top N results for smaller sets on a regular basis and run some combination queries to analyze them further? That is when the real problem starts. Calculating the Top N can be a challenge. This is why my team at Citus Data (where we build the Citus extension to Postgres that scales out Postgres horizontally) is happy to announce the release of the open source TopN extension for PostgreSQL.

Inspiration for TopN came from a Citus Data customer who needed to use TopN-like functionality in concert with the Citus extension that scales out their Postgres database. When designing TopN, we decided to implement TopN as a Postgres extension. And we decided to write TopN in C. TopN outputs a JSONB object which you can flexibly use for different use cases. Aggregation functions which take JSONB input and union them together are also included.

TopN can be used to calculate the most frequently occurring values in a column, and is part of the class of probabilistic distinct algorithms called sketch algorithms. Let's look further at how the TopN extension to Postgres actually works.

Keep reading

Citus Data internal hackathon roundup

Written by Craig Kerstiens | March 26, 2018

At Citus Data, we regularly get the team together, because even with an engineering team that is distributed around the globe, face-to-face time is valuable for connecting and collaborating. During our team offsites, we often organize engineering hackathons to prove out new ideas, learn new things, or just have fun. We recently completed one of our Citus hackathons and thought we'd share some of what we built.

The theme of our hackathon this time was on building the ultimate dashboard for our Citus extension to Postgres. For Postgres, there are lots of options out there for capturing and displaying insights into your database. You could use New Relic, Vivid Cortex, or something entirely open source like pghero. But we wanted to explore the question, what more could we provide?

Our two teams took two very different approaches, but each emerged with something interesting that we hope to continue to build on and productize in the future. In case you’re curious, here’s a look at each of the projects from our hackday:

Keep reading

Raw SQL access for users with row-level-security

Written by Craig Kerstiens | March 19, 2018

We talk with a lot of SaaS companies that are encountering issues with their database. The most common issue we discuss relates to performance, either a need to keep scaling or at times just dealing with really intensive data needs of only a few customers and how to handle that.

And then as you continue to scale and capture more data you want to provide more value back to your customers.

At times you might even consider giving raw SQL access to your largest and most important customers. Typically, controlling what data you give them via dashboards and canned reports is ideal; this way you can control performance impact and other risks. But if you have extra large or important customers that require you to give them raw access to the data, then PostgreSQL, and thus Citus, has your answer.

Pro-tip: Don't grant access to all of your customers.

Keep reading

Data has a certain gravity and inertia. Once it's stored it's not likely to be actively moved or frequently modified. At least not for your one source of truth. Protecting that data and ensuring it's both safely stored and correct is worth the time investment because of the value it has.

Going further, your database schema and models are going to change far less than your application code. Because they change less frequently, the case can easily be made that spending some time to ensure correctness at the database level is a great return on time.

This post is the result of a talk I recently gave at PgDay Paris. The conference itself was a great local event in Paris, and while there we had a chance to meet with a few of our customers based in Paris as well. It’s always great to get out in person and chat with people about Postgres and their experience in scaling their database, and many remarked that the talk could be useful to others who weren’t there. So I thought it would be worthwhile to write up, and here you go:

Keep reading

Fun with SQL: generate_series in Postgres

Written by Craig Kerstiens | March 14, 2018

There are times within Postgres where you may want to generate sample data or some consistent series of records to join against for reporting. Enter the simple but handy set-returning function of Postgres: generate_series. As the name implies, generate_series allows you to generate a set of data starting at some point, ending at another point, and optionally set the incrementing value. generate_series works on two datatypes:

  • integers
  • timestamps
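To make the start, end, and increment behaviour concrete, here is a small Python analogue. It is a sketch for illustration, not Postgres code: the function only borrows the name generate_series, it assumes a positive step, and it relies on the end point being included when the series lands on it exactly. In Postgres itself the integer case would look like `SELECT generate_series(1, 10, 3);`.

```python
from datetime import datetime, timedelta


def generate_series(start, stop, step=1):
    """Yield start, start + step, ... up to and including stop.

    Simplified analogue: only positive steps are handled. For timestamps
    the step must be passed explicitly as a timedelta.
    """
    if not step:
        raise ValueError("step size cannot equal zero")
    current = start
    while current <= stop:
        yield current
        current += step


# Integers: 1 through 10 in increments of 3.
numbers = list(generate_series(1, 10, 3))
print(numbers)  # [1, 4, 7, 10]

# Timestamps: one value per day across a hypothetical reporting window.
days = list(
    generate_series(datetime(2018, 3, 12), datetime(2018, 3, 14), timedelta(days=1))
)
print(len(days))  # 3
```

A series like `days` is the kind of thing you might join reporting data against so that days with no records still appear in the output.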

Keep reading
