Blog posts by Craig Kerstiens on the Citus Blog - Page 3

ZFS Private Beta on Citus Cloud

Written by Craig Kerstiens | July 19, 2018

ZFS is an open source file system with the option to store data on disk in a compressed form. ZFS itself supports a number of compression algorithms, giving you flexibility to optimize both performance and how much you store on disk. Compressing your data on disk offers two pretty straightforward advantages:

Reduce the amount of storage you need, thus reducing costs
Reduce the amount of data that has to be scanned when reading from disk, improving performance

To date, we have run Citus Cloud, our fully managed database as a service that scales out Postgres horizontally, in production on EXT4. Today, we're excited to announce a limited beta program of ZFS support for our Citus Cloud database. ZFS makes Citus Cloud even more powerful for certain use cases. If you are interested in access to the beta, contact us to get more info, or continue reading to learn more about the use cases where ZFS, Citus, and Postgres can help.

Keep reading

Options for scaling from 1 to 100,000 tenants

Written by Craig Kerstiens | June 28, 2018

When you first start out building a SaaS application, you talk about that day in the future when you will have scaling problems, how that'll be the day, how that would be a good problem to have. You focus on getting the first few customers, making sure they have a great experience, and suddenly you're at 10s of customers, then 100s. You've upgraded your app server to a larger one, then you've gone from one EC2 app server to multiple ones with an ELB in front of them. You've upgraded your Postgres database from an r3.large on AWS to an r3.xlarge, and now you're eyeing that r3.2xlarge next month. In the back of your mind, though, you're starting to look at your plans for future growth of your SaaS app, and you're wondering how much larger you can keep going. Your database is performing well at 100 tenants (tenants = customers), and your back-of-the-napkin math says you'll be able to scale your app up to 1,000 tenants, but after that you know you're going to have to explore some options.

What are those options and what are the trade-offs and benefits?

Keep reading

Fun with SQL: Functions in Postgres

Written by Craig Kerstiens | June 21, 2018

In our previous Fun with SQL post on the Citus Data blog, we covered window functions. Window functions are a special class of function that allow you to grab values across rows and then perform some logic. By jumping ahead to window functions, we missed so many of the other handy functions that exist within Postgres natively. There are in fact several hundred built-in functions. And when you need something custom, you can also create your own user-defined functions (UDFs). Today we're going to walk through just a small sampling of SQL functions that can be extremely handy in PostgreSQL.

Keep reading

How do you pronounce Citus?

Written by Craig Kerstiens | June 19, 2018

It’s a common question we get at conferences, on calls, in meetings. “Citrus”, “Citius”, “Citus”, is that how you pronounce it? The quick and short of it is, we’re not named after a fruit. You pronounce it like “site-us”.

Most tend to leave it there, without wondering further. But a few do inquire as to the meaning. Citus’s name comes from the Olympic motto “Citius, Altius, Fortius,” which is Latin for “Faster, Higher, Stronger.” Our goal for the Citus extension is to be fast for both transactional and analytical workloads.

Keep reading

Configuring memory for Postgres

Written by Craig Kerstiens | June 12, 2018

work_mem is perhaps the most confusing setting within Postgres. It determines how much memory can be used during certain operations. On its surface, the work_mem setting seems simple: after all, work_mem just specifies the amount of memory available to be used by internal sort operations and hash tables before writing data to disk. And yet, leaving work_mem unconfigured can bring on a host of issues. What perhaps is more troubling, though, is when you receive an out of memory error on your database and you jump in to tune work_mem, only for it to behave in an unintuitive manner.
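One reason the setting can surprise people is that work_mem applies per sort or hash operation, not per query or per connection, so a single query with several such operations can use a multiple of it. A rough back-of-the-envelope sketch in Python follows; the function name and the example figures are illustrative assumptions, not recommendations, and the estimate deliberately ignores other memory consumers.

```python
def worst_case_work_mem_mb(work_mem_mb, ops_per_query, active_connections):
    """Rough upper bound, in MB, on memory used by sorts and hash tables.

    Assumes every active connection runs one query at the same moment and
    that each sort/hash operation in that query uses its full work_mem.
    Parallel workers and other memory settings are ignored in this sketch.
    """
    return work_mem_mb * ops_per_query * active_connections


# Hypothetical figures: 64 MB work_mem, 3 sort/hash operations per query,
# and 100 connections all busy at once.
estimate = worst_case_work_mem_mb(64, 3, 100)
print(f"{estimate} MB (about {estimate / 1024:.2f} GB)")
```

With these made-up numbers the worst case is far larger than the 64 MB the setting suggests at first glance, which is why raising work_mem in response to an out of memory error can make things worse.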

Keep reading

Citus what is it good for? OLTP? OLAP? HTAP?

Written by Craig Kerstiens | June 7, 2018

Earlier this week, as I was waiting to begin a talk at a conference, I chatted with someone in the audience who had a few questions. They led off with this question: is Citus a good fit for X? The heart of what they were looking to figure out: is the Citus distributed database a better fit for analytical (data warehousing) workloads, or for more transactional workloads, to power applications? We hear this question quite a lot, so I thought I'd elaborate more on the use cases that make sense for Citus from a technical perspective.

Before I dig in, if you're not familiar with Citus: we transform Postgres into a distributed database that allows you to scale your Postgres database horizontally. Under the covers, your data is sharded across multiple nodes, while things still appear as a single node to your application. Because it still looks like a single-node database, your application doesn't need to know about the sharding. We do this as a pure extension to Postgres, which means you get all the power and flexibility that's included within Postgres such as JSONB, PostGIS, rich indexing, and more.
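To make the idea of sharding by a key concrete, here is a minimal sketch in Python of routing rows to shards by hashing a tenant id. It is not the hash function or placement logic Citus uses; the names shard_for and node_for, the shard count, and the node names are all hypothetical.

```python
import hashlib


def shard_for(tenant_id: int, shard_count: int = 32) -> int:
    """Map a tenant id to a shard number with a stable hash (illustrative only)."""
    digest = hashlib.sha256(str(tenant_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def node_for(shard: int, nodes: list) -> str:
    """Place shards on nodes round-robin; real placement is more involved."""
    return nodes[shard % len(nodes)]


# Hypothetical worker nodes.
nodes = ["worker-1", "worker-2", "worker-3"]
for tenant_id in (1, 2, 42):
    shard = shard_for(tenant_id)
    print(tenant_id, shard, node_for(shard, nodes))
```

The property that matters is that the same tenant id always maps to the same shard, so all of a tenant's rows live together and the routing can stay hidden from the application.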

Keep reading

Fun with SQL: Window functions in Postgres

Written by Craig Kerstiens | June 1, 2018

Today we continue to explore all the powerful and fun things you can do with SQL. SQL is a very expressive language and when it comes to analyzing your data there isn't a better option. You can see the evidence of SQL's power in all the attempts made by NoSQL databases to recreate the capabilities of SQL. So why not just start with a SQL database that scales? (Like my favorites, Postgres and Citus.)

Today, in the latest post in our 'Fun with SQL' series (earlier blog posts were about recursive CTEs, generate_series, and relocating shards on a Citus database cluster), we're going to look at window functions in PostgreSQL. Window functions are key in various analytic and reporting use cases where you want to compare and contrast data. Window functions allow you to compare values between rows that are somehow related to the current row. Some practical uses of window functions can be:

Finding the first time each user performed some action
Finding how much each user's bill increased or decreased from the previous month
Finding where each user ranked within some sub-grouping
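As a small illustration of the second use case, the sketch below computes the month-over-month change in each user's bill with lag(). It runs against an in-memory SQLite database from Python only so that it is self-contained (SQLite 3.25 or newer is assumed for window function support); the bills table and its columns are made up for the example, and the same query shape should work in Postgres.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bills (user_id INTEGER, month TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO bills VALUES (?, ?, ?)",
    [
        (1, "2018-04", 100), (1, "2018-05", 120), (1, "2018-06", 90),
        (2, "2018-04", 50), (2, "2018-05", 50),
    ],
)

# lag() looks at the previous row within each user's partition, ordered by
# month, so the subtraction yields the change from the prior month.
rows = conn.execute(
    """
    SELECT user_id,
           month,
           amount - lag(amount) OVER (PARTITION BY user_id ORDER BY month) AS delta
    FROM bills
    ORDER BY user_id, month
    """
).fetchall()

for row in rows:
    print(row)
```

The first month for each user has no previous row, so its delta comes back as NULL (None in Python).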

Keep reading

Preparing your multi-tenant app for scale

Written by Craig Kerstiens | May 22, 2018

We spend a lot of time with companies that are growing fast, or planning for future growth. It may be you've built your product and are now just trying to keep the system growing and scaling to handle new users and revenue. Or you may be still building the product, but know that even a moderate level of success could lead to a lot of scaling. In either case, where you spend your time is key so that you don't waste it.

As Donald Knuth states it in Structured Programming with go to Statements:

"Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."

With the above in mind, one of the most common questions we get is: What do I need to do now to make sure I can scale my multi-tenant application later?

We've written before about approaches not to take, such as schema-based sharding or one database per customer, and the trade-offs that come with those approaches. Here we'll dig into three key steps you should take that won't be wasted effort should the need to scale occur.

Keep reading

Fun with SQL: Recursive CTEs in Postgres

Written by Craig Kerstiens | May 15, 2018

Common Table Expressions (CTEs) are a powerful construct within SQL. In day-to-day conversation, you may hear CTEs referred to as WITH clauses. You can think of CTEs as similar to a view that is materialized only while that query is running and does not exist outside of that query. CTEs can be very useful building blocks for making your large SQL queries more readable. But they can also be used recursively, allowing you to create some very complex queries without having to drop down to a procedural language like plpgsql or plv8.
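As a minimal sketch of the recursive form, the example below walks a small reporting hierarchy from the top down with WITH RECURSIVE. It runs against an in-memory SQLite database from Python purely to stay self-contained; the employees table and its rows are invented for the example, and the query itself should also be valid in Postgres.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, manager_id INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Ada", None), (2, "Ben", 1), (3, "Cy", 2), (4, "Dee", 1)],
)

# The first SELECT seeds the CTE with the root (no manager). The second
# SELECT joins back to the CTE itself, adding one level per iteration
# until no new rows are produced.
reports = conn.execute(
    """
    WITH RECURSIVE chain(id, name, depth) AS (
        SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
        UNION ALL
        SELECT e.id, e.name, c.depth + 1
        FROM employees e
        JOIN chain c ON e.manager_id = c.id
    )
    SELECT name, depth FROM chain ORDER BY depth, name
    """
).fetchall()

for name, depth in reports:
    print("  " * depth + name)
```

Without the recursive CTE, walking a hierarchy of unknown depth like this would typically need a loop in application code or a procedural function.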

Keep reading

It's the future (for databases)

Written by Craig Kerstiens | May 8, 2018

Hi. I work as a data architect in San Francisco and I’m auditing Dr. Jones’s class to stay up to date on the latest technologies, and she mentioned you might be able to help me before I get too deep into the design of a new system.

I would be happy to help. Can you give me an overview of where you’re at?

My default was just to use Postgres. I had a few questions on what schema designs might make most sense.

Well I'm working with more interesting data architectures, not sure I can help you much with Postgres. Really getting excited about what's possible with neomodern data architectures, they make it so my app devs can build any feature their hearts desire.

I thought your expertise used to be relational databases?

It was, but neomodern data architectures are better. Neomodern data architectures allow it so app devs can build any feature they like without having to think about data models. Really, it’s the future of databases.

Keep reading