Introducing Citus Cloud

Written by Craig Kerstiens
April 19, 2016

Update in October 2022: The Citus managed database service is now available in the cloud as Azure Cosmos DB for PostgreSQL. And as always, the Citus database is also available as open source: you can find the Citus repo on GitHub or download Citus here.

At Citus we believe in making databases easier. Key to that is empowering users to scale Postgres beyond the typical limits of a single node. Our latest Citus release makes it easier than ever to scale memory and processors while retaining access to familiar SQL queries and rich Postgres features. But database management can be tricky even in the single-node case, so we at Citus have been hard at work building the next step in our journey to make databases easier: Citus Cloud, an on-demand cloud service on top of Amazon Web Services, available today in private beta.

*Update: Citus Cloud is now fully GA.

If you're dealing with event or time series data—whether user messages, logs, web analytics, or connected device events—scaling out to both store and analyze terabytes of data becomes trivial with Citus Cloud. We'll cover some prime Citus use cases later on, but first let's take a tour of what you'll get with Citus Cloud, our as-a-service hosted offering on top of Amazon Web Services.

Provisioning

Citus Cloud plans include a primary database you'll connect to and compute nodes to handle scaling your data and queries. All Citus Cloud databases operate on AWS and are available in all major regions. Once you select your plan and region a new Citus Cloud formation will be available within a matter of minutes. You'll have a standard Postgres connection string to begin working with your database and a dashboard giving you insight into its health.

Continuous monitoring and cloud readiness

Our team has experience managing hundreds of thousands of Postgres instances. While we believe the future is the cloud and it's a great place to run, there are times it can fail. With that in mind, we've put safeguards in place to make your experience running Citus smooth and safe. While you never need to know about many of these, they're always actively protecting you. Here is a closer look at some of the protection you get behind the scenes:

Data durability

As a data service our first priority is keeping your data safe. We utilize WAL-E, the popular continuous archiving tool for Postgres. WAL-E has two parts: one to capture Postgres base backups and another to stream write-ahead logs to S3. With these two pieces we can recover your data, even in the event of catastrophic failure.

Monitoring

Citus Cloud continuously monitors your database, including running health checks every 30 seconds. Should something seem out of place, our state machine walks through a number of steps to restore things. Depending on the state of your application, one of these steps could be to perform automated database failover. Because Citus Cloud keeps the formation IP address (and thus connection string) stable, this change is transparent to clients (other than a small blip). If any of our automated processes fails to restore formation health within minutes of initial detection, we'll page one of the engineers who built the service.

High availability

All Citus Cloud plans come with high availability. In other words, we run standbys both for your primary database and for all your compute nodes. These standbys stream updates from their leaders and can be promoted within seconds. With our high availability setup you'll see reduced downtime, and the connection string your application uses will stay the same.

Simple to use dashboard

If a psql terminal isn't your preference for interacting with Postgres then we've got you covered. Our Citus Cloud Console gives you a simple place to provision and monitor your Citus instances. Within the dashboard you'll be able to quickly determine the health of your databases, and get insight into items you care about, such as:

  • Which tables are distributed
  • Per-node cache and index hit rate
  • Storage use
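
If you do prefer psql, the first item in that list can be approximated by hand. The sketch below assumes you are connected to the primary (coordinator) node and that the Citus metadata table pg_dist_partition is readable by your role; it illustrates the idea and is not a description of what the dashboard itself runs.

```sql
-- List the tables Citus knows as distributed, with their partition method.
-- pg_dist_partition is Citus metadata; it exists only where the Citus
-- extension is installed.
SELECT logicalrelid::regclass AS table_name,
       partmethod
FROM pg_dist_partition
ORDER BY table_name;
```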

When do you need Citus?

Citus solves a broad range of use cases ranging from needing higher write throughput to performing analytics across terabytes of data. Citus Cloud in particular focuses on and excels at those times when you have a single large table in your database and need to scale it out.

Postgres is extremely efficient at keeping frequently accessed data in memory. How often Postgres can make use of in-memory data (rather than having to read data from storage) is known as your cache hit ratio. We've repeatedly seen cases where a table grows to be hundreds of gigabytes, making it extremely unlikely that queries against it effectively use Postgres' cache. This causes database query performance to fall off a cliff. If you're curious about your cache hit ratio you can easily calculate it with this query:

SELECT 'index hit rate' AS name,
  sum(idx_blks_hit) / nullif(sum(idx_blks_hit) + sum(idx_blks_read), 0) AS ratio
FROM pg_statio_user_indexes
UNION ALL
SELECT 'table hit rate' AS name,
  sum(heap_blks_hit) / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS ratio
FROM pg_statio_user_tables;

When your cache hit ratio drops below 99%, performance suffers. Query durations may rapidly increase from tens of milliseconds to multiple seconds.
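
The query above gives one figure for the whole database. To see which tables are dragging it down, a per-table breakdown can help. This is a minimal sketch using only standard Postgres statistics views; the limit of ten rows and the four-digit rounding are arbitrary choices for illustration.

```sql
-- Largest tables first, each with its own heap cache hit rate.
-- A very large table with a low hit rate is a likely scale-out candidate.
SELECT relname AS table_name,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
       round(heap_blks_hit::numeric
             / nullif(heap_blks_hit + heap_blks_read, 0), 4) AS table_hit_rate
FROM pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;
```

Tables that have never been read show a NULL hit rate, because nullif guards against division by zero.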

The usual solution is to scale up your database hardware, which works well until no larger instances are available. Up until today, you would then begin exploring entirely new datastores which promise to scale out your data, often at the expense of query capabilities and application architecture.

Rule of Thumb
If you have a table called events, logs, or messages, you probably have the one-large-table problem.

With Citus, scaling this table is simple: call a sharding function and your one large table becomes a distributed table spread across multiple Postgres instances. This removes your ceiling and lets you grow horizontally with your data. Memory (or cache) for your tables scales linearly, but you're also scaling out processing power, providing entirely new performance capabilities out of reach of simple vertical scaling. And of course with Citus Cloud, if you don't want to have to think about managing it yourself, you don't have to.
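
As a rough illustration of what calling a sharding function looks like, here is a sketch that uses create_distributed_table, the function found in current Citus releases. Citus releases from around the time of this post used a two-step pair instead (master_create_distributed_table followed by master_create_worker_shards). The events table, its columns, and the choice of device_id as the distribution column are all hypothetical.

```sql
-- Hypothetical table; the name and columns are for illustration only.
CREATE TABLE events (
    device_id  bigint      NOT NULL,
    event_time timestamptz NOT NULL DEFAULT now(),
    payload    jsonb
);

-- Hash-distribute the table on device_id across the worker nodes.
-- Requires the Citus extension; run this on the primary (coordinator) node.
SELECT create_distributed_table('events', 'device_id');
```

After this call the application keeps issuing ordinary SQL against events on the primary node. Picking a distribution column that most queries filter or join on generally keeps related rows on the same node.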

Pricing and availability

Citus Cloud is available today in private beta. Plans start at $99 a month for development and $990 a month for production. As highlighted above, all Citus Cloud plans come with:

  • Simple provisioning and upgrades
  • Continuous monitoring
  • Backups and continuous archiving
  • High availability

Get started

Head over to the Citus Cloud site to request access today. If you have questions feel free to drop us a line at cloud-support@citusdata.com or join us on Slack.

Written by Craig Kerstiens

Former Head of Cloud at Citus Data. Ran product at Heroku Postgres. Countless conference talks on Postgres & Citus. Loves bbq and football.