Blog posts by Daniel Farina on the Citus Blog

How we implement Disaster Recovery and High Availability with Postgres on Citus Cloud

Written by By Daniel Farina | March 23, 2017 Mar 23, 2017

AWS is the leader when it comes to the cloud, and for good reason. AWS is well ahead in the quality and breadth of services they offer.

However, when a service is running at the scale of AWS, it is natural to expect some failures to occur. According to AWS EBS availability is designed for 99.999%.

The annual failure rate (AFR) is 0.1% - 0.2%, where failure means a complete or partial failure. For example, if you had 1,000 EBS discs, you should expect 1 or 2 to have a failure per year. In our experience, partial failure is significantly more common than a complete loss. Even so, a partial loss can take a lot of time to resolve and can still be debilitating to a business.

Over the years, there have been some AWS failures that made news headlines due to havoc caused for both companies and their users. These incidents put a spotlight on AWS’ imperfections.

Keep reading

Using State Machines to Power Citus Cloud (our Database as a Service)

Written by By Daniel Farina | August 12, 2016 Aug 12, 2016

It has been several months since the launch of Citus Cloud, and we’d like to share one part of its design with you. In particular, the fundamental unit of organization for our hosted service on top of AWS is concurrent state machines. In what follows...

Keep reading