Scaling out PostgreSQL at Cloudflare with Citus

Cloudflare is a content delivery network (CDN) and DNS provider that powers millions of websites around the world. Last week, we were happy to see them publish a technical blog post that described how they power their analytics dashboards using Citus!

We also had three important takeaways from the Cloudflare post. First, Cloudflare has been using PostgreSQL since the early days. They trusted the database, had extensive experience running it, and knew how to use its tools for backups and upgrades. Cloudflare also found two PostgreSQL extensions helpful: hstore for semi-structured data, and HyperLogLog for fast count distinct approximations. Cloudflare wanted to keep using these Postgres tools and extensions without making any changes to their application layer, and also make PostgreSQL scale. Citus enabled them to do just that.

Second, we find that developers are looking to scale out PostgreSQL for different workloads. In case of Cloudflare, they were looking to parallelize their business facing dashboards in real-time. For example, when a Cloudflare business wanted to see the number of unique threats to their website over varying time intervals, these graphs had to be rendered in real-time. For that, incoming queries had to be parallelized across numerous machines and CPU cores.

Finally, we see different approaches to capturing high-volume events data and presenting insights through dashboards. One approach is to insert raw events data into the database, and have enough hardware to generate query results in real-time. A second approach is to capture events data, partition it on a dimension other than time, and then use materialized views to aggregate that data on the time dimension.

Another approach, one that's cost-efficient when more than 5% of internet's traffic is flowing through your infrastructure, is using Kafka queues and Go aggregators to capture 1-minute data in your scale-out PostgreSQL cluster. You can then aggregate that data further within the database.

Cloudflare Log Processing Infrastructure

The diagram above highlights the last approach, with specific details from Cloudflare's architecture. In practice, our customers who have real-time analytic workloads could be using any one of the three approaches above -- thanks to PostgreSQL's flexible architecture!

Also, we're currently revamping our documentation to include these three example architectures for real-time analytics. In the meantime, if you have any questions about using Citus, please get in touch with us!

Written by Ozgun Erdogan

Co-founder & CTO of Citus Data. Former Postgres engineering team director at Microsoft. Worked on distributed systems at Amazon. Speaker at PGCon, XLDB Conf, DataEngConf, PostgresOpen, & QCon. Dad.

SHARE THIS POST

Related Content