Using Docker to run a pg_shard cluster

Written by Marco Slot
September 17, 2015

Update: As of Citus 5.0, Citus is now open source, and all pg_shard functionality is included directly in the Citus extension to Postgres. We encourage you to take a look at Citus instead. You can download Citus open source packages for single-machine clusters or multi-machine clusters, for all the major operating systems, and of course via Docker too. Or you can try Citus in the cloud on Microsoft Azure, available as Azure Cosmos DB for PostgreSQL. Learn more about Citus on Azure.

Docker is quickly becoming one of the most popular ways of deploying distributed applications. By bundling all dependencies of an application in an easily shippable container, software deployment becomes a process that can be performed quickly and often. The open source sharding extension for PostgreSQL, pg_shard, and the scalable real-time analytics solution for PostgreSQL, CitusDB, are both meant to run on a cluster of PostgreSQL servers. Deploying such a cluster can become a lot easier with Docker.

While there is no officially supported Docker container for Citus Data extensions at the moment, we were very excited to learn that Heap has published citus-docker. Heap uses CitusDB to analyze their clickstream data in real time with some advanced funnel queries. Their Docker image comes with both pg_shard and CitusDB pre-installed. If you haven't used Docker before, you can follow the installation guide to set it up.

One of the benefits of Docker is that it lets you set up a whole cluster on your machine for testing very easily. pg_shard users often set up a cluster on their desktop, running multiple PostgreSQL servers on different ports. This approach is not very practical since it requires you to go through all the configuration steps once per server. With docker-compose, setting up a local cluster becomes a breeze.

If you haven't done so already, you can install docker-compose with the following commands:

sudo su
curl -L https://github.com/docker/compose/releases/download/1.3.3/docker-compose-`uname -s`-`uname -m` > /usr/bin/docker-compose
chmod +x /usr/bin/docker-compose
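
The download URL above is assembled from the release version plus the output of uname -s and uname -m. If you want to check what will be fetched before running curl as root, a small helper can print it. This is only a sketch: the function name compose_url is our own, and it assumes the release you ask for follows the same URL pattern as 1.3.3.

```shell
# Build the download URL for a docker-compose release binary.
# Arguments: version, then optionally the kernel name and machine type.
# The last two default to what uname reports on this machine.
# compose_url is a hypothetical helper name, not part of docker-compose.
compose_url() {
    printf 'https://github.com/docker/compose/releases/download/%s/docker-compose-%s-%s\n' \
        "$1" "${2:-$(uname -s)}" "${3:-$(uname -m)}"
}

# Print the URL for this machine so you can inspect it before downloading.
compose_url 1.3.3
```

The printed URL can then be passed to curl -L in place of the hand-written one.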

Now run the following:

git clone https://github.com/heap/citus-docker
cd citus-docker
docker-compose up -d

That's it! No additional configuration required. You now have a local pg_shard cluster with 2 worker nodes and a master node, which you can connect to using: psql -h localhost -U postgres. When you are done, you can remove it using:

docker-compose kill
docker-compose rm

You could also start individual nodes by running the docker command on every node in the cluster:

docker run -d -p 5432:5432 --name citusdocker heap/citus-docker

On the master node, you will want to configure pg_worker_list.conf and add pg_shard to shared_preload_libraries per the instructions on the pg_shard GitHub page:

docker exec -it citusdocker bash
cat > /data/pg_worker_list.conf <<WORKERS
# Enter IP addresses and ports of worker nodes
10.61.164.42 5432
...
WORKERS
psql -c "ALTER SYSTEM SET shared_preload_libraries TO 'pg_shard'"
exit
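
As the example shows, the worker list is plain text with one "host port" pair per line, so for larger clusters it may be easier to generate it than to type it. The following is a minimal sketch, not part of pg_shard: the function name worker_list and the host:port argument format are our own conventions, and 10.61.164.43 is a made-up second worker address.

```shell
# Print the contents of a pg_worker_list.conf from "host:port" arguments.
# An argument without a port falls back to 5432.
# worker_list is a hypothetical helper name.
worker_list() {
    echo "# Enter IP addresses and ports of worker nodes"
    for worker in "$@"; do
        host="${worker%%:*}"              # text before the first colon
        case "$worker" in
            *:*) port="${worker##*:}" ;;  # text after the last colon
            *)   port=5432 ;;             # no port given
        esac
        echo "$host $port"
    done
}

# Example run; the second address is a placeholder.
worker_list 10.61.164.42:5432 10.61.164.43
```

Inside the container, you would redirect the output to /data/pg_worker_list.conf instead of using the here-document.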

After changing this configuration, you should restart your container:

docker restart citusdocker
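
If you script these steps, keep in mind that PostgreSQL inside the container may not be accepting connections yet at the moment docker restart returns. One way to handle this is to retry the first connection a few times. The retry helper below is a generic sketch of our own, not something that ships with Docker or the image.

```shell
# Run a command until it succeeds or the attempt limit is reached.
# Arguments: maximum attempts, delay in seconds, then the command itself.
# Returns 0 as soon as the command succeeds, 1 if every attempt failed.
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while ! "$@"; do
        if [ "$i" -ge "$attempts" ]; then
            return 1
        fi
        i=$((i + 1))
        sleep "$delay"
    done
}

# Example (commented out because it needs the running container):
# retry 10 1 psql -h localhost -U postgres -c 'SELECT 1'
```

The attempt count and delay in the example are arbitrary; adjust them to how long your container takes to start.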

You can now connect to the Docker container running on the master node using psql -h localhost -U postgres and follow the table sharding examples. If you plan to use Docker to operate a cluster, we recommend looking into persistent storage options for Docker and creating a new image.

When you would like to get rid of your Docker container, run the following commands:

docker kill citusdocker # Stop the container
docker rm citusdocker # Delete the container

We hope this gives you a good starting point for using pg_shard and CitusDB with Docker. Special thanks to Heap for providing the image! We'll also look to provide an official Docker image for CitusDB in the coming months.

Former lead engineer for the Citus database engine at Microsoft. Speaker at Postgres Conf EU, PostgresOpen, pgDay Paris, Hello World, SIGMOD, & lots of meetups. Talk selection team member for Citus Con: An Event for Postgres. PhD in distributed systems. Loves mountain hiking.