Deploying a cluster of PostgreSQL servers can become a lot easier with Docker

Marco Slot | 17th September, 2015

Docker is quickly becoming one of the most popular ways of deploying distributed applications. By bundling all dependencies of an application in an easily shippable container, software deployment becomes a process that can be performed quickly and often. The open source sharding extension for PostgreSQL, pg_shard, and scalable real-time analytics solution for PostgreSQL, CitusDB, are both meant to run on a cluster of PostgreSQL servers. Deploying such a cluster can become a lot easier with Docker.

While there is no officially supported Docker container for Citus Data extensions at the moment, we were very excited to learn that Heap has published citus-docker. Heap uses CitusDB to analyze their click stream data in real time with some advanced funnel queries. Their Docker image comes with both pg_shard and CitusDB pre-installed. If you haven't used Docker before, you can follow the installation guide to set it up.

One of the benefits of Docker is that it lets you set up a whole cluster on your machine for testing very easily. pg_shard users often set up a cluster on their desktop, running multiple postgres servers on different ports. This approach is not very practical since it requires you to go through all the configuration steps multiple times. With docker-compose, setting up a local cluster becomes a breeze.

If you haven't done so already, you can install docker-compose with the following commands:

sudo su
curl -L https://github.com/docker/compose/releases/download/1.3.3/docker-compose-`uname -s`-`uname -m` > /usr/bin/docker-compose
chmod +x /usr/bin/docker-compose

Now run the following:

git clone https://github.com/heap/citus-docker
cd citus-docker
docker-compose up -d

That's it! No additional configuration is required. You now have a local pg_shard cluster with two worker nodes and a master node, which you can connect to using: psql -h localhost -U postgres. When you are done, you can remove it using:

docker-compose kill
docker-compose rm
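One practical note on the compose setup: right after docker-compose up -d, the PostgreSQL servers may still be starting, so an immediate psql connection can fail. The snippet below is a minimal sketch of a retry loop for that situation. The wait_for helper is a hypothetical name introduced here, not part of citus-docker, and the commented usage line assumes the pg_isready tool from the PostgreSQL client utilities is installed on the host.

```shell
#!/bin/sh
# wait_for (hypothetical helper): retry a command until it succeeds
# or the attempt limit is reached.
# Usage: wait_for <max_attempts> <delay_seconds> <command> [args...]
wait_for() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    # Run the command quietly; stop as soon as it exits with status 0.
    while ! "$@" >/dev/null 2>&1; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}

# Example (assumption: pg_isready is available on the host):
#   wait_for 30 1 pg_isready -h localhost -p 5432 -U postgres \
#       && psql -h localhost -U postgres
```

The helper takes the command as arguments, so the same loop can wrap any other readiness check you prefer.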

You could also start individual nodes by running the docker command on every node in the cluster:

docker run -d -p 5432:5432 --name citusdocker heap/citus-docker

On the master node, you will want to configure pg_worker_list.conf and add pg_shard to shared_preload_libraries, per the instructions on the pg_shard GitHub page:

docker exec -it citusdocker bash
cat > /data/pg_worker_list.conf <<WORKERS
# Enter IP addresses and ports of worker nodes
10.61.164.42 5432
...
WORKERS
psql -c "ALTER SYSTEM SET shared_preload_libraries TO 'pg_shard'"
exit

After changing this configuration, you should restart your container:

docker restart citusdocker
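With more than a handful of workers, typing the worker list by hand gets tedious. As a rough sketch, the list can be generated from the worker addresses instead. The write_worker_list function is a hypothetical helper, the second address in the usage comment is made up for illustration, and the sketch assumes every worker listens on the same port. The file format follows the example above: one "address port" pair per line.

```shell
#!/bin/sh
# write_worker_list (hypothetical helper): print a pg_worker_list.conf body
# with one "host port" line per worker.
# Usage: write_worker_list <port> <host> [host...]
write_worker_list() {
    port=$1
    shift
    echo "# Enter IP addresses and ports of worker nodes"
    for host in "$@"; do
        printf '%s %s\n' "$host" "$port"
    done
}

# Example (10.61.164.43 is an illustrative address, not from a real cluster):
#   write_worker_list 5432 10.61.164.42 10.61.164.43 \
#       | docker exec -i citusdocker sh -c 'cat > /data/pg_worker_list.conf'
#   docker restart citusdocker
```

As with the manual steps, the container needs a restart after the file changes.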

You can now connect to the docker container running on the master node using psql -h localhost -U postgres and follow the table sharding examples. If you plan to use docker to operate a cluster, we recommend looking into persistent storage options for docker and creating a new image.

When you would like to get rid of your docker container, run the following commands:

docker kill citusdocker # Stop the container
docker rm citusdocker # Delete the container

We hope this gives you a good starting point for using pg_shard and CitusDB with Docker. Special thanks to Heap for providing the image! We also plan to provide an official Docker image for CitusDB in the coming months.