Multiple Node Cluster

CitusDB's single and multiple node clusters execute the exact same logic. They both run independent master and worker databases that communicate over TCP/IP, and they primarily differ in that the single node cluster is bound by a single node's hardware resources. To overcome this limitation, CitusDB needs to be installed using a distributed setup.

This distributed setup involves two steps that differ from those of a single node cluster. First, you need to configure authentication settings on the nodes to allow for them to talk to each other. Second, you need to either manually log in to all the nodes and issue commands, or rely on tools such as pssh to run the remote commands in parallel for you.

In the following, we assume that you will manually log in to all the nodes and issue commands. We also assume that we have one master and two worker nodes in our distributed setup, and refer to these nodes as master-100, worker-101, and worker-102. We additionally use the notation all-nodes when referring to commands that need to be repeated across all nodes in the cluster. Lastly, we note that you can edit the hosts file in /etc/hosts if your nodes don't already have their DNS names assigned.
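If you do need to edit the hosts file, the entries might look like the following. This is a minimal sketch: the private IP addresses are placeholders, and we write to a local example file here rather than the live /etc/hosts (which requires root).

```shell
# The IP addresses below are placeholders; substitute your nodes' actual
# private IPs. On a real node you would append these lines to /etc/hosts.
cat > hosts.example <<'EOF'
10.0.0.100    master-100
10.0.0.101    worker-101
10.0.0.102    worker-102
EOF

cat hosts.example
```

With these entries in place on every node, the names master-100, worker-101, and worker-102 resolve consistently across the cluster.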

Linux Nodes

As with the single node setup, we start by downloading and installing the Citus DB package. For example, if you are running Ubuntu 10.04+ on a 64-bit machine, you need to run the following commands on all the nodes in the cluster:

all-nodes# wget
all-nodes# sudo dpkg --install citusdb-4.0.0-1.amd64.deb

We now need to configure connection and authentication settings for the installed databases. For this, you first instruct Citus DB to listen on all IP interfaces, and then configure the client authentication file to allow all incoming connections from the local network. Admittedly, these settings are too permissive for certain types of environments, and the PostgreSQL manual explains in more detail how to restrict them further.

all-nodes# emacs -nw /opt/citusdb/4.0/data/postgresql.conf

# Uncomment listen_addresses for the changes to take effect
listen_addresses = '*'

all-nodes# emacs -nw /opt/citusdb/4.0/data/pg_hba.conf

# Allow unrestricted access to nodes in the local network. The following ranges
# correspond to 24, 20, and 16-bit blocks in Private IPv4 address spaces.
host    all             all             10.0.0.0/8          trust
host    all             all             172.16.0.0/12       trust
host    all             all             192.168.0.0/16      trust

After changing connection settings, you simply need to tell the master node about the workers by editing the master node's membership file. For this, you append each worker node's DNS name to the file.

master-100# emacs -nw /opt/citusdb/4.0/data/pg_worker_list.conf

# HOSTNAME     [PORT]     [RACK]
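For our example cluster, the appended entries might look like the following. This is a sketch that writes to a local file; on the master you would edit /opt/citusdb/4.0/data/pg_worker_list.conf itself. The port and rack columns are optional, and 5432 is the default PostgreSQL port.

```shell
# Append our two example workers to a local sketch of the membership file.
cat > pg_worker_list.sketch <<'EOF'
# HOSTNAME     [PORT]     [RACK]
worker-101     5432
worker-102     5432
EOF

cat pg_worker_list.sketch
```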

You are now ready to start up all databases in the cluster.

all-nodes# /opt/citusdb/4.0/bin/pg_ctl -D /opt/citusdb/4.0/data -l logfile start

After all databases start up, you can then connect to the master node and issue queries against the distributed cluster. We cover two such examples in the previous section. These examples are also applicable here.

EC2 Multiple Node Cluster

Citus DB also has a publicly available machine image on EC2. As with other Amazon images, you launch this instance through the AWS Management Console. Once you sign in to this console, you will need to choose ami-48062920 as your machine image, and start up several instances based on this image. In our examples, we typically use three m3.2xlarge instances, but you're also welcome to try other configurations.

Besides the management console, you can also start up instances using the Amazon API tools. This approach requires more effort, but we still give instructions for launching instances this way for users who prefer to work entirely from the shell. For those users, we assume that you already have the API tools installed and your security credentials generated. If not, Ubuntu's EC2 Starters Guide provides great instructions on how to get started.

localhost# ec2-authorize default -p 22
localhost# ec2-run-instances ami-48062920 -n 3 -t m3.2xlarge -g default -k <ec2_keypair>
localhost# ec2-describe-instances
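The private DNS names needed for the membership file can be pulled out of the ec2-describe-instances output with a short filter. The sample output below is abbreviated and hypothetical (instance IDs and DNS names are made up), and the exact column layout can vary between API tool versions; in this sample, the private DNS name is the fifth field of each INSTANCE line.

```shell
# Abbreviated, hypothetical ec2-describe-instances output.
cat > instances.txt <<'EOF'
RESERVATION r-11111111 123456789012 default
INSTANCE i-aaaa1111 ami-48062920 ec2-54-0-0-1.compute-1.amazonaws.com ip-10-0-0-101.ec2.internal running
INSTANCE i-bbbb2222 ami-48062920 ec2-54-0-0-2.compute-1.amazonaws.com ip-10-0-0-102.ec2.internal running
EOF

# Extract each instance's private DNS name; these are the entries you
# would append to the master's pg_worker_list.conf.
awk '$1 == "INSTANCE" { print $5 }' instances.txt > workers.txt
cat workers.txt
```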

These commands first authorize ssh login access for the default security group, and then launch three Citus DB nodes. When starting up, these nodes mount one ephemeral disk at the /opt/citusdb mount point, and then install the database at /opt/citusdb/4.0/data. The nodes also have their postgresql.conf and pg_hba.conf edited so that the databases can talk to each other over TCP.

Once these instances come up, you can pick one node as the master, and log in to it. You then need to edit the membership file on this node, and tell it about the worker nodes in the cluster. For this, you append the other instances' private DNS names to the membership file. From there on, we refer to this node as master-100.

localhost# ssh -i <private SSH key file> ec2-user@<ec2 master hostname>
master-100# emacs -nw /opt/citusdb/4.0/data/pg_worker_list.conf

# HOSTNAME     [PORT]     [RACK]

You are now ready to start up all databases in the cluster. You can either log in to these instances and issue commands to start up databases, or you can use tools such as pssh to run these commands in parallel for you.

all-nodes# /opt/citusdb/4.0/bin/pg_ctl -D /opt/citusdb/4.0/data -l logfile start
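The start command above can be run across all nodes in parallel with pssh, as mentioned earlier. A minimal sketch follows; the hosts.txt file name is our choice, and the snippet only executes pssh when it is actually installed.

```shell
# List every node in the cluster, one per line.
printf 'master-100\nworker-101\nworker-102\n' > hosts.txt

START_CMD='/opt/citusdb/4.0/bin/pg_ctl -D /opt/citusdb/4.0/data -l logfile start'

# Run the start command on all nodes in parallel if pssh is available
# (-h names the host file, -i prints each host's output inline);
# otherwise, just show what would run.
if command -v pssh >/dev/null 2>&1; then
    pssh -h hosts.txt -i "$START_CMD"
else
    echo "would run on each host in hosts.txt: $START_CMD"
fi
```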

After all databases start up, you can then connect to the master node and issue queries against the distributed cluster. We cover two such examples in the previous section. These examples are also applicable here.

CloudFormation template

To simplify the process of setting up a CitusDB cluster on EC2, AWS users can also use AWS CloudFormation. The CloudFormation template for CitusDB on the Downloads page will let you start a CitusDB cluster on AWS in just a few clicks, with pg_shard (for real-time transactional workloads), cstore_fdw (for columnar storage), and the contrib extensions pre-installed. You do not need to follow any of the configuration steps above when using the CloudFormation template.

CloudFormation lets you create a "stack" of AWS resources, such as EC2 instances and security groups, from a template defined in a JSON file. You can create multiple stacks from the same template without conflict, as long as they have unique stack names.
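If you prefer the shell, a stack can also be created with the AWS CLI. The sketch below only builds and prints the command rather than executing it, since running it requires configured AWS credentials; the template URL and parameter names are placeholders, and the real template link comes from the Downloads page. The CAPABILITY_IAM flag corresponds to the IAM acknowledgment described later in this section.

```shell
# Placeholder stack name; stack names must be unique within a region.
STACK_NAME=citusdb-demo

# Hypothetical template URL and key pair name; substitute your own values.
CMD="aws cloudformation create-stack \
  --stack-name $STACK_NAME \
  --template-url https://example.com/citusdb-template.json \
  --parameters ParameterKey=KeyName,ParameterValue=my-ec2-keypair \
  --capabilities CAPABILITY_IAM"

# Print rather than execute; run the printed command once credentials are set up.
echo "$CMD"
```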

To begin, start a CitusDB cluster using CloudFormation via the Downloads page, which will take you directly to the AWS CloudFormation console. In the console, pick a unique name for your stack:

Enter a name for your stack

Next, set your key pair name. If you don't have one for the region in which you're launching the cluster, then you can create it in the EC2 console. You can also set the number of workers, the instance types, and the availability zone.

Enter your EC2 key pair name

Finally, acknowledge the IAM capabilities, which give the master node limited access to the EC2 APIs to obtain the list of worker IPs. Your AWS account needs to have IAM access to perform this step.

Acknowledge IAM permissions and create the template.

After about 10 minutes, stack creation completes and the hostname of the master node will appear in the Outputs tab. You can immediately connect to it using SSH with the username ec2-user and the EC2 key pair you filled in. If something goes wrong during set-up, the stack will be rolled back but not deleted. In that case, either use a different stack name or delete the old stack before creating a new one with the same name.

Stack creation completed

To get started using CitusDB, you can follow the examples. Note that the database and its configuration files are stored in /data/base. The template automatically tunes the system configuration for CitusDB and sets up RAID on the SSD drives where appropriate, making it a great starting point even for production systems.

You typically want to avoid making changes to resources created by CloudFormation, such as terminating EC2 instances. To shut the cluster down, simply delete the stack in the CloudFormation console.