By clicking download below, you agree that you have read, understand and accept the CitusDB License Agreement.
The Citus DB installer puts all binaries under /opt/citusdb/2.0, and also creates a data subdirectory to store the newly initialized database's contents. The installer then sets the data directory's owner to the current real user. To install, run the following distribution-specific commands.
Ubuntu / Debian
localhost# sudo dpkg --install citusdb-2.0.0-1.amd64.deb
Fedora / Redhat
localhost# sudo rpm --install citusdb-2.0.0-1.x86_64.rpm
Amazon EC2
You can use the AWS Management Console or the ec2-run-instances command to launch ami-cbd741a2. Here we start a single node; launching multiple nodes is covered later in our documentation. Once the instance is running, connect to it over SSH.
localhost# ssh -i <private SSH key file> ec2-user@<external hostname>
In this guide, we demonstrate a setup that uses multiple database instances on the same node. We use the already installed database as the master, and then initialize two more worker nodes. For a setup with independent worker nodes, please see the documentation page.
localhost# /opt/citusdb/2.0/bin/initdb -D /opt/citusdb/2.0/data.9700
localhost# /opt/citusdb/2.0/bin/initdb -D /opt/citusdb/2.0/data.9701
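With more than two workers, the same initdb call can be scripted once per port. The following is a minimal sketch, assuming the install prefix and the data.<port> naming used above; the CITUS_HOME and INITDB variables and the init_workers function are illustrative names, not part of Citus DB.

```shell
#!/bin/sh
# Hypothetical helper: initialize one worker data directory per port.
# CITUS_HOME and INITDB are illustrative variables, not Citus DB settings.
CITUS_HOME="${CITUS_HOME:-/opt/citusdb/2.0}"
INITDB="${INITDB:-$CITUS_HOME/bin/initdb}"

init_workers() {
    for port in "$@"; do
        # Same naming scheme as in the guide: data.<port>
        "$INITDB" -D "$CITUS_HOME/data.$port" || return 1
    done
}

# Usage: init_workers 9700 9701
```

The function stops at the first failing initdb call, so a partially initialized set of workers is easy to spot.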
We now need to tell the master database about the workers. To do this, we append each worker's hostname to the pg_worker_list.conf file in the master's data directory. Here we can also specify the port number on which each worker is listening.
localhost# emacs -nw /opt/citusdb/2.0/data/pg_worker_list.conf
# HOSTNAME [PORT] [RACK]
localhost 9700
localhost 9701
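If you prefer not to open an editor, the same entries can be appended from the shell. This is only a sketch: add_worker is a hypothetical helper, and it assumes the space-separated "hostname port" layout shown in the file header above.

```shell
#!/bin/sh
# Hypothetical helper: append a "hostname port" line to a worker list file.
add_worker() {
    conf="$1"; host="$2"; port="$3"
    # Skip the entry if an identical line is already present.
    grep -qxF "$host $port" "$conf" 2>/dev/null ||
        printf '%s %s\n' "$host" "$port" >> "$conf"
}

# Usage:
# add_worker /opt/citusdb/2.0/data/pg_worker_list.conf localhost 9700
# add_worker /opt/citusdb/2.0/data/pg_worker_list.conf localhost 9701
```

Checking for an existing line first keeps the file free of duplicates when the script is run more than once.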
Now we can start the databases using pg_ctl, specifying a data directory and a logfile for each database. We start the master database on the default port, and specify ports for the two workers.
localhost# /opt/citusdb/2.0/bin/pg_ctl -D /opt/citusdb/2.0/data -l logfile start
localhost# /opt/citusdb/2.0/bin/pg_ctl -D /opt/citusdb/2.0/data.9700 -o "-p 9700" -l logfile.9700 start
localhost# /opt/citusdb/2.0/bin/pg_ctl -D /opt/citusdb/2.0/data.9701 -o "-p 9701" -l logfile.9701 start
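Before loading data, it can help to confirm that all three instances are running. The sketch below relies on pg_ctl's status mode, which standard PostgreSQL provides; that the pg_ctl shipped with Citus DB behaves the same way is an assumption here. CITUS_HOME, PG_CTL and check_instances are illustrative names.

```shell
#!/bin/sh
# Hypothetical helper: report the status of the master and each worker.
# CITUS_HOME and PG_CTL are illustrative variables, not Citus DB settings.
CITUS_HOME="${CITUS_HOME:-/opt/citusdb/2.0}"
PG_CTL="${PG_CTL:-$CITUS_HOME/bin/pg_ctl}"

check_instances() {
    # The master uses the plain data directory; workers use data.<port>.
    "$PG_CTL" -D "$CITUS_HOME/data" status || return 1
    for port in "$@"; do
        "$PG_CTL" -D "$CITUS_HOME/data.$port" status || return 1
    done
}

# Usage: check_instances 9700 9701
```

If an instance is not running, the logfile passed to pg_ctl with -l is the first place to look.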
To try things out, we first need to download some example data.
localhost# wget http://examples.citusdata.com/customer_reviews_1998.csv.gz
localhost# gzip -d customer_reviews_1998.csv.gz
We then use psql to connect to the master, specifying localhost and the default 'postgres' database.
localhost# /opt/citusdb/2.0/bin/psql -h localhost -d postgres
We can now create a distributed table. We partition the table on the review_date column by specifying the DISTRIBUTE BY APPEND clause.
postgres=# CREATE TABLE customer_reviews
(
    customer_id TEXT not null,
    review_date DATE not null,
    review_rating INTEGER not null,
    review_votes INTEGER,
    review_helpful_votes INTEGER,
    product_id CHAR(10) not null,
    product_title TEXT not null,
    product_sales_rank BIGINT,
    product_group TEXT,
    product_category TEXT,
    product_subcategory TEXT,
    similar_product_ids CHAR(10)
)
DISTRIBUTE BY APPEND (review_date);
We next load data using the STAGE command; this command has the same syntax as PostgreSQL's COPY. Citus DB automatically partitions the data into fixed-size blocks, and replicates these blocks among worker databases.
postgres=# \STAGE customer_reviews FROM '/home/user/customer_reviews_1998.csv' (FORMAT CSV)
We are now ready to start issuing queries against the cluster. For additional example queries, please see the documentation page.
postgres=# SELECT count(*) FROM customer_reviews;
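The count returned by this query can be compared with the number of lines in the source file as a quick sanity check on the load. This sketch assumes the CSV has no header row and no quoted fields that contain newlines, neither of which is stated in this guide; csv_line_count is an illustrative name.

```shell
#!/bin/sh
# Hypothetical helper: print the number of lines in a CSV file.
csv_line_count() {
    # wc -l counts newline characters; tr strips the padding that some
    # implementations add around the number.
    wc -l < "$1" | tr -d ' '
}

# Usage: csv_line_count /home/user/customer_reviews_1998.csv
```

If the two numbers differ, check whether every line of the file was accepted by the STAGE command.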