PGConf.Russia talk on pg_shard

Written by Marco Slot
March 23, 2015

Last month we went to PGConf.Russia and gave a talk on [pg_shard](/citus-products/pg-shard) which is available for download on GitHub. pg_shard is an open source PostgreSQL extension which allows users to scale out their database by creating a cluster of commodity servers. pg_shard shards the database and replicates the shards across the cluster, allowing users to both easily scale out as well as create a high availability cluster. The video from my talk is now available for all to see:

We got some very interesting questions during the talk that we wanted to highlight and clarify.

  • Does pg_shard/CitusDB run my queries in parallel?
    In pg_shard, a query will use one thread on a master node and one on a worker node. You can run many queries in parallel by making multiple connections to the master node(s), whereas the real work is being done by the worker nodes. UPDATE and DELETE queries on the same shard are serialized to ensure consistency between the replicas. To parallelize multi-shard SELECT queries across the worker nodes, you can upgrade to CitusDB.
  • Can I use stored procedures in my queries?
    Yes, and this is a powerful feature of pg_shard. Function calls in queries are executed on the worker nodes, which allows you to include arbitrary logic in your queries and scale it out to many worker nodes.
  • How do I ALTER a distributed table?
    pg_shard currently does not automatically propagate ALTER TABLE commands on a distributed table to the individual shards on the workers, but you can easily do this with a simple shell script. For pg_shard: alter-pgshard-table.sh and for CitusDB: alter-citusdb-table.sh.
  • What kind of lock is used when copying shard placements?
    In the latest version of pg_shard, we've added a master_copy_shard_placement function which takes an exclusive lock of the shard. This will temporarily block changes to the shard, while selects can still go through.
  • What's the difference between pg_shard and PL/Proxy?
    Pl/Proxy allows you to scale out stored procedures on a master node across many worker nodes, whereas pg_shard allows you to (transparently) scale out a table and queries on the table across many worker nodes using replication and sharding.
  • Can I use cstore_fdw and pg_shard without CitusDB?
    You certainly can! cstore_fdw and pg_shard can be used both in a regular PostgreSQL database or in combination with CitusDB.

We would like to thank the organizers for a great conference and providing the recording of the talk! If you would like more information on pg_shard or on common use cases for scaling out PostgreSQL, please visit the Citus Data website.

Marco Slot

Written by Marco Slot

Former lead engineer for the Citus database engine at Microsoft. Speaker at Postgres Conf EU, PostgresOpen, pgDay Paris, Hello World, SIGMOD, & lots of meetups. Talk selection team member for Citus Con: An Event for Postgres. PhD in distributed systems. Loves mountain hiking.

@marcoslot marcocitus