Before You Go To Production

We cover three important points about Citus DB's behavior in this section. First, when the user modifies a table's schema, the modification takes effect only for new shards. That is, the stage command uses the modified schema during new shard uploads, but the changes are not propagated to old shards that have already been uploaded to the cluster.
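For example, suppose a distributed table named github_events (a hypothetical name) gains a new column on the master node:

master# psql -d postgres
postgres=# ALTER TABLE github_events ADD COLUMN merged_at timestamp;

Shards staged after this point include the new column; shards already uploaded to the worker nodes keep the old schema.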

Second, when the user drops a table, Citus DB removes only the table's metadata; it does not issue commands to remove the table's shards from the worker nodes. If disk space becomes an issue, users need to manually clean up the old shards on the worker nodes.
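As a sketch of that cleanup, assuming shard tables on a worker carry the distributed table's name with a shard identifier suffix (github_events_102009 is a hypothetical example), the user can list and drop them with psql:

worker-101# psql -d postgres
postgres=# \d
postgres=# DROP TABLE github_events_102009;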

Third, when the user issues an analytics query and the query takes a long time, the best way to debug the issue is to enable statement logging on one of the worker nodes.

worker-101# emacs -nw /opt/citusdb/4.0/data/postgresql.conf

log_statement = 'all'
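After editing the setting, the server's configuration needs to be reloaded for it to take effect. Assuming pg_ctl is installed under the same /opt/citusdb/4.0 prefix, this can be done without a restart:

worker-101# /opt/citusdb/4.0/bin/pg_ctl -D /opt/citusdb/4.0/data reload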

Citus DB partitions an incoming query into smaller queries, opens connections to the worker nodes, and sends the smaller queries to them for execution. As a result, an incoming query's execution time is roughly the time it takes to open connections to the worker nodes plus the time it takes to execute the smaller queries. The connection setup times are nearly constant and on most networks don't exceed a second in total.

Query execution times, on the other hand, vary significantly depending on the nature of the query. The log_statement configuration value enables logging of all the smaller queries executed on the worker node, and lets users observe query run times. If certain smaller queries are slow, users can connect to the local worker node with psql and run EXPLAIN ANALYZE on those queries to analyze their run times.
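For instance, if the log shows a slow query against a shard table (github_events_102009 is again a hypothetical name), it can be re-run under EXPLAIN ANALYZE from the worker node itself:

worker-101# psql -d postgres
postgres=# EXPLAIN ANALYZE SELECT count(*) FROM github_events_102009;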