Citus Data Blog - Articles by Eren Basak
Eren Basak
Scaling data and analytics with Postgres
https://www.citusdata.com/blog/
2016-10-10T00:00:00+00:00
How Distributed Outer Joins on PostgreSQL with Citus Work
https://www.citusdata.com/blog/2016/10/10/outer-joins-in-citus/
2016-10-10T00:00:00+00:00
2016-10-10T00:00:00+00:00
Eren Basak
<p>SQL is a very powerful language for analyzing and reporting against data. At the core of SQL is the idea of joins and how you combine various tables together. One such type of join, the outer join, is useful when we need to retain rows even if they have no match on the other side.</p>
<p>The most common type of join, the inner join, against tables A and B brings back only the tuples that have a match in both A and B. Outer joins give us the ability to bring back, say, all rows of table A even if they don’t have a corresponding match in table B. For example, let's say you keep customers in one table and purchases in another table. When you want to see all purchases of customers, you may want to see all customers in the result even if they have not made any purchases yet. Then, you need an outer join. In this post we’ll look at what outer joins are, and then at how we support them in a distributed fashion on Citus.</p>
<p>Let’s say we have two tables, customer and purchase:</p>
<div class="highlight">
<pre class="highlight sql"><code><span class="n">customer</span> <span class="k">table</span><span class="p">:</span>
<span class="n">customer_id</span> <span class="o">|</span> <span class="n">name</span>
<span class="c1">-------------+-----------------</span>
<span class="mi">1</span> <span class="o">|</span> <span class="n">Corra</span> <span class="n">Ignacio</span>
<span class="mi">3</span> <span class="o">|</span> <span class="n">Warren</span> <span class="n">Brooklyn</span>
<span class="mi">2</span> <span class="o">|</span> <span class="n">Jalda</span> <span class="n">Francis</span>
<span class="n">purchase</span> <span class="k">table</span><span class="p">:</span>
<span class="n">purchase_id</span> <span class="o">|</span> <span class="n">customer_id</span> <span class="o">|</span> <span class="n">category</span> <span class="o">|</span> <span class="k">comment</span>
<span class="c1">-------------+-------------+----------+------------------------------</span>
<span class="mi">1000</span> <span class="o">|</span> <span class="mi">1</span> <span class="o">|</span> <span class="n">books</span> <span class="o">|</span> <span class="n">Nice</span> <span class="k">to</span> <span class="n">Have</span><span class="o">!</span>
<span class="mi">1001</span> <span class="o">|</span> <span class="mi">1</span> <span class="o">|</span> <span class="n">chairs</span> <span class="o">|</span> <span class="n">Comfortable</span>
<span class="mi">1002</span> <span class="o">|</span> <span class="mi">2</span> <span class="o">|</span> <span class="n">books</span> <span class="o">|</span> <span class="n">Good</span> <span class="k">Read</span><span class="p">,</span> <span class="n">cheap</span> <span class="n">price</span>
<span class="mi">1003</span> <span class="o">|</span> <span class="o">-</span><span class="mi">1</span> <span class="o">|</span> <span class="n">hardware</span> <span class="o">|</span> <span class="k">Not</span> <span class="n">very</span> <span class="n">cheap</span>
<span class="mi">1004</span> <span class="o">|</span> <span class="o">-</span><span class="mi">1</span> <span class="o">|</span> <span class="n">laptops</span> <span class="o">|</span> <span class="n">Good</span> <span class="n">laptop</span> <span class="n">but</span> <span class="n">expensive</span><span class="p">...</span>
</code></pre>
</div>
<p>The following queries and results help clarify the inner and outer join behaviors:</p>
<div class="highlight">
<pre class="highlight sql"><code> <span class="k">SELECT</span> <span class="n">customer</span><span class="p">.</span><span class="n">name</span><span class="p">,</span> <span class="n">purchase</span><span class="p">.</span><span class="k">comment</span>
<span class="k">FROM</span> <span class="n">customer</span> <span class="k">JOIN</span> <span class="n">purchase</span> <span class="k">ON</span> <span class="n">customer</span><span class="p">.</span><span class="n">customer_id</span> <span class="o">=</span> <span class="n">purchase</span><span class="p">.</span><span class="n">customer_id</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">purchase</span><span class="p">.</span><span class="k">comment</span><span class="p">;</span>
<span class="n">name</span> <span class="o">|</span> <span class="k">comment</span>
<span class="c1">---------------+------------------------</span>
<span class="n">Corra</span> <span class="n">Ignacio</span> <span class="o">|</span> <span class="n">Comfortable</span>
<span class="n">Jalda</span> <span class="n">Francis</span> <span class="o">|</span> <span class="n">Good</span> <span class="k">Read</span><span class="p">,</span> <span class="n">cheap</span> <span class="n">price</span>
<span class="n">Corra</span> <span class="n">Ignacio</span> <span class="o">|</span> <span class="n">Nice</span> <span class="k">to</span> <span class="n">Have</span><span class="o">!</span>
</code></pre>
</div>
<p><img src="/assets/images/blog/_BLOG__Distributed_Outer_Joins_-_Google_Docs.png" style="float:right;width:24%;"></p>
<div class="highlight">
<pre class="highlight sql"><code> <span class="k">SELECT</span> <span class="n">customer</span><span class="p">.</span><span class="n">name</span><span class="p">,</span> <span class="n">purchase</span><span class="p">.</span><span class="k">comment</span>
<span class="k">FROM</span> <span class="n">customer</span> <span class="k">INNER</span> <span class="k">JOIN</span> <span class="n">purchase</span> <span class="k">ON</span> <span class="n">customer</span><span class="p">.</span><span class="n">customer_id</span> <span class="o">=</span> <span class="n">purchase</span><span class="p">.</span><span class="n">customer_id</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">purchase</span><span class="p">.</span><span class="k">comment</span><span class="p">;</span>
<span class="n">name</span> <span class="o">|</span> <span class="k">comment</span>
<span class="c1">---------------+------------------------</span>
<span class="n">Corra</span> <span class="n">Ignacio</span> <span class="o">|</span> <span class="n">Comfortable</span>
<span class="n">Jalda</span> <span class="n">Francis</span> <span class="o">|</span> <span class="n">Good</span> <span class="k">Read</span><span class="p">,</span> <span class="n">cheap</span> <span class="n">price</span>
<span class="n">Corra</span> <span class="n">Ignacio</span> <span class="o">|</span> <span class="n">Nice</span> <span class="k">to</span> <span class="n">Have</span><span class="o">!</span>
</code></pre>
</div>
<p><img src="/assets/images/blog/_BLOG__Distributed_Outer_Joins_-_Google_Docs2.png" style="float:right;width:24%;"></p>
<div class="highlight">
<pre class="highlight sql"><code> <span class="k">SELECT</span> <span class="n">customer</span><span class="p">.</span><span class="n">name</span><span class="p">,</span> <span class="n">purchase</span><span class="p">.</span><span class="k">comment</span>
<span class="k">FROM</span> <span class="n">customer</span> <span class="k">LEFT</span> <span class="k">JOIN</span> <span class="n">purchase</span> <span class="k">ON</span> <span class="n">customer</span><span class="p">.</span><span class="n">customer_id</span> <span class="o">=</span> <span class="n">purchase</span><span class="p">.</span><span class="n">customer_id</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">purchase</span><span class="p">.</span><span class="k">comment</span><span class="p">;</span>
<span class="n">name</span> <span class="o">|</span> <span class="k">comment</span>
<span class="c1">-----------------+------------------------</span>
<span class="n">Corra</span> <span class="n">Ignacio</span> <span class="o">|</span> <span class="n">Comfortable</span>
<span class="n">Jalda</span> <span class="n">Francis</span> <span class="o">|</span> <span class="n">Good</span> <span class="k">Read</span><span class="p">,</span> <span class="n">cheap</span> <span class="n">price</span>
<span class="n">Corra</span> <span class="n">Ignacio</span> <span class="o">|</span> <span class="n">Nice</span> <span class="k">to</span> <span class="n">Have</span><span class="o">!</span>
<span class="n">Warren</span> <span class="n">Brooklyn</span> <span class="o">|</span>
</code></pre>
</div>
<p><img src="/assets/images/blog/_BLOG__Distributed_Outer_Joins_-_Google_Docs3.png" style="float:right;width:24%;"></p>
<div class="highlight">
<pre class="highlight "><code> SELECT customer.name, purchase.comment
FROM customer RIGHT JOIN purchase ON customer.customer_id = purchase.customer_id
ORDER BY purchase.comment;
name | comment
---------------+------------------------------
Corra Ignacio | Comfortable
Jalda Francis | Good Read, cheap price
| Good laptop but expensive...
Corra Ignacio | Nice to Have!
| Not very cheap
</code></pre>
</div>
<p><img src="/assets/images/blog/_BLOG__Distributed_Outer_Joins_-_Google_Docs4.png" style="float:right;width:24%;"></p>
<div class="highlight">
<pre class="highlight sql"><code> <span class="k">SELECT</span> <span class="n">customer</span><span class="p">.</span><span class="n">name</span><span class="p">,</span> <span class="n">purchase</span><span class="p">.</span><span class="k">comment</span>
<span class="k">FROM</span> <span class="n">customer</span> <span class="k">FULL</span> <span class="k">JOIN</span> <span class="n">purchase</span> <span class="k">ON</span> <span class="n">customer</span><span class="p">.</span><span class="n">customer_id</span> <span class="o">=</span> <span class="n">purchase</span><span class="p">.</span><span class="n">customer_id</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">purchase</span><span class="p">.</span><span class="k">comment</span><span class="p">;</span>
<span class="n">name</span> <span class="o">|</span> <span class="k">comment</span>
<span class="c1">-----------------+------------------------------</span>
<span class="n">Corra</span> <span class="n">Ignacio</span> <span class="o">|</span> <span class="n">Comfortable</span>
<span class="n">Jalda</span> <span class="n">Francis</span> <span class="o">|</span> <span class="n">Good</span> <span class="k">Read</span><span class="p">,</span> <span class="n">cheap</span> <span class="n">price</span>
<span class="o">|</span> <span class="n">Good</span> <span class="n">laptop</span> <span class="n">but</span> <span class="n">expensive</span><span class="p">...</span>
<span class="n">Corra</span> <span class="n">Ignacio</span> <span class="o">|</span> <span class="n">Nice</span> <span class="k">to</span> <span class="n">Have</span><span class="o">!</span>
<span class="o">|</span> <span class="k">Not</span> <span class="n">very</span> <span class="n">cheap</span>
<span class="n">Warren</span> <span class="n">Brooklyn</span> <span class="o">|</span>
</code></pre>
</div>
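<p>To experiment with these four join types without a PostgreSQL server, the sketch below loads the same sample rows into an in-memory SQLite database from Python. It is an illustration under assumptions rather than part of the original setup: SQLite stands in for PostgreSQL, the column types are guesses, and because older SQLite versions lack RIGHT and FULL JOIN, the right join is written as a left join with the sides swapped and the full join as a left join plus the purchases that have no customer.</p>

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER, name TEXT);
    CREATE TABLE purchase (purchase_id INTEGER, customer_id INTEGER,
                           category TEXT, comment TEXT);
    INSERT INTO customer VALUES
        (1, 'Corra Ignacio'), (3, 'Warren Brooklyn'), (2, 'Jalda Francis');
    INSERT INTO purchase VALUES
        (1000, 1, 'books', 'Nice to Have!'),
        (1001, 1, 'chairs', 'Comfortable'),
        (1002, 2, 'books', 'Good Read, cheap price'),
        (1003, -1, 'hardware', 'Not very cheap'),
        (1004, -1, 'laptops', 'Good laptop but expensive...');
""")

ON = "ON c.customer_id = p.customer_id"


def run(sql):
    """Run a query and return its rows as a list of (name, comment) tuples."""
    return conn.execute(sql).fetchall()


inner = run("SELECT c.name, p.comment FROM customer c JOIN purchase p " + ON)
left = run("SELECT c.name, p.comment FROM customer c LEFT JOIN purchase p " + ON)
# RIGHT JOIN emulated as a LEFT JOIN with the two tables swapped.
right = run("SELECT c.name, p.comment FROM purchase p LEFT JOIN customer c " + ON)
# FULL JOIN emulated as the LEFT JOIN plus purchases without any customer.
full = left + run(
    "SELECT NULL, p.comment FROM purchase p WHERE NOT EXISTS "
    "(SELECT 1 FROM customer c WHERE c.customer_id = p.customer_id)"
)

for label, result in [("inner", inner), ("left", left),
                      ("right", right), ("full", full)]:
    print(label, len(result), result)
```

<p>The row counts (3, 4, 5 and 6) line up with the result tables above; missing values come back as <code>None</code> in Python where psql shows an empty cell.</p>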
<h3>Distributed Outer Joins with Citus</h3>
<p>The Citus extension allows PostgreSQL to distribute big tables into smaller fragments called <a href="http://www.craigkerstiens.com/2012/11/30/sharding-your-database/">“shards”</a>. Performing outer joins on these distributed tables becomes a bit more challenging, since the union of outer joins between individual shards does not always give the correct result. Currently, Citus supports distributed outer joins under the following criteria:</p>
<ul>
<li>Outer joins should be between distributed (sharded) tables only, i.e. it is not possible to outer join a sharded table with a regular PostgreSQL table.</li>
<li>Join criteria should be on <a href="https://docs.citusdata.com/en/v5.2/dist_tables/concepts.html">partition columns</a> of the distributed tables.</li>
<li>The query should join the distributed tables on the equality of partition columns (<code>table1.a = table2.a</code>).</li>
<li>Shards of the distributed tables should match one to one, i.e. each shard of table A should overlap with one and only one shard from table B.</li>
</ul>
<p>For example, let’s assume we have three hash-distributed tables, user, purchase and comment, where user and purchase have 4 shards each while comment has 8 shards.</p>
<div class="highlight">
<pre class="highlight sql"><code><span class="c1">-- user is a reserved word in PostgreSQL, so the table name is double-quoted here</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">"user"</span> <span class="p">(</span><span class="n">user_id</span> <span class="nb">int</span><span class="p">,</span> <span class="n">name</span> <span class="nb">text</span><span class="p">);</span>
<span class="k">SELECT</span> <span class="n">master_create_distributed_table</span><span class="p">(</span><span class="s1">'user'</span><span class="p">,</span> <span class="s1">'user_id'</span><span class="p">,</span> <span class="s1">'hash'</span><span class="p">);</span>
<span class="k">SELECT</span> <span class="n">master_create_worker_shards</span><span class="p">(</span><span class="s1">'user'</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">purchase</span> <span class="p">(</span><span class="n">user_id</span> <span class="nb">int</span><span class="p">,</span> <span class="n">amount</span> <span class="nb">int</span><span class="p">);</span>
<span class="k">SELECT</span> <span class="n">master_create_distributed_table</span><span class="p">(</span><span class="s1">'purchase'</span><span class="p">,</span> <span class="s1">'user_id'</span><span class="p">,</span> <span class="s1">'hash'</span><span class="p">);</span>
<span class="k">SELECT</span> <span class="n">master_create_worker_shards</span><span class="p">(</span><span class="s1">'purchase'</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="k">comment</span> <span class="p">(</span><span class="n">user_id</span> <span class="nb">int</span><span class="p">,</span> <span class="k">comment</span> <span class="nb">text</span><span class="p">,</span> <span class="n">rating</span> <span class="nb">int</span><span class="p">);</span>
<span class="k">SELECT</span> <span class="n">master_create_distributed_table</span><span class="p">(</span><span class="s1">'comment'</span><span class="p">,</span> <span class="s1">'user_id'</span><span class="p">,</span> <span class="s1">'hash'</span><span class="p">);</span>
<span class="k">SELECT</span> <span class="n">master_create_worker_shards</span><span class="p">(</span><span class="s1">'comment'</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
</code></pre>
</div>
<p>The following query would work since the distributed tables user and purchase have the same number of shards and the join criterion is equality of the partition columns:</p>
<div class="highlight">
<pre class="highlight sql"><code>SELECT * FROM "user" LEFT OUTER JOIN purchase ON "user".user_id = purchase.user_id;
</code></pre>
</div>
<p>The following queries are not supported out of the box:</p>
<div class="highlight">
<pre class="highlight sql"><code>-- user and comment tables don’t have the same number of shards:
SELECT * FROM "user" LEFT OUTER JOIN comment ON "user".user_id = comment.user_id;

-- join condition is not on the partition columns:
SELECT * FROM "user" LEFT OUTER JOIN purchase ON "user".user_id = purchase.amount;

-- join condition is not equality:
SELECT * FROM "user" LEFT OUTER JOIN purchase ON "user".user_id &lt; purchase.user_id;
</code></pre>
</div>
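<p>To see why a mismatch in shard counts matters, here is a small Python simulation of a table with 4 shards left-joined to a table with 8 shards. It is only a sketch under assumptions: rows are placed with a toy modulo rule (Citus itself assigns hash ranges to shards), the helper names <code>shard</code> and <code>left_join</code> are made up for this example, and the sample rows are invented. With modulo placement, shard <code>i</code> of the 4-shard table overlaps shards <code>i</code> and <code>i + 4</code> of the 8-shard table, so joining every overlapping pair and taking the union yields extra NULL-extended rows.</p>

```python
from collections import Counter


def shard(rows, shard_count):
    """Toy hash distribution: a row goes to shard (key % shard_count)."""
    shards = [[] for _ in range(shard_count)]
    for row in rows:
        shards[row[0] % shard_count].append(row)
    return shards


def left_join(left_rows, right_rows):
    """LEFT JOIN on the first column; unmatched left rows get None."""
    out = []
    for key, name in left_rows:
        matches = [text for other_key, text in right_rows if other_key == key]
        if matches:
            out.extend((key, name, text) for text in matches)
        else:
            out.append((key, name, None))
    return out


# Invented sample data: eight users, four of whom wrote a comment.
users = [(uid, "user%d" % uid) for uid in range(1, 9)]
comments = [(uid, "comment by %d" % uid) for uid in (1, 2, 5, 6)]

# The answer a single-node LEFT JOIN would give.
expected = Counter(left_join(users, comments))

user_shards = shard(users, 4)
comment_shards = shard(comments, 8)

# Naive distributed plan: join each user shard with both overlapping
# comment shards and take the union of all the partial results.
naive = Counter()
for i, user_shard in enumerate(user_shards):
    for j in (i, i + 4):
        naive.update(left_join(user_shard, comment_shards[j]))

spurious = naive - expected
print("expected rows:", sum(expected.values()))
print("naive rows:   ", sum(naive.values()))
print("spurious rows:", sorted(spurious.elements()))
```

<p>User 1 has a comment, yet the naive union also contains the row <code>(1, 'user1', None)</code>, because the second overlapping comment shard holds no match for that user. This is the kind of wrong result the one-to-one shard requirement rules out.</p>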
<h3>How Citus Processes OUTER JOINs</h3>
<p>When one-to-one matching between shards exists, then performing an outer join on large tables is equivalent to combining outer join results of corresponding shards.</p>
<p><img src="/assets/images/blog/_BLOG__Distributed_Outer_Joins_-_Google_Docs5.png" alt="Distributed outer join example"></p>
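<p>The equivalence can be checked on a toy example. The Python sketch below is a simulation under assumptions, not Citus code: both tables are split into 4 shards with the same modulo rule (standing in for Citus’ hash ranges), <code>full_join</code> is a hand-written helper, and the rows are invented. It compares a full outer join over the whole tables with the combined results of joining shard 0 with shard 0, shard 1 with shard 1, and so on.</p>

```python
from collections import Counter

SHARD_COUNT = 4


def shard(rows, shard_count=SHARD_COUNT):
    """Toy hash distribution: a row goes to shard (key % shard_count)."""
    shards = [[] for _ in range(shard_count)]
    for key, value in rows:
        shards[key % shard_count].append((key, value))
    return shards


def full_join(left_rows, right_rows):
    """FULL OUTER JOIN on the key column, padding missing sides with None."""
    out = []
    matched_right = set()
    for left_key, left_value in left_rows:
        hit = False
        for index, (right_key, right_value) in enumerate(right_rows):
            if left_key == right_key:
                out.append((left_key, left_value, right_key, right_value))
                matched_right.add(index)
                hit = True
        if not hit:
            out.append((left_key, left_value, None, None))
    for index, (right_key, right_value) in enumerate(right_rows):
        if index not in matched_right:
            out.append((None, None, right_key, right_value))
    return out


# Invented sample rows: (partition key, payload).
table1 = [(1, "a1"), (2, "a2"), (3, "a3"), (6, "a6")]
table2 = [(2, "b2"), (3, "b3"), (3, "b3-again"), (5, "b5")]

# Join over the complete tables, as a single node would compute it.
global_result = Counter(full_join(table1, table2))

# Join only the shards that share the same index, then combine the results.
per_shard = Counter()
for shard1, shard2 in zip(shard(table1), shard(table2)):
    per_shard.update(full_join(shard1, shard2))

print("same result:", per_shard == global_result)
print("row count:  ", sum(global_result.values()))
```

<p>Because equal keys always land in shards with the same index, a row can only find its match (or fail to find one) inside its own shard pair, which is why the combined per-shard results agree with the global join.</p>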
<p>Let’s look at how Citus handles an outer join query:</p>
<div class="highlight">
<pre class="highlight sql"><code><span class="k">SELECT</span> <span class="n">table1</span><span class="p">.</span><span class="n">a</span><span class="p">,</span> <span class="n">table1</span><span class="p">.</span><span class="n">b</span> <span class="k">AS</span> <span class="n">b1</span><span class="p">,</span> <span class="n">table2</span><span class="p">.</span><span class="n">b</span> <span class="k">AS</span> <span class="n">b2</span><span class="p">,</span> <span class="n">table3</span><span class="p">.</span><span class="n">b</span> <span class="k">AS</span> <span class="n">b3</span><span class="p">,</span> <span class="n">table4</span><span class="p">.</span><span class="n">b</span> <span class="k">AS</span> <span class="n">b4</span>
<span class="k">FROM</span> <span class="n">table1</span>
<span class="k">FULL</span> <span class="k">JOIN</span> <span class="n">table2</span> <span class="k">ON</span> <span class="n">table1</span><span class="p">.</span><span class="n">a</span> <span class="o">=</span> <span class="n">table2</span><span class="p">.</span><span class="n">a</span>
<span class="k">FULL</span> <span class="k">JOIN</span> <span class="n">table3</span> <span class="k">ON</span> <span class="n">table1</span><span class="p">.</span><span class="n">a</span> <span class="o">=</span> <span class="n">table3</span><span class="p">.</span><span class="n">a</span>
<span class="k">FULL</span> <span class="k">JOIN</span> <span class="n">table4</span> <span class="k">ON</span> <span class="n">table1</span><span class="p">.</span><span class="n">a</span> <span class="o">=</span> <span class="n">table4</span><span class="p">.</span><span class="n">a</span><span class="p">;</span>
</code></pre>
</div>
<p>First, the query goes through the standard PostgreSQL planner and Citus uses this plan to generate a distributed plan where various checks about Citus’ support of the query are performed. Then individual queries that will go to workers for distributed table fragments are generated.</p>
<div class="highlight">
<pre class="highlight sql"><code><span class="k">SELECT</span> <span class="n">table1</span><span class="p">.</span><span class="n">a</span><span class="p">,</span> <span class="n">table1</span><span class="p">.</span><span class="n">b</span> <span class="k">AS</span> <span class="n">b1</span><span class="p">,</span> <span class="n">table2</span><span class="p">.</span><span class="n">b</span> <span class="k">AS</span> <span class="n">b2</span><span class="p">,</span> <span class="n">table3</span><span class="p">.</span><span class="n">b</span> <span class="k">AS</span> <span class="n">b3</span><span class="p">,</span> <span class="n">table4</span><span class="p">.</span><span class="n">b</span> <span class="k">AS</span> <span class="n">b4</span>
<span class="k">FROM</span> <span class="p">(((</span><span class="n">table1_102359</span> <span class="n">table1</span>
<span class="k">FULL</span> <span class="k">JOIN</span> <span class="n">table2_102363</span> <span class="n">table2</span> <span class="k">ON</span> <span class="p">((</span><span class="n">table1</span><span class="p">.</span><span class="n">a</span> <span class="o">=</span> <span class="n">table2</span><span class="p">.</span><span class="n">a</span><span class="p">)))</span>
<span class="k">FULL</span> <span class="k">JOIN</span> <span class="n">table3_102367</span> <span class="n">table3</span> <span class="k">ON</span> <span class="p">((</span><span class="n">table1</span><span class="p">.</span><span class="n">a</span> <span class="o">=</span> <span class="n">table3</span><span class="p">.</span><span class="n">a</span><span class="p">)))</span>
<span class="k">FULL</span> <span class="k">JOIN</span> <span class="n">table4_102371</span> <span class="n">table4</span> <span class="k">ON</span> <span class="p">((</span><span class="n">table1</span><span class="p">.</span><span class="n">a</span> <span class="o">=</span> <span class="n">table4</span><span class="p">.</span><span class="n">a</span><span class="p">)))</span> <span class="k">WHERE</span> <span class="k">true</span>
</code></pre>
</div>
<div class="highlight">
<pre class="highlight sql"><code><span class="k">SELECT</span> <span class="n">table1</span><span class="p">.</span><span class="n">a</span><span class="p">,</span> <span class="n">table1</span><span class="p">.</span><span class="n">b</span> <span class="k">AS</span> <span class="n">b1</span><span class="p">,</span> <span class="n">table2</span><span class="p">.</span><span class="n">b</span> <span class="k">AS</span> <span class="n">b2</span><span class="p">,</span> <span class="n">table3</span><span class="p">.</span><span class="n">b</span> <span class="k">AS</span> <span class="n">b3</span><span class="p">,</span> <span class="n">table4</span><span class="p">.</span><span class="n">b</span> <span class="k">AS</span> <span class="n">b4</span>
<span class="k">FROM</span> <span class="p">(((</span><span class="n">table1_102360</span> <span class="n">table1</span>
<span class="k">FULL</span> <span class="k">JOIN</span> <span class="n">table2_102364</span> <span class="n">table2</span> <span class="k">ON</span> <span class="p">((</span><span class="n">table1</span><span class="p">.</span><span class="n">a</span> <span class="o">=</span> <span class="n">table2</span><span class="p">.</span><span class="n">a</span><span class="p">)))</span>
<span class="k">FULL</span> <span class="k">JOIN</span> <span class="n">table3_102368</span> <span class="n">table3</span> <span class="k">ON</span> <span class="p">((</span><span class="n">table1</span><span class="p">.</span><span class="n">a</span> <span class="o">=</span> <span class="n">table3</span><span class="p">.</span><span class="n">a</span><span class="p">)))</span>
<span class="k">FULL</span> <span class="k">JOIN</span> <span class="n">table4_102372</span> <span class="n">table4</span> <span class="k">ON</span> <span class="p">((</span><span class="n">table1</span><span class="p">.</span><span class="n">a</span> <span class="o">=</span> <span class="n">table4</span><span class="p">.</span><span class="n">a</span><span class="p">)))</span> <span class="k">WHERE</span> <span class="k">true</span>
</code></pre>
</div>
<div class="highlight">
<pre class="highlight sql"><code><span class="k">SELECT</span> <span class="n">table1</span><span class="p">.</span><span class="n">a</span><span class="p">,</span> <span class="n">table1</span><span class="p">.</span><span class="n">b</span> <span class="k">AS</span> <span class="n">b1</span><span class="p">,</span> <span class="n">table2</span><span class="p">.</span><span class="n">b</span> <span class="k">AS</span> <span class="n">b2</span><span class="p">,</span> <span class="n">table3</span><span class="p">.</span><span class="n">b</span> <span class="k">AS</span> <span class="n">b3</span><span class="p">,</span> <span class="n">table4</span><span class="p">.</span><span class="n">b</span> <span class="k">AS</span> <span class="n">b4</span>
<span class="k">FROM</span> <span class="p">(((</span><span class="n">table1_102361</span> <span class="n">table1</span>
<span class="k">FULL</span> <span class="k">JOIN</span> <span class="n">table2_102365</span> <span class="n">table2</span> <span class="k">ON</span> <span class="p">((</span><span class="n">table1</span><span class="p">.</span><span class="n">a</span> <span class="o">=</span> <span class="n">table2</span><span class="p">.</span><span class="n">a</span><span class="p">)))</span>
<span class="k">FULL</span> <span class="k">JOIN</span> <span class="n">table3_102369</span> <span class="n">table3</span> <span class="k">ON</span> <span class="p">((</span><span class="n">table1</span><span class="p">.</span><span class="n">a</span> <span class="o">=</span> <span class="n">table3</span><span class="p">.</span><span class="n">a</span><span class="p">)))</span>
<span class="k">FULL</span> <span class="k">JOIN</span> <span class="n">table4_102373</span> <span class="n">table4</span> <span class="k">ON</span> <span class="p">((</span><span class="n">table1</span><span class="p">.</span><span class="n">a</span> <span class="o">=</span> <span class="n">table4</span><span class="p">.</span><span class="n">a</span><span class="p">)))</span> <span class="k">WHERE</span> <span class="k">true</span>
</code></pre>
</div>
<div class="highlight">
<pre class="highlight sql"><code><span class="k">SELECT</span> <span class="n">table1</span><span class="p">.</span><span class="n">a</span><span class="p">,</span> <span class="n">table1</span><span class="p">.</span><span class="n">b</span> <span class="k">AS</span> <span class="n">b1</span><span class="p">,</span> <span class="n">table2</span><span class="p">.</span><span class="n">b</span> <span class="k">AS</span> <span class="n">b2</span><span class="p">,</span> <span class="n">table3</span><span class="p">.</span><span class="n">b</span> <span class="k">AS</span> <span class="n">b3</span><span class="p">,</span> <span class="n">table4</span><span class="p">.</span><span class="n">b</span> <span class="k">AS</span> <span class="n">b4</span>
<span class="k">FROM</span> <span class="p">(((</span><span class="n">table1_102362</span> <span class="n">table1</span>
<span class="k">FULL</span> <span class="k">JOIN</span> <span class="n">table2_102366</span> <span class="n">table2</span> <span class="k">ON</span> <span class="p">((</span><span class="n">table1</span><span class="p">.</span><span class="n">a</span> <span class="o">=</span> <span class="n">table2</span><span class="p">.</span><span class="n">a</span><span class="p">)))</span>
<span class="k">FULL</span> <span class="k">JOIN</span> <span class="n">table3_102370</span> <span class="n">table3</span> <span class="k">ON</span> <span class="p">((</span><span class="n">table1</span><span class="p">.</span><span class="n">a</span> <span class="o">=</span> <span class="n">table3</span><span class="p">.</span><span class="n">a</span><span class="p">)))</span>
<span class="k">FULL</span> <span class="k">JOIN</span> <span class="n">table4_102374</span> <span class="n">table4</span> <span class="k">ON</span> <span class="p">((</span><span class="n">table1</span><span class="p">.</span><span class="n">a</span> <span class="o">=</span> <span class="n">table4</span><span class="p">.</span><span class="n">a</span><span class="p">)))</span> <span class="k">WHERE</span> <span class="k">true</span>
</code></pre>
</div>
<p>The resulting queries may seem complex at first, but you can see that they are actually the same as the original query, with only the table names being a bit different. This is because Citus stores the data in standard Postgres tables called shards, named <code>&lt;table_name&gt;_&lt;shard_id&gt;</code>. With 1-1 matching of shards, the distributed outer join is equivalent to the union of all outer joins of individual matching shards. In many cases you don’t even have to think about this, as Citus simply takes care of it for you. If you’re sharding on some shared id, as is common in certain <a href="https://www.citusdata.com/blog/2016/08/10/sharding-for-a-multi-tenant-app-with-postgres/">use cases</a>, then Citus will do the join on the appropriate node without any inter-worker communication.</p>
<p>We hope you found the insight into how we perform distributed outer joins valuable. If you’re curious about trying Citus or learning more about how it works, we encourage you to join the conversation with us on <a href="https://slack.citusdata.com">Slack</a>.</p>
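<p>The shard naming scheme can be imitated with a few lines of Python. This is a string-rewriting toy and not how Citus generates worker queries (Citus works on the parsed query, not on text): the helper <code>shard_query</code> is hypothetical, the query is shortened to two tables, and the only facts taken from the output above are the first shard IDs (102359 for table1, 102363 for table2) and that the IDs of each table’s four shards are consecutive.</p>

```python
import re

QUERY = (
    "SELECT table1.a, table1.b AS b1, table2.b AS b2 "
    "FROM table1 FULL JOIN table2 ON table1.a = table2.a"
)

# First shard ID of each table, copied from the worker queries shown above.
FIRST_SHARD_ID = {"table1": 102359, "table2": 102363}
SHARD_COUNT = 4


def shard_query(query, shard_ids):
    """Point every FROM/JOIN table at its shard, keeping the name as alias."""
    def rename(match):
        keyword, table = match.group(1), match.group(2)
        return "%s %s_%d %s" % (keyword, table, shard_ids[table], table)

    # Only table references after FROM or JOIN are rewritten; column
    # qualifiers such as table1.a keep working through the alias.
    return re.sub(r"\b(FROM|JOIN)\s+(\w+)", rename, query)


# One query per pair of matching shards.
queries = [
    shard_query(QUERY, {t: first + i for t, first in FIRST_SHARD_ID.items()})
    for i in range(SHARD_COUNT)
]

for q in queries:
    print(q)
```

<p>Aliasing each shard back to the original table name is what lets the rest of the query text stay untouched, just as in the worker queries above.</p>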
<p><em>This article was originally published on <a href='https://www.citusdata.com/blog/2016/10/10/outer-joins-in-citus/'>citusdata.com</a>.</em></p>
Scaling Out MySQL with PostgreSQL and Citus
https://www.citusdata.com/blog/2016/06/10/scaling-mysql-with-citus/
2016-06-10T00:00:00+00:00
2016-06-10T00:00:00+00:00
Eren Basak
<p>PostgreSQL is known for its <a href="http://www.postgresqlconference.org/sites/default/files/extensibility2.pdf">great extensibility</a> and powerful plugins. One particular category of extensions is <a href="https://wiki.postgresql.org/wiki/Foreign_data_wrappers">Foreign Data Wrappers</a> or FDWs. FDWs allow us to interact from within Postgres directly with other data stores such as <a href="https://github.com/EnterpriseDB/hdfs_fdw">hdfs</a>, <a href="https://github.com/citusdata/cstore_fdw">columnar stores</a>, <a href="https://github.com/EnterpriseDB/mysql_fdw">mysql</a>, etc. Combined with Citus’ scalability features, we can even leverage them to help us scale out those data stores where it might otherwise be quite difficult.</p>
<p>Imagine that you have a very large and growing table in MySQL on which queries are taking longer and longer. Fortunately, Citus can solve your problems with the help of mysql_fdw. Before we get to the meat of it, we would like to thank Eugen Konkov for his <a href="http://stackoverflow.com/questions/36882349/does-citus-support-creating-shards-using-mysql-fdw">interesting question on StackOverflow</a> and for inspiring us. <em>Note that this tutorial is experimental work. Feel free to try this at home, but use caution before advancing it to production.</em></p>
<p>In this blog post we will see how PostgreSQL and Citus can help us to scale out existing MySQL data. For this we will do the following:</p>
<ol>
<li>Create a MySQL table and fill it with some data.</li>
<li>Partition the MySQL table into smaller MySQL tables.</li>
<li>Create a distributed table in PostgreSQL with Citus.</li>
<li>Connect the distributed table shards to the corresponding MySQL tables.</li>
<li>Run a query on the master Citus node and see that it correctly fetches the data from MySQL tables in parallel.</li>
</ol>
<p>The architecture that we will work on will look like this:</p>
<p><img src="/assets/images/blog/mysql_fdw_arch.png" alt="architecture"></p>
<p>First, we assume that you have MySQL, PostgreSQL, Citus and mysql_fdw installed. For this demo, we will use the LINEITEM table from the standard TPC-H benchmark. Let’s create it in MySQL and fill it with some data:</p>
<div class="highlight">
<pre class="highlight sql"><code><span class="n">mysql</span><span class="o">></span> <span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">LINEITEM</span> <span class="p">(</span> <span class="n">L_ORDERKEY</span> <span class="nb">INTEGER</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">L_PARTKEY</span> <span class="nb">INTEGER</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">L_SUPPKEY</span> <span class="nb">INTEGER</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">L_LINENUMBER</span> <span class="nb">INTEGER</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">L_QUANTITY</span> <span class="nb">DOUBLE</span> <span class="nb">PRECISION</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">L_EXTENDEDPRICE</span> <span class="nb">DOUBLE</span> <span class="nb">PRECISION</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">L_DISCOUNT</span> <span class="nb">DOUBLE</span> <span class="nb">PRECISION</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">L_TAX</span> <span class="nb">DOUBLE</span> <span class="nb">PRECISION</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">L_RETURNFLAG</span> <span class="nb">CHAR</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">L_LINESTATUS</span> <span class="nb">CHAR</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">L_SHIPDATE</span> <span class="nb">DATE</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">L_COMMITDATE</span> <span class="nb">DATE</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">L_RECEIPTDATE</span> <span class="nb">DATE</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">L_SHIPINSTRUCT</span> <span class="nb">CHAR</span><span class="p">(</span><span class="mi">25</span><span class="p">)</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">L_SHIPMODE</span> <span class="nb">CHAR</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">L_COMMENT</span> <span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">44</span><span class="p">)</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">);</span>
<span class="n">mysql</span><span class="o">></span> <span class="k">LOAD</span> <span class="k">DATA</span> <span class="k">LOCAL</span> <span class="n">INFILE</span> <span class="s1">'tpch_2_13_0/lineitem.tbl'</span> <span class="k">INTO</span> <span class="k">TABLE</span> <span class="n">LINEITEM</span> <span class="n">FIELDS</span> <span class="n">TERMINATED</span> <span class="k">BY</span> <span class="s1">'|'</span><span class="p">;</span>
</code></pre>
</div>
<p>Note: To create lineitem.tbl, download the TPC-H bundle and use the dbgen tool. For this demo, we have generated the data with scale factor 10:</p>
<div class="highlight">
<pre class="highlight bash"><code>./dbgen <span class="nt">-f</span> <span class="nt">-s</span> 10 <span class="nt">-T</span> L
</code></pre>
</div>
<p>Now let’s get some information about the data we have:</p>
<div class="highlight">
<pre class="highlight sql"><code><span class="n">mysql</span><span class="o">></span> <span class="k">SELECT</span> <span class="k">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="k">FROM</span> <span class="n">LINEITEM</span><span class="p">;</span>
<span class="o">+</span><span class="c1">----------+</span>
<span class="o">|</span> <span class="k">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="o">|</span>
<span class="o">+</span><span class="c1">----------+</span>
<span class="o">|</span> <span class="mi">59986052</span> <span class="o">|</span>
<span class="o">+</span><span class="c1">----------+</span>
</code></pre>
</div>
<p>We now have about 60 million tuples, and partitioning this data by shipdate ranges sounds like a good fit. Let’s see which dates we should pick as range limits with the following simple cumulative counting query:</p>
<div class="highlight">
<pre class="highlight sql"><code><span class="k">SET</span> <span class="o">@</span><span class="n">running_total</span><span class="p">:</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span>
<span class="k">SELECT</span>
<span class="n">totals</span><span class="p">.</span><span class="n">shipdate</span> <span class="k">as</span> <span class="n">shipdate</span><span class="p">,</span>
<span class="p">(</span><span class="o">@</span><span class="n">running_total</span> <span class="p">:</span><span class="o">=</span> <span class="o">@</span><span class="n">running_total</span> <span class="o">+</span> <span class="n">totals</span><span class="p">.</span><span class="n">shipcount</span><span class="p">)</span> <span class="k">as</span> <span class="n">cumulative_count</span>
<span class="k">FROM</span>
<span class="p">(</span><span class="k">SELECT</span>
<span class="n">l_shipdate</span> <span class="k">AS</span> <span class="n">shipdate</span><span class="p">,</span>
<span class="k">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="k">AS</span> <span class="n">shipcount</span>
<span class="k">FROM</span> <span class="n">LINEITEM</span>
<span class="k">GROUP</span> <span class="k">BY</span>
<span class="n">shipdate</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">shipdate</span><span class="p">)</span> <span class="k">AS</span> <span class="n">totals</span><span class="p">;</span>
</code></pre>
</div>
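<p>To make the choice of range limits concrete, the cumulative counts can be cut at equal fractions of the total row count. The Python sketch below shows one way to do that; it is not necessarily how the dates in this post were chosen, and the function name pick_range_limits and the sample counts are hypothetical.</p>

```python
def pick_range_limits(daily_counts, partitions):
    """Return the first shipdate of every partition after the first one.

    daily_counts: list of (shipdate, row_count) pairs sorted by shipdate,
    i.e. the per-date counts behind the cumulative query.
    """
    total = sum(count for _, count in daily_counts)
    limits = []
    rows_before = 0  # rows with a shipdate strictly earlier than the current one
    next_k = 1       # index of the next k/partitions target to reach
    for shipdate, count in daily_counts:
        reached = False
        # Integer form of: rows_before >= total * next_k / partitions
        while next_k < partitions and rows_before * partitions >= total * next_k:
            next_k += 1
            reached = True
        if reached:
            limits.append(shipdate)
        rows_before += count
    return limits


# Made-up counts: six days with ten rows each, split into three partitions.
sample = [
    ("1992-01-02", 10), ("1992-01-03", 10), ("1992-01-04", 10),
    ("1992-01-05", 10), ("1992-01-06", 10), ("1992-01-07", 10),
]
print(pick_range_limits(sample, 3))
```

<p>Each returned date is the lower bound of a partition, so rows with an earlier shipdate fall into the previous one.</p>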
<p>From the query, we can pick 1993-04-07, 1994-05-13, 1995-06-18, 1996-07-23, 1997-08-28. Let’s create our partitions:</p>
<div class="highlight">
<pre class="highlight sql"><code><span class="err">$</span> <span class="n">mysql</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">LINEITEM_1</span> <span class="k">AS</span> <span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">LINEITEM</span> <span class="k">WHERE</span> <span class="n">l_shipdate</span> <span class="o"><</span> <span class="s1">'1993-04-07'</span><span class="p">;</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">LINEITEM_2</span> <span class="k">AS</span> <span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">LINEITEM</span> <span class="k">WHERE</span> <span class="n">l_shipdate</span> <span class="o">>=</span> <span class="s1">'1993-04-07'</span> <span class="k">AND</span> <span class="n">l_shipdate</span> <span class="o"><</span> <span class="s1">'1994-05-13'</span><span class="p">;</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">LINEITEM_3</span> <span class="k">AS</span> <span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">LINEITEM</span> <span class="k">WHERE</span> <span class="n">l_shipdate</span> <span class="o">>=</span> <span class="s1">'1994-05-13'</span> <span class="k">AND</span> <span class="n">l_shipdate</span> <span class="o"><</span> <span class="s1">'1995-06-18'</span><span class="p">;</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">LINEITEM_4</span> <span class="k">AS</span> <span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">LINEITEM</span> <span class="k">WHERE</span> <span class="n">l_shipdate</span> <span class="o">>=</span> <span class="s1">'1995-06-18'</span> <span class="k">AND</span> <span class="n">l_shipdate</span> <span class="o"><</span> <span class="s1">'1996-07-23'</span><span class="p">;</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">LINEITEM_5</span> <span class="k">AS</span> <span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">LINEITEM</span> <span class="k">WHERE</span> <span class="n">l_shipdate</span> <span class="o">>=</span> <span class="s1">'1996-07-23'</span> <span class="k">AND</span> <span class="n">l_shipdate</span> <span class="o"><</span> <span class="s1">'1997-08-28'</span><span class="p">;</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">LINEITEM_6</span> <span class="k">AS</span> <span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">LINEITEM</span> <span class="k">WHERE</span> <span class="n">l_shipdate</span> <span class="o">>=</span> <span class="s1">'1997-08-28'</span><span class="p">;</span>
</code></pre>
</div>
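<p>The partition statements above follow a regular pattern, so they could also be generated from the list of range limits. This is a minimal Python sketch under that assumption; the helper name partition_statements is hypothetical, and the output should be reviewed before running it against MySQL.</p>

```python
def partition_statements(table, column, limits):
    """Build one CREATE TABLE ... AS SELECT statement per range.

    limits: sorted ISO date strings; each one is the inclusive lower
    bound of a partition and the exclusive upper bound of the previous one.
    """
    if not limits:
        raise ValueError("at least one range limit is required")
    bounds = [None] + list(limits) + [None]
    statements = []
    for i in range(len(bounds) - 1):
        lower, upper = bounds[i], bounds[i + 1]
        conditions = []
        if lower is not None:
            conditions.append(f"{column} >= '{lower}'")
        if upper is not None:
            conditions.append(f"{column} < '{upper}'")
        statements.append(
            f"CREATE TABLE {table}_{i + 1} AS SELECT * FROM {table} "
            f"WHERE {' AND '.join(conditions)};"
        )
    return statements


limits = ["1993-04-07", "1994-05-13", "1995-06-18", "1996-07-23", "1997-08-28"]
for statement in partition_statements("LINEITEM", "l_shipdate", limits):
    print(statement)
```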
<p>We have partitioned our data in MySQL. For this demonstration, we are using a single MySQL server, but we can place the partitions on other MySQL servers as well. Now, let’s head to Citus and create our distributed table.</p>
<p>First, let’s tell Postgres which foreign server we will use with mysql_fdw, and how the current role will connect to this foreign server:</p>
<div class="highlight">
<pre class="highlight sql"><code><span class="err">$</span> <span class="n">psql</span>
<span class="k">CREATE</span> <span class="n">SERVER</span> <span class="n">mysql_svr</span>
<span class="k">FOREIGN</span> <span class="k">DATA</span> <span class="n">WRAPPER</span> <span class="n">mysql_fdw</span>
<span class="k">OPTIONS</span> <span class="p">(</span><span class="k">host</span> <span class="s1">'127.0.0.1'</span><span class="p">,</span> <span class="n">port</span> <span class="s1">'3306'</span><span class="p">);</span>
<span class="k">CREATE</span> <span class="k">USER</span> <span class="n">MAPPING</span> <span class="k">FOR</span> <span class="n">eren</span>
<span class="n">SERVER</span> <span class="n">mysql_svr</span>
<span class="k">OPTIONS</span> <span class="p">(</span><span class="n">username</span> <span class="s1">'root'</span><span class="p">,</span> <span class="n">password</span> <span class="s1">'toor'</span><span class="p">);</span>
</code></pre>
</div>
<p>Then create our distributed foreign table for LINEITEM:</p>
<div class="highlight">
<pre class="highlight sql"><code><span class="k">CREATE</span> <span class="k">FOREIGN</span> <span class="k">TABLE</span> <span class="n">LINEITEM</span> <span class="p">(</span> <span class="n">L_ORDERKEY</span> <span class="nb">INTEGER</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">L_PARTKEY</span> <span class="nb">INTEGER</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">L_SUPPKEY</span> <span class="nb">INTEGER</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">L_LINENUMBER</span> <span class="nb">INTEGER</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">L_QUANTITY</span> <span class="nb">DOUBLE</span> <span class="nb">PRECISION</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">L_EXTENDEDPRICE</span> <span class="nb">DOUBLE</span> <span class="nb">PRECISION</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">L_DISCOUNT</span> <span class="nb">DOUBLE</span> <span class="nb">PRECISION</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">L_TAX</span> <span class="nb">DOUBLE</span> <span class="nb">PRECISION</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">L_RETURNFLAG</span> <span class="nb">CHAR</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">L_LINESTATUS</span> <span class="nb">CHAR</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">L_SHIPDATE</span> <span class="nb">DATE</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">L_COMMITDATE</span> <span class="nb">DATE</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">L_RECEIPTDATE</span> <span class="nb">DATE</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">L_SHIPINSTRUCT</span> <span class="nb">CHAR</span><span class="p">(</span><span class="mi">25</span><span class="p">)</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">L_SHIPMODE</span> <span class="nb">CHAR</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">,</span>
<span class="n">L_COMMENT</span> <span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">44</span><span class="p">)</span> <span class="k">NOT</span> <span class="k">NULL</span><span class="p">)</span>
<span class="n">SERVER</span> <span class="n">mysql_svr</span>
<span class="k">OPTIONS</span> <span class="p">(</span><span class="n">dbname</span> <span class="s1">'test'</span><span class="p">,</span> <span class="k">table_name</span> <span class="s1">'LINEITEM'</span><span class="p">);</span>
<span class="k">SELECT</span> <span class="n">master_create_distributed_table</span><span class="p">(</span><span class="s1">'LINEITEM'</span><span class="p">,</span> <span class="s1">'l_shipdate'</span><span class="p">,</span> <span class="s1">'range'</span><span class="p">);</span>
</code></pre>
</div>
<p>Now, let’s create 6 shards that will correspond to the 6 MySQL tables:</p>
<div class="highlight">
<pre class="highlight sql"><code><span class="k">SET</span> <span class="n">citus</span><span class="p">.</span><span class="n">shard_replication_factor</span> <span class="k">TO</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">CREATE</span> <span class="k">OR</span> <span class="k">REPLACE</span> <span class="k">FUNCTION</span> <span class="n">master_create_range_shard</span><span class="p">(</span>
<span class="k">table_name</span> <span class="nb">text</span><span class="p">,</span>
<span class="k">minvalue</span> <span class="nb">text</span><span class="p">,</span>
<span class="k">maxvalue</span> <span class="nb">text</span><span class="p">)</span>
<span class="k">RETURNS</span> <span class="n">void</span>
<span class="k">LANGUAGE</span> <span class="n">plpgsql</span>
<span class="k">AS</span> <span class="err">$</span><span class="k">function</span><span class="err">$</span>
<span class="k">DECLARE</span>
<span class="n">new_shard_id</span> <span class="nb">bigint</span><span class="p">;</span>
<span class="k">BEGIN</span>
<span class="k">SELECT</span> <span class="n">master_create_empty_shard</span><span class="p">(</span><span class="k">table_name</span><span class="p">)</span>
<span class="k">INTO</span> <span class="n">new_shard_id</span><span class="p">;</span>
<span class="k">UPDATE</span> <span class="n">pg_dist_shard</span>
<span class="k">SET</span> <span class="n">shardminvalue</span> <span class="o">=</span> <span class="k">minvalue</span><span class="p">,</span> <span class="n">shardmaxvalue</span> <span class="o">=</span> <span class="k">maxvalue</span>
<span class="k">WHERE</span> <span class="n">shardid</span> <span class="o">=</span> <span class="n">new_shard_id</span><span class="p">;</span>
<span class="k">END</span><span class="p">;</span>
<span class="err">$</span><span class="k">function</span><span class="err">$</span><span class="p">;</span>
<span class="k">SELECT</span> <span class="n">master_create_range_shard</span><span class="p">(</span><span class="s1">'LINEITEM'</span><span class="p">,</span> <span class="s1">'1992-01-02'</span><span class="p">,</span> <span class="s1">'1993-04-06'</span><span class="p">);</span>
<span class="k">SELECT</span> <span class="n">master_create_range_shard</span><span class="p">(</span><span class="s1">'LINEITEM'</span><span class="p">,</span> <span class="s1">'1993-04-07'</span><span class="p">,</span> <span class="s1">'1994-05-12'</span><span class="p">);</span>
<span class="k">SELECT</span> <span class="n">master_create_range_shard</span><span class="p">(</span><span class="s1">'LINEITEM'</span><span class="p">,</span> <span class="s1">'1994-05-13'</span><span class="p">,</span> <span class="s1">'1995-06-17'</span><span class="p">);</span>
<span class="k">SELECT</span> <span class="n">master_create_range_shard</span><span class="p">(</span><span class="s1">'LINEITEM'</span><span class="p">,</span> <span class="s1">'1995-06-18'</span><span class="p">,</span> <span class="s1">'1996-07-22'</span><span class="p">);</span>
<span class="k">SELECT</span> <span class="n">master_create_range_shard</span><span class="p">(</span><span class="s1">'LINEITEM'</span><span class="p">,</span> <span class="s1">'1996-07-23'</span><span class="p">,</span> <span class="s1">'1997-08-27'</span><span class="p">);</span>
<span class="k">SELECT</span> <span class="n">master_create_range_shard</span><span class="p">(</span><span class="s1">'LINEITEM'</span><span class="p">,</span> <span class="s1">'1997-08-28'</span><span class="p">,</span> <span class="s1">'1998-12-01'</span><span class="p">);</span>
</code></pre>
</div>
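<p>Note that the shard ranges above use inclusive minimum and maximum values, while the MySQL partitions were cut with exclusive upper limits, so each shard's maximum is the day before the next range limit. The Python sketch below derives the ranges that way; the helper name shard_ranges is hypothetical, and the first and last dates are the ones used in the calls above.</p>

```python
from datetime import date, timedelta


def shard_ranges(first_day, limits, last_day):
    """Turn range limits into inclusive (min, max) pairs, one per shard."""
    starts = [first_day] + list(limits)
    # Each range ends one day before the next range starts.
    ends = [limit - timedelta(days=1) for limit in limits] + [last_day]
    return list(zip(starts, ends))


limits = [
    date(1993, 4, 7), date(1994, 5, 13), date(1995, 6, 18),
    date(1996, 7, 23), date(1997, 8, 28),
]
for low, high in shard_ranges(date(1992, 1, 2), limits, date(1998, 12, 1)):
    print(f"SELECT master_create_range_shard('LINEITEM', '{low}', '{high}');")
```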
<p>Let’s see our shards:</p>
<div class="highlight">
<pre class="highlight sql"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">pg_dist_shard</span> <span class="k">WHERE</span> <span class="n">logicalrelid</span><span class="o">=</span><span class="s1">'LINEITEM'</span><span class="p">::</span><span class="n">regclass</span><span class="p">;</span>
<span class="n">logicalrelid</span> <span class="o">|</span> <span class="n">shardid</span> <span class="o">|</span> <span class="n">shardstorage</span> <span class="o">|</span> <span class="n">shardalias</span> <span class="o">|</span> <span class="n">shardminvalue</span> <span class="o">|</span> <span class="n">shardmaxvalue</span>
<span class="c1">--------------+---------+--------------+------------+---------------+---------------</span>
<span class="mi">16468</span> <span class="o">|</span> <span class="mi">102008</span> <span class="o">|</span> <span class="n">f</span> <span class="o">|</span> <span class="o">|</span> <span class="mi">1992</span><span class="o">-</span><span class="mi">01</span><span class="o">-</span><span class="mi">02</span> <span class="o">|</span> <span class="mi">1993</span><span class="o">-</span><span class="mi">04</span><span class="o">-</span><span class="mi">06</span>
<span class="mi">16468</span> <span class="o">|</span> <span class="mi">102009</span> <span class="o">|</span> <span class="n">f</span> <span class="o">|</span> <span class="o">|</span> <span class="mi">1993</span><span class="o">-</span><span class="mi">04</span><span class="o">-</span><span class="mi">07</span> <span class="o">|</span> <span class="mi">1994</span><span class="o">-</span><span class="mi">05</span><span class="o">-</span><span class="mi">12</span>
<span class="mi">16468</span> <span class="o">|</span> <span class="mi">102010</span> <span class="o">|</span> <span class="n">f</span> <span class="o">|</span> <span class="o">|</span> <span class="mi">1994</span><span class="o">-</span><span class="mi">05</span><span class="o">-</span><span class="mi">13</span> <span class="o">|</span> <span class="mi">1995</span><span class="o">-</span><span class="mi">06</span><span class="o">-</span><span class="mi">17</span>
<span class="mi">16468</span> <span class="o">|</span> <span class="mi">102011</span> <span class="o">|</span> <span class="n">f</span> <span class="o">|</span> <span class="o">|</span> <span class="mi">1995</span><span class="o">-</span><span class="mi">06</span><span class="o">-</span><span class="mi">18</span> <span class="o">|</span> <span class="mi">1996</span><span class="o">-</span><span class="mi">07</span><span class="o">-</span><span class="mi">22</span>
<span class="mi">16468</span> <span class="o">|</span> <span class="mi">102012</span> <span class="o">|</span> <span class="n">f</span> <span class="o">|</span> <span class="o">|</span> <span class="mi">1996</span><span class="o">-</span><span class="mi">07</span><span class="o">-</span><span class="mi">23</span> <span class="o">|</span> <span class="mi">1997</span><span class="o">-</span><span class="mi">08</span><span class="o">-</span><span class="mi">27</span>
<span class="mi">16468</span> <span class="o">|</span> <span class="mi">102013</span> <span class="o">|</span> <span class="n">f</span> <span class="o">|</span> <span class="o">|</span> <span class="mi">1997</span><span class="o">-</span><span class="mi">08</span><span class="o">-</span><span class="mi">28</span> <span class="o">|</span> <span class="mi">1998</span><span class="o">-</span><span class="mi">12</span><span class="o">-</span><span class="mi">01</span>
<span class="p">(</span><span class="mi">6</span> <span class="k">rows</span><span class="p">)</span>
</code></pre>
</div>
<p>We have finished configuring the master. Next, we need to connect to the workers and set which shard is associated with which MySQL partition table. Let’s run the following commands on the workers:</p>
<div class="highlight">
<pre class="highlight sql"><code><span class="k">ALTER</span> <span class="k">TABLE</span> <span class="n">lineitem_102008</span> <span class="k">OPTIONS</span> <span class="p">(</span><span class="k">SET</span> <span class="k">table_name</span> <span class="s1">'LINEITEM_1'</span><span class="p">);</span>
<span class="k">ALTER</span> <span class="k">TABLE</span> <span class="n">lineitem_102009</span> <span class="k">OPTIONS</span> <span class="p">(</span><span class="k">SET</span> <span class="k">table_name</span> <span class="s1">'LINEITEM_2'</span><span class="p">);</span>
<span class="k">ALTER</span> <span class="k">TABLE</span> <span class="n">lineitem_102010</span> <span class="k">OPTIONS</span> <span class="p">(</span><span class="k">SET</span> <span class="k">table_name</span> <span class="s1">'LINEITEM_3'</span><span class="p">);</span>
<span class="k">ALTER</span> <span class="k">TABLE</span> <span class="n">lineitem_102011</span> <span class="k">OPTIONS</span> <span class="p">(</span><span class="k">SET</span> <span class="k">table_name</span> <span class="s1">'LINEITEM_4'</span><span class="p">);</span>
<span class="k">ALTER</span> <span class="k">TABLE</span> <span class="n">lineitem_102012</span> <span class="k">OPTIONS</span> <span class="p">(</span><span class="k">SET</span> <span class="k">table_name</span> <span class="s1">'LINEITEM_5'</span><span class="p">);</span>
<span class="k">ALTER</span> <span class="k">TABLE</span> <span class="n">lineitem_102013</span> <span class="k">OPTIONS</span> <span class="p">(</span><span class="k">SET</span> <span class="k">table_name</span> <span class="s1">'LINEITEM_6'</span><span class="p">);</span>
</code></pre>
</div>
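<p>To double-check that each shard now points at the intended MySQL table, we could inspect the foreign table options on a worker. This is only a minimal sketch: information_schema.foreign_table_options is a standard PostgreSQL view, while the lineitem_% pattern is our assumption based on the shard names used above.</p>
<div class="highlight">
<pre class="highlight sql"><code>-- List the options attached to every shard foreign table on this worker.
-- Assumption: shard foreign tables are named lineitem_&lt;shardid&gt;, as above.
SELECT foreign_table_name, option_name, option_value
FROM information_schema.foreign_table_options
WHERE foreign_table_name LIKE 'lineitem_%'
ORDER BY foreign_table_name, option_name;
</code></pre>
</div>
<p>Each shard should show a table_name option with the matching LINEITEM_n value.</p>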
<p>We also need to define user mappings for the shards’ foreign servers. Let’s run the user mapping creation commands on the workers:</p>
<div class="highlight">
<pre class="highlight sql"><code><span class="k">CREATE</span> <span class="k">USER</span> <span class="n">MAPPING</span> <span class="k">FOR</span> <span class="n">eren</span> <span class="n">SERVER</span> <span class="n">mysql_svr_102008</span> <span class="k">OPTIONS</span> <span class="p">(</span><span class="n">username</span> <span class="s1">'root'</span><span class="p">,</span> <span class="n">password</span> <span class="s1">'toor'</span><span class="p">);</span>
<span class="k">CREATE</span> <span class="k">USER</span> <span class="n">MAPPING</span> <span class="k">FOR</span> <span class="n">eren</span> <span class="n">SERVER</span> <span class="n">mysql_svr_102009</span> <span class="k">OPTIONS</span> <span class="p">(</span><span class="n">username</span> <span class="s1">'root'</span><span class="p">,</span> <span class="n">password</span> <span class="s1">'toor'</span><span class="p">);</span>
<span class="k">CREATE</span> <span class="k">USER</span> <span class="n">MAPPING</span> <span class="k">FOR</span> <span class="n">eren</span> <span class="n">SERVER</span> <span class="n">mysql_svr_102010</span> <span class="k">OPTIONS</span> <span class="p">(</span><span class="n">username</span> <span class="s1">'root'</span><span class="p">,</span> <span class="n">password</span> <span class="s1">'toor'</span><span class="p">);</span>
<span class="k">CREATE</span> <span class="k">USER</span> <span class="n">MAPPING</span> <span class="k">FOR</span> <span class="n">eren</span> <span class="n">SERVER</span> <span class="n">mysql_svr_102011</span> <span class="k">OPTIONS</span> <span class="p">(</span><span class="n">username</span> <span class="s1">'root'</span><span class="p">,</span> <span class="n">password</span> <span class="s1">'toor'</span><span class="p">);</span>
<span class="k">CREATE</span> <span class="k">USER</span> <span class="n">MAPPING</span> <span class="k">FOR</span> <span class="n">eren</span> <span class="n">SERVER</span> <span class="n">mysql_svr_102012</span> <span class="k">OPTIONS</span> <span class="p">(</span><span class="n">username</span> <span class="s1">'root'</span><span class="p">,</span> <span class="n">password</span> <span class="s1">'toor'</span><span class="p">);</span>
<span class="k">CREATE</span> <span class="k">USER</span> <span class="n">MAPPING</span> <span class="k">FOR</span> <span class="n">eren</span> <span class="n">SERVER</span> <span class="n">mysql_svr_102013</span> <span class="k">OPTIONS</span> <span class="p">(</span><span class="n">username</span> <span class="s1">'root'</span><span class="p">,</span> <span class="n">password</span> <span class="s1">'toor'</span><span class="p">);</span>
</code></pre>
</div>
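<p>Since every mapping uses the same options, the statements above could also be generated instead of typed one by one. The following is a sketch rather than a required step, and it is an alternative to the commands above, not something to run in addition to them. It assumes that every foreign server on the worker whose name starts with mysql_svr_ belongs to a shard and still needs a mapping for the user eren:</p>
<div class="highlight">
<pre class="highlight sql"><code>DO $$
DECLARE
    srv name;
BEGIN
    -- Assumption: all shard foreign servers on this worker are named mysql_svr_&lt;shardid&gt;.
    FOR srv IN
        SELECT srvname FROM pg_foreign_server WHERE srvname LIKE 'mysql_svr_%'
    LOOP
        -- %I quotes the server name as an identifier, %L quotes the option values as literals.
        EXECUTE format(
            'CREATE USER MAPPING FOR eren SERVER %I OPTIONS (username %L, password %L)',
            srv, 'root', 'toor');
    END LOOP;
END
$$;
</code></pre>
</div>
<p>Because the loop reads the server names from pg_foreign_server, it only touches the shards that actually live on the worker where it runs.</p>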
<p>Now we’re ready to see things in action. Let’s reconnect to the master and run our distributed query:</p>
<div class="highlight">
<pre class="highlight sql"><code><span class="n">mysql</span><span class="o">></span> <span class="k">SELECT</span> <span class="n">L_SHIPMODE</span><span class="p">,</span> <span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="k">FROM</span> <span class="n">LINEITEM</span> <span class="k">WHERE</span> <span class="n">L_PARTKEY</span> <span class="o">></span> <span class="mi">100</span> <span class="k">AND</span> <span class="n">L_PARTKEY</span> <span class="o"><</span> <span class="mi">150</span> <span class="k">GROUP</span> <span class="k">BY</span> <span class="n">L_SHIPMODE</span><span class="p">;</span>
<span class="o">+</span><span class="c1">------------+----------+</span>
<span class="o">|</span> <span class="n">L_SHIPMODE</span> <span class="o">|</span> <span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="o">|</span>
<span class="o">+</span><span class="c1">------------+----------+</span>
<span class="o">|</span> <span class="n">AIR</span> <span class="o">|</span> <span class="mi">197</span> <span class="o">|</span>
<span class="o">|</span> <span class="n">FOB</span> <span class="o">|</span> <span class="mi">203</span> <span class="o">|</span>
<span class="o">|</span> <span class="n">MAIL</span> <span class="o">|</span> <span class="mi">213</span> <span class="o">|</span>
<span class="o">|</span> <span class="n">RAIL</span> <span class="o">|</span> <span class="mi">228</span> <span class="o">|</span>
<span class="o">|</span> <span class="n">REG</span> <span class="n">AIR</span> <span class="o">|</span> <span class="mi">222</span> <span class="o">|</span>
<span class="o">|</span> <span class="n">SHIP</span> <span class="o">|</span> <span class="mi">197</span> <span class="o">|</span>
<span class="o">|</span> <span class="n">TRUCK</span> <span class="o">|</span> <span class="mi">215</span> <span class="o">|</span>
<span class="o">+</span><span class="c1">------------+----------+</span>
<span class="mi">7</span> <span class="k">rows</span> <span class="k">in</span> <span class="k">set</span> <span class="p">(</span><span class="mi">1</span> <span class="k">min</span> <span class="mi">27</span><span class="p">.</span><span class="mi">44</span> <span class="n">sec</span><span class="p">)</span>
<span class="n">postgres</span><span class="o">=#</span> <span class="k">SELECT</span> <span class="n">L_SHIPMODE</span><span class="p">,</span> <span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="k">FROM</span> <span class="n">LINEITEM</span> <span class="k">WHERE</span> <span class="n">L_PARTKEY</span> <span class="o">></span> <span class="mi">100</span> <span class="k">AND</span> <span class="n">L_PARTKEY</span> <span class="o"><</span> <span class="mi">150</span> <span class="k">GROUP</span> <span class="k">BY</span> <span class="n">L_SHIPMODE</span><span class="p">;</span>
<span class="n">l_shipmode</span> <span class="o">|</span> <span class="k">count</span>
<span class="c1">------------+-------</span>
<span class="n">REG</span> <span class="n">AIR</span> <span class="o">|</span> <span class="mi">222</span>
<span class="n">MAIL</span> <span class="o">|</span> <span class="mi">213</span>
<span class="n">TRUCK</span> <span class="o">|</span> <span class="mi">215</span>
<span class="n">SHIP</span> <span class="o">|</span> <span class="mi">197</span>
<span class="n">AIR</span> <span class="o">|</span> <span class="mi">197</span>
<span class="n">FOB</span> <span class="o">|</span> <span class="mi">203</span>
<span class="n">RAIL</span> <span class="o">|</span> <span class="mi">228</span>
<span class="p">(</span><span class="mi">7</span> <span class="k">rows</span><span class="p">)</span>
<span class="nb">Time</span><span class="p">:</span> <span class="mi">28090</span><span class="p">.</span><span class="mi">748</span> <span class="n">ms</span>
</code></pre>
</div>
<p>At this point, we have connected our shards to actual MySQL tables and can use Citus’ parallel query capabilities on them. We can now migrate the data to PostgreSQL tables for even better performance. We can do this by creating an actual table for each shard and replacing the foreign table shard with that real table. For example, for shard 102008, on the worker:</p>
<div class="highlight">
<pre class="highlight sql"><code><span class="k">BEGIN</span><span class="p">;</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">intermediate_102008</span> <span class="k">AS</span> <span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">lineitem_102008</span><span class="p">;</span>
<span class="k">DROP</span> <span class="k">FOREIGN</span> <span class="k">TABLE</span> <span class="n">lineitem_102008</span><span class="p">;</span>
<span class="k">ALTER</span> <span class="k">TABLE</span> <span class="n">intermediate_102008</span> <span class="k">RENAME</span> <span class="k">TO</span> <span class="n">lineitem_102008</span><span class="p">;</span>
<span class="k">END</span><span class="p">;</span>
</code></pre>
</div>
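<p>The same swap could be repeated for every shard on a worker with a small loop. This is a hedged sketch, not a tested migration script: it assumes the shard foreign tables are named lineitem_&lt;shardid&gt; and are reachable through the current search_path, and the intermediate_ prefix is just a temporary name we picked. Since the shards are foreign tables, the sketch removes them with DROP FOREIGN TABLE.</p>
<div class="highlight">
<pre class="highlight sql"><code>DO $$
DECLARE
    shard_name text;
BEGIN
    -- relkind = 'f' restricts the loop to foreign tables, so shards that
    -- were already converted to regular tables are skipped.
    FOR shard_name IN
        SELECT c.relname
        FROM pg_class c
        WHERE c.relkind = 'f' AND c.relname LIKE 'lineitem_%'
    LOOP
        -- Copy the MySQL data into a regular PostgreSQL table.
        EXECUTE format('CREATE TABLE %I AS SELECT * FROM %I',
                       'intermediate_' || shard_name, shard_name);
        -- Remove the foreign table and give its name to the local copy.
        EXECUTE format('DROP FOREIGN TABLE %I', shard_name);
        EXECUTE format('ALTER TABLE %I RENAME TO %I',
                       'intermediate_' || shard_name, shard_name);
    END LOOP;
END
$$;
</code></pre>
</div>
<p>A DO block runs inside a single transaction, so either every shard on the worker is swapped or none is. Note that CREATE TABLE AS copies only the columns and the data, so any indexes or constraints would have to be created on the new tables separately.</p>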
<p>As we can see, with mysql_fdw it is possible to bring Citus’ parallel query planning and horizontal scaling capabilities to MySQL data. This approach can potentially be extended to any data source for which a Postgres FDW exists. Alternatively, we could migrate the data into regular PostgreSQL tables, which could provide even better performance.</p>
<p><em>This article was originally published on <a href='https://www.citusdata.com/blog/2016/06/10/scaling-mysql-with-citus/'>citusdata.com</a>.</em></p>