Query Processing for DDBMS

If we don’t have all the information we need for a query at a node, we need to request information from other sites:

Joins

We can often send less than the whole table over the network when completing joins.

We can deduce that:

\[R\ltimes S=R\bowtie \pi_{\text{common attributes of }R\text{ and }S}(S)\]

$R\ltimes S$ is the set of all tuples in $R$ that NATURAL JOIN at least one tuple in $S$.

Therefore we can save on network communications by only sending the part of $S$ that is needed.

graph LR
a[("A (stores R)")] ---|network| b[("B (stores S)")]

We can then use this to create an optimised method to calculate $R\bowtie S$ at site $B$ (which stores $S$).

Site $B$ sends $S’:=\pi_{\text{common attributes of }R\text{ and }S}(S)$ to site $A$.
Site $A$ sends $R’:=R\ltimes S(=R\bowtie S’)$ to site $B$.
Site $B$ outputs $R’\bowtie S$

There is an example, with attribute names, at the end of the lecture.

The cost of this communication is:

\[\lvert S'\rvert\times(\text{size of tuple in }S')+\lvert R'\rvert\times(\text{size tuple in }R')\]

This method is sometimes more efficient than just exchanging the whole table depending on:

In general, this solution is optimal if:

\[\lvert\pi_\text{common attributes}(S)\rvert+\lvert R\ltimes S\rvert <\lvert R\rvert\]

In the worst case you can send twice the size of $R$.