Skip to content
UoL CS Notes

Computing Joins Using Sorting

COMP207 Lectures

Equijoins

An equijoin:

\[R\bowtie_{A=B}\]

is defined as:

\[\sigma_{A=B}(R\times S)\]

$A,B$ are the join attributes.

You can see an example on the following tables:

  • Stores

    code city
    12345 1
    67910 2
  • Employees

    name depart
    Oscar 12345
    Janice 67910
    David 67910
  • $\text{Stores}\bowtie_\text{code=depart}\text{Employees}$

    code city name depart
    1235 1 Oscar 12345
    67910 2 Janice 679810
    67910 2 David 678910

Merging (Merge Sort)

If $R$ is sorted on $A$ and $S$ is sorted on $B$, then $R\bowtie_{A=B}S$ can be computed with one pass over $R$ and $S$ + run time equal to the size of the output.

With sorted tables we can just step through the columns in the constraint ($A$ and $B$) while one is smaller than the other and just print out when the constraint is matched.

This gives a runtime of:

\[O\lvert R\rvert +\lvert S\rvert +\text{size of output}\]

Merging with Duplicates in Column $A$

If there are duplicates in the left table then you can just loop through the right table again for the duplicate values.

Algorithm

You can use the following algorithm to compute this:

Sort $R$ on $A$
Sort $S$ on $B$
Merge the sorted $R$ and $S$

  1. Running time $O(\lvert R\rvert \times \log_2\lvert R\rvert)$
  2. Running time $O(\lvert S\rvert \times \log_2\lvert S\rvert)$
  3. Running time O\lvert R\rvert +\lvert S\rvert +\text{size of output}

Therefore the typical running time is:

\[O(\lvert R\rvert\log_2\lvert R\rvert +\lvert S\rvert\log_2\lvert S\rvert)\]

This is much faster than nested loop join but is the same in the worse case scenario.

Running Time vs. Disk Access

  • $B$ - Size of a disk block (512/4k bytes)
  • $M$ - Number of disk blocks that fit into available RAM
Algorithm No. of Elementary Operations No. of Disk Accesses
Reading a relation $R$. $O(\lvert R\rvert)$ $O(\frac{\lvert R \rvert}B)$
Sorting $R$ on attrubte $A$ $O(\lvert R\rvert\log_2\lvert R \rvert)$ $O(\frac{\lvert R\rvert}B\log_M\frac{\lvert R \rvert}B)$