Optimising Query Plans

COMP207 Lectures 2021-11-16

The initial query plan, generated from a query, may not be the optimal one. We can complete the following to make it run faster.

How to Optimise

Queries are evaluated from the bottom up therefore:

Efficiency depends on the size of intermediate results.

We should rewrite the initial query plan so that intermediate results are as small as possible.

We can use relational algebra equivalence laws to make query plans with smaller intermediate results.

Heuristics

Push selections as far down the tree as possible.

 flowchart LR
 subgraph inefficient
 11["π depart, name"] --> 12[σ worksAt=depart] --> 13["⨉"] --> 14[Stores] & 15[Emplyees]
 end
 subgraph better
 21["π depart, name"] --> 22[σ worksAt=depart AND city=Liv] --> 23["⨉"] --> 24[σ city=Liv] & 25[Emplyees]
 24 --> Stores
 end
 inefficient --> better

This gets rid of many irrelevant tuples very early during execution.

Push projections as far down the tree as possible, or insert projections where appropriate.

 flowchart LR
 subgraph inefficient
 11["π depart, name"] --> 12[σ worksAt=depart AND city=Liv] --> 13["⨉"] --> 14[σ city=Liv] & 15[Emplyees]
 14 --> 16[Stores]
 end
 subgraph better
 21["π depart, name"] --> 22[σ worksAt=depart AND city=Liv] --> 23["⨉"] --> 24[π worksAt] & 25[π depart, name]
 24 --> 26[σ city=Liv]
 26 --> 27[Stores]
 25 --> 28[Emplyees]
 end
 inefficient --> better

If possible, introduct equijoins for $\times$ followed by $\sigma$.

 flowchart LR
 subgraph inefficient
 11["π depart, name"] --> 12[σ worksAt=depart AND city=Liv] --> 13["⨉"] --> 14[π worksAt] & 15[π depart, name]
 14 --> 16[σ city=Liv]
 16 --> 17[Stores]
 15 --> 18[Emplyees]
 end
 subgraph better
 21["π depart, name"] --> 23["⋈ worksAt=depart"] --> 24[π worksAt] & 25[π depart, name]
 24 --> 26[σ city=Liv]
 26 --> 27[Stores]
 25 --> 28[Emplyees]
 end
 inefficient --> better

$\bowtie$ is faster than a $\times$ followed by a $\sigma$.

There are also many more heuristics that can be used to optimise the query plan.