Optimising Query Plans
The initial query plan, generated from a query, may not be the optimal one. We can complete the following to make it run faster.
How to Optimise
Queries are evaluated from the bottom up therefore:
- Efficiency depends on the size of intermediate results.
We should rewrite the initial query plan so that intermediate results are as small as possible.
We can use relational algebra equivalence laws to make query plans with smaller intermediate results.
Heuristics
-
Push selections as far down the tree as possible.
flowchart LR subgraph inefficient 11["π depart, name"] --> 12[σ worksAt=depart] --> 13["⨉"] --> 14[Stores] & 15[Emplyees] end subgraph better 21["π depart, name"] --> 22[σ worksAt=depart AND city=Liv] --> 23["⨉"] --> 24[σ city=Liv] & 25[Emplyees] 24 --> Stores end inefficient --> better
This gets rid of many irrelevant tuples very early during execution.
-
Push projections as far down the tree as possible, or insert projections where appropriate.
flowchart LR subgraph inefficient 11["π depart, name"] --> 12[σ worksAt=depart AND city=Liv] --> 13["⨉"] --> 14[σ city=Liv] & 15[Emplyees] 14 --> 16[Stores] end subgraph better 21["π depart, name"] --> 22[σ worksAt=depart AND city=Liv] --> 23["⨉"] --> 24[π worksAt] & 25[π depart, name] 24 --> 26[σ city=Liv] 26 --> 27[Stores] 25 --> 28[Emplyees] end inefficient --> better
-
If possible, introduct equijoins for $\times$ followed by $\sigma$.
flowchart LR subgraph inefficient 11["π depart, name"] --> 12[σ worksAt=depart AND city=Liv] --> 13["⨉"] --> 14[π worksAt] & 15[π depart, name] 14 --> 16[σ city=Liv] 16 --> 17[Stores] 15 --> 18[Emplyees] end subgraph better 21["π depart, name"] --> 23["⋈ worksAt=depart"] --> 24[π worksAt] & 25[π depart, name] 24 --> 26[σ city=Liv] 26 --> 27[Stores] 25 --> 28[Emplyees] end inefficient --> better
$\bowtie$ is faster than a $\times$ followed by a $\sigma$.
There are also many more heuristics that can be used to optimise the query plan.