Physical Query Plans

COMP207 Lectures 2021-11-16

A physical query plan adds information required to execute the optimised query plan such as:

Which algorithm to use for execution of operators.
How to pass information from one operator to the other.
Good order for computing joins, unions, etc.
Additional operations such as sorting.

Converting a Logical Plan to a Physical Plan

We can generate many different physical query plans from an optimised logical query plan. We need to choose the physical plan with the lowest cost estimate in terms of:

Time
Disk Accesses
Memory
Communication
…

Estimating Cost of Execution

We should mainly focus on the number of disk accesses as they are usually the most expensive operation.

It can be influenced by:

Selection of algorithms for the individual operators.
Method for passing information.
Size of intermediate results.

This is one of the most critical factors.

We can estimate these from parameters of the database such as:

Size of relations.
Number of distinct items per attribute per relation.
Computed exactly from the database.

You can’t execute the query, you have to rely on statistics gathered from data.

Estimate Size of Selection

Estimate for the size of $\sigma_{A=a}(R)$:

\[\frac{\lvert R\rvert}{\text{number of distinct values in column }A \text{ of relation }R}\]

Estimate for the number of block required to store $\sigma_{A=a}(R)$:

\[\frac{\text{number of blocks for }R}{\text{number of distinct values in column }A\text{ of relation }R}\]

This is good if values in column $A$ or $R$ occur equally often, but can be bad.

Example

Assume that:

Stores contains 80 tuples, stored in 20 blocks.
There are exactly 4 distinct values for the city attribute of Stores.

To estimate the size of $\sigma_\text{city=Liv}(\text{Stores})$:

\[\frac{80}4=20\text{ tuples}\]

To estimate for the number of blocks that are required to store $\sigma_\text{city=Liv}(\text{Stores})$:

\[\frac{20}4=5\text{ blocks}\]

Estimate Size of Joins

To estimate $R\bowtie S$ we can do an estimate based on the size of $R$ and $S$ and a number of distinct values in common attributes:

\[\frac{\lvert R\rvert\times\lvert S\rvert}{\text{max. number of distinct values for }A\text{ in }R\text{ or }S}\]

This assumes that $A$ is the only common attribute.

Passing Information

There are two methods for passing information between different stages of the query plan:

Materialisation - Write intermediate results to disk.
Pipelining (stream based processing):
- Passes the tuples of one operation directly to the next operation without using disk.
- Extra buffer for each pair of adjacent operation to hold tuples passing from one relation to the other.