UoL CS Notes

Measuring Performance of Code

COMP328 Lectures 2023-02-07

Performance Lists

Supercomputers are ranked on several lists, each using a different metric:

Top500 (by LINPACK performance, $R_{\max}$)
Green500 (Flops/Watt)
Graph500 (Graph computing)
HPL-AI (AI computing)

Using an HPC Cluster

Log in to the login node (don’t do any heavy computation here).
Submit work to the compute nodes.

Measuring Runtime

Call a timer function to record the current time (e.g. in milliseconds) at the start of execution.
Record the time again at the end of execution; the difference is the observed runtime.

Ensure you have exclusive access over the compute node you are running on.
Run the program multiple times (3-5) so that clock speeds ramp up and interference from other programs can be filtered out.
Report the minimum time of all the runs.

As the error $\epsilon$ is never negative (interference can only slow a run down), the minimum observed time is the closest to the true value:

\[r_0=r_t+\epsilon\]

where:

$r_0$ - Observed runtime
$r_t$ - True runtime
$\epsilon$ - Error from interference and measurement overhead ($\epsilon \geq 0$)
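The measurement procedure above can be sketched as follows. This is a minimal illustration in Python rather than the language used on the cluster; `measure_min_runtime` and `example_kernel` are hypothetical names introduced here, and `time.perf_counter` is just one of several suitable timer functions.

```python
import time


def measure_min_runtime(work, runs=5):
    """Time work() `runs` times; return (minimum, all timings) in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()  # time at the start of execution
        work()
        end = time.perf_counter()    # time at the end of execution
        timings.append(end - start)  # observed runtime r_0 = r_t + epsilon
    # epsilon >= 0, so the smallest observation is closest to r_t.
    return min(timings), timings


def example_kernel(n=200_000):
    # Hypothetical stand-in workload: a sum of squares.
    total = 0.0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    best, timings = measure_min_runtime(example_kernel, runs=5)
    for run, t in enumerate(timings):
        print(f"run {run}: {t * 1e3:.2f} ms")
    print(f"reported (minimum): {best * 1e3:.2f} ms")
```

The first run is often the slowest, since clocks and caches are still warming up, which is one reason a single measurement is not reported.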

Arithmetic Intensity

Arithmetic intensity relates the amount of computation in an algorithm to the amount of data it moves. For a problem of size $n$ it is given by:

\[I(n)=\frac{W(n)}{Q(n)}\]

where $W(n)$ is the number of floating-point operations carried out by the program and $Q(n)$ is the number of bytes transferred from memory to cache, so $I(n)$ is measured in flops per byte.

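For example, if our CPU has a peak compute rate of 416 Gflop/s and a memory bandwidth of 68 GB/s, then about 6 flops per byte ($416/68 \approx 6.1$) would be ideal. At this intensity both the compute units and the memory system are kept fully busy, so we would be making the best use of the hardware.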

Programs with high arithmetic intensity (AI) are called compute-bound programs.
Programs with low AI are called memory-bound programs.
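As a rough worked example, the sketch below computes the ideal flops-per-byte ratio for the CPU figures quoted above and compares a kernel against it. The kernel (`y[i] = a * x[i] + y[i]` on 8-byte doubles), its flop and byte counts, and the use of the ideal ratio as the threshold between "high" and "low" AI are assumptions made for illustration, not something stated in the lecture.

```python
def arithmetic_intensity(flops, bytes_moved):
    """I = W / Q, in flops per byte."""
    return flops / bytes_moved


def machine_balance(peak_gflops, bandwidth_gbs):
    """Flops per byte at which compute and memory are both saturated."""
    return peak_gflops / bandwidth_gbs


def classify(ai, balance):
    """Assumed threshold: at or above the balance point is compute-bound."""
    return "compute-bound" if ai >= balance else "memory-bound"


# Hardware figures from the example above.
PEAK_GFLOPS = 416.0
BANDWIDTH_GBS = 68.0
balance = machine_balance(PEAK_GFLOPS, BANDWIDTH_GBS)

# Assumed kernel: y[i] = a * x[i] + y[i] over n doubles.
n = 1_000_000
flops = 2 * n            # one multiply and one add per element
bytes_moved = 3 * 8 * n  # read x[i], read y[i], write y[i], 8 bytes each
ai = arithmetic_intensity(flops, bytes_moved)

if __name__ == "__main__":
    print(f"machine balance: {balance:.2f} flops/byte")
    print(f"kernel AI:       {ai:.3f} flops/byte")
    print(f"classification:  {classify(ai, balance)}")
```

Under these assumptions the kernel performs 2 flops for every 24 bytes moved, far below the ideal ratio, so it falls on the memory-bound side.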

Generally, modern programs are memory bound.

Roofline Model

We can assess an algorithm's performance by comparing it to the maximum performance that the system can deliver.

The peak performance bound is the peak floating-point rate (flop/s) of the system.
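In the usual formulation of the roofline model, the peak performance bound is paired with a memory-bandwidth bound, and attainable performance is the lower of the two. The sketch below assumes that standard form, which goes beyond what these notes state; `roofline_gflops` is a hypothetical helper, and the hardware figures are the ones from the arithmetic intensity example.

```python
def roofline_gflops(ai, peak_gflops, bandwidth_gbs):
    """Attainable Gflop/s for a kernel with arithmetic intensity `ai`.

    Assumes the standard roofline form:
        min(peak compute rate, ai * memory bandwidth)
    """
    memory_bound = ai * bandwidth_gbs  # sloped part of the roof
    return min(peak_gflops, memory_bound)


PEAK_GFLOPS = 416.0    # peak performance bound (flat part of the roof)
BANDWIDTH_GBS = 68.0   # memory bandwidth

if __name__ == "__main__":
    for ai in (0.1, 1.0, 6.0, 10.0, 100.0):
        bound = roofline_gflops(ai, PEAK_GFLOPS, BANDWIDTH_GBS)
        print(f"AI = {ai:6.1f} flops/byte -> at most {bound:6.1f} Gflop/s")
```

Kernels whose intensity lies below the point where the two bounds meet are limited by bandwidth; those above it are limited by the peak flop/s.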