Measuring Performance of Code
Performance Lists
You can measure super computers by different metrics:
- Top500 (By linpack, $R_\max$)
- Green500 (Flops/Watt)
- Graph500 (Graph computing)
- HPL-AI (AI computing)
Using a HPC Cluster
- Login to login node (don’t do any hard work here).
- Submit work to the compute nodes.
Measuring Runtime
- Use one of the functions to get the current time in millis.
-
Check the time again at the end of execution.
Ensure you have exclusive access over the compute node you are running on.
- Run Multiple times (3-5) to ramp up clocks and average out interference from other programs.
- Report the minimum time of all the runs.
As the error $\epsilon$ will always be positive, reporting the minimum time will give the closes to the true value:
\[r_0=r_t+\epsilon\]where:
- $r_0$ - Observed runtime
- $r_t$ - True runtime
Arithmetic Intensity
Arithmetic intensity is related to the time complexity of computations in an algorithm. It can also be characterised by the following equation:
\[I(n)=\frac{W(n)}{Q(n)}\]where $W$ is the number of flops carried out by the program and $Q$ is the number of bytes transferred from memory to cache.
For example if our CPU has 416 Gflop/s and 68 GB/s of compute and memory bandwidth, 6 operations per byte would be ideal. In this case we would be making best use of the hardware.
- Programs with high AI are called compute-bound programs.
- Programs with low AI are called memory-bound programs.
Generally, modern programs are memory bound.
Roofline Model
We can measure algorithm performance by comparing to the maximum performance of the system.
- The peak performance bound is the flop/s of a system.