OpenMP `parallel for`
#pragma omp parallel for
This is a shortcut to create a parallel region and work-share a for loop in one go.
#pragma omp for
on its own doesn’t create any threads; it only divides the loop iterations between the threads of an enclosing parallel region, so without one the loop still runs serially.
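As a minimal sketch (the array `a` and its size are our own example values; compile with e.g. gcc -fopenmp), the combined construct and the split form below do the same thing:

#include <stdio.h>

int main(void) {
    int a[8];

    // Combined construct: creates the thread team and
    // work-shares the loop in one directive.
    #pragma omp parallel for
    for (int i = 0; i < 8; i++)
        a[i] = i * i;

    // Equivalent split form: omp for divides the iterations
    // between the threads of the enclosing parallel region.
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < 8; i++)
            a[i] = i * i;
    }

    for (int i = 0; i < 8; i++)
        printf("%d ", a[i]);
    printf("\n");
    return 0;
}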
Data Clauses
Variables that are only declared and used inside a parallel region don’t need to exist outside (due to C scoping rules).
Variables declared outside the parallel region but accessed from inside it can be given different sharing attributes:
shared
- Leave the variable as it was originally defined. All threads will refer to the original memory location when trying to access it (useful for read-only).
private
- Allocate a new memory location for each thread for this variable. Any access to this variable inside the parallel region refers to the per-thread copy instead of the original. The copies are uninitialised, and their final values are discarded when the region ends.
firstprivate
- Same as private, with the addition that each thread’s private copy is initialised with the value the variable had on entry to the parallel region.
lastprivate
- Same as private, with the addition that the value from the (sequentially) last iteration persists in the original variable after the parallel region is over. Works with `omp for`, but not with a general `omp parallel` region.
We define data clauses for variables using the following syntax:
#pragma omp parallel for shared(x)
for (int i = 0; i <= 10; i++) {
...
}
or
#pragma omp parallel for firstprivate(x)
for (int i = 0; i <= 10; i++) {
...
}
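As a minimal sketch of the difference (the names `x` and `last` are our own), firstprivate seeds each thread’s copy while lastprivate copies the value from the sequentially last iteration back out:

#include <stdio.h>

int main(void) {
    int x = 42;   // every firstprivate copy starts at 42
    int last = 0; // receives the value from the last iteration

    #pragma omp parallel for firstprivate(x) lastprivate(last)
    for (int i = 0; i < 4; i++) {
        last = x + i; // the copy from i == 3 is written back
    }

    printf("last = %d\n", last); // prints 45
    return 0;
}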
We can also set a default using:
#pragma omp parallel for default(...)
In the brackets we can set a default sharing attribute for all variables, but the recommended value is `default(none)`, which forces the sharing of every variable used in the region to be declared explicitly.
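For example, with default(none) any variable whose sharing is not stated explicitly becomes a compile error rather than a silent data race (a minimal sketch; the names `n`, `scale` and `out` are our own):

int main(void) {
    int n = 100;
    double scale = 2.0;
    double out[100];

    // Every variable used in the region must appear in a clause;
    // the loop index i is automatically private.
    #pragma omp parallel for default(none) shared(out) firstprivate(n, scale)
    for (int i = 0; i < n; i++) {
        out[i] = scale * i;
    }
    return 0;
}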
Reduction
We can define a reduction into a shared variable with:
reduction(op:var)
To add values into a variable `sum`:
reduction(+:sum)
Each thread will have its own private copy of `sum`; when the loop finishes, the copies are combined with `+` into the original `sum`.
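Putting this together, a minimal sketch that sums the numbers 1 to 100 (the bounds are our own example values):

#include <stdio.h>

int main(void) {
    int sum = 0;

    // Each thread accumulates into its private copy of sum;
    // the copies are combined with + when the loop finishes.
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= 100; i++) {
        sum += i;
    }

    printf("sum = %d\n", sum); // prints 5050
    return 0;
}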
Work-Sharing
We can define the work-sharing schedule by using:
schedule(static/dynamic/guided/runtime, <chunk size>)
- static - The schedule is fixed in advance: the iterations are divided between the threads before the loop starts. This has less overhead, if we can divide the work optimally.
- dynamic - Chunks of iterations are handed out to threads on demand, so an idle thread simply picks up the next chunk. This is good for lazy programmers and for loops with unpredictable iteration costs.
We can use this, for example, to multiply a lower-triangular matrix by a vector:
#pragma omp parallel for schedule(static, 1)
for (int row = 1; row <= 6; row++) {
    // Row r runs r inner iterations, so the work per row is uneven;
    // a chunk size of 1 interleaves the rows across threads to balance it.
    for (int col = 1; col <= row; col++) {
        res[row] += l[row][col] * y[col];
    }
}
We also have available:
- guided - Chunks of decreasing size are handed out as threads become free.
- runtime - A promise to the compiler that the schedule options will be provided at runtime via an environment variable: OMP_SCHEDULE="guided,10"
- auto - Finds the best schedule over the course of several runs. This is not preferred in HPC as it is non-deterministic.
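A minimal sketch of the runtime option (`work()` is a hypothetical stand-in for the loop body); the schedule is then chosen at launch, e.g. OMP_SCHEDULE="guided,10" ./a.out:

#pragma omp parallel for schedule(runtime)
for (int i = 0; i < 1000; i++) {
    work(i); // hypothetical per-iteration workload
}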
Serial & Parallel Code
We can use the following `ifdef` to compile code conditionally, since the `_OPENMP` macro is only defined when compiling with OpenMP enabled:
#ifdef _OPENMP
// parallel code
#endif
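A fuller sketch: the same file compiles serially without OpenMP and in parallel with it (`omp_get_thread_num()` comes from omp.h):

#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

int main(void) {
#ifdef _OPENMP
    // Compiled with OpenMP (e.g. gcc -fopenmp): one line per thread.
    #pragma omp parallel
    printf("hello from thread %d\n", omp_get_thread_num());
#else
    // Compiled without OpenMP: plain serial code.
    printf("hello from the only thread\n");
#endif
    return 0;
}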