OpenMP `parallel for`
#pragma omp parallel for
This is a shortcut to create a parallel region and work-share a for loop in one go.
#pragma omp for
on its own doesn’t create any threads; it only divides the loop iterations between the threads of an enclosing parallel region, so without one the loop still runs serially.
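As a minimal sketch (the array `a` and its size are our own example values; compile with e.g. gcc -fopenmp), the combined construct and the split form below do the same thing:

#include <stdio.h>

int main(void) {
    int a[8];

    // Combined construct: creates the thread team and
    // work-shares the loop in one directive.
    #pragma omp parallel for
    for (int i = 0; i < 8; i++)
        a[i] = i * i;

    // Equivalent split form: omp for divides the iterations
    // between the threads of the enclosing parallel region.
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < 8; i++)
            a[i] = i * i;
    }

    for (int i = 0; i < 8; i++)
        printf("%d ", a[i]);
    printf("\n");
    return 0;
}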
Data Clauses
Variables that are only declared and used inside a parallel region don’t need to exist outside (due to C scoping rules).
Variables declared outside the parallel region but accessed from inside it can be given different sharing attributes:
shared
- Leave the variable as it was originally defined. All threads will refer to the original memory location when trying to access it (useful for read-only).
private
- Allocate a new memory location for each thread for this variable. Any access to this variable inside the parallel region refers to the per-thread copy instead of the original. The copies are uninitialised, and their final values are discarded when the region ends.
firstprivate
- Same as private, with the addition that each thread’s private copy is initialised with the value the variable had on entry to the parallel region.
lastprivate
- Same as private, with the addition that the value from the (sequentially) last iteration persists in the original variable after the parallel region is over. Works with `omp for`, but not with a general `omp parallel` region.
We define data clauses for variables using the following syntax:
#pragma omp parallel for shared(x)
for (int i = 0; i <= 10; i++) {
...
}
or
#pragma omp parallel for firstprivate(x)
for (int i = 0; i <= 10; i++) {
...
}
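As a minimal sketch of the difference (the names `x` and `last` are our own), firstprivate seeds each thread’s copy while lastprivate copies the value from the sequentially last iteration back out:

#include <stdio.h>

int main(void) {
    int x = 42;   // every firstprivate copy starts at 42
    int last = 0; // receives the value from the last iteration

    #pragma omp parallel for firstprivate(x) lastprivate(last)
    for (int i = 0; i < 4; i++) {
        last = x + i; // the copy from i == 3 is written back
    }

    printf("last = %d\n", last); // prints 45
    return 0;
}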
We can also set a default using:
#pragma omp parallel for default(...)
In the brackets we can set a default sharing attribute for all variables, but the recommended value is `default(none)`, which forces the sharing of every variable used in the region to be declared explicitly.
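For example, with default(none) any variable whose sharing is not stated explicitly becomes a compile error rather than a silent data race (a minimal sketch; the names `n`, `scale` and `out` are our own):

int main(void) {
    int n = 100;
    double scale = 2.0;
    double out[100];

    // Every variable used in the region must appear in a clause;
    // the loop index i is automatically private.
    #pragma omp parallel for default(none) shared(out) firstprivate(n, scale)
    for (int i = 0; i < n; i++) {
        out[i] = scale * i;
    }
    return 0;
}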
Reduction
We can define a reduction into a shared variable with:
reduction(op:var)
To add values into a variable `sum`:
reduction(+:sum)
Each thread will have its own private copy of `sum`; when the loop finishes, the copies are combined with `+` into the original `sum`.
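Putting this together, a minimal sketch that sums the numbers 1 to 100 (the bounds are our own example values):

#include <stdio.h>

int main(void) {
    int sum = 0;

    // Each thread accumulates into its private copy of sum;
    // the copies are combined with + when the loop finishes.
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= 100; i++) {
        sum += i;
    }

    printf("sum = %d\n", sum); // prints 5050
    return 0;
}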
Work-Sharing
We can define the work-sharing schedule by using:
schedule(static/dynamic/guided/runtime, <chunk size>)
- static - The schedule is fixed in advance: the iterations are divided between the threads before the loop starts. This has less overhead, if we can divide the work optimally.
- dynamic - Chunks of iterations are handed out to threads on demand, so an idle thread simply picks up the next chunk. This is good for lazy programmers and for loops with unpredictable iteration costs.
We can use this, for example, to multiply a lower-triangular matrix by a vector:
#pragma omp parallel for schedule(static, 1)
for (int row = 1; row <= 6; row++) {
    // Row r runs r inner iterations, so the work per row is uneven;
    // a chunk size of 1 interleaves the rows across threads to balance it.
    for (int col = 1; col <= row; col++) {
        res[row] += l[row][col] * y[col];
    }
}
We also have available:
- guided - Chunks of decreasing size are handed out as threads become free.
- runtime - A promise to the compiler that the schedule options will be provided at runtime via an environment variable: OMP_SCHEDULE="guided,10"
- auto - Finds the best schedule over the course of several runs. This is not preferred in HPC as it is non-deterministic.
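A minimal sketch of the runtime option (`work()` is a hypothetical stand-in for the loop body); the schedule is then chosen at launch, e.g. OMP_SCHEDULE="guided,10" ./a.out:

#pragma omp parallel for schedule(runtime)
for (int i = 0; i < 1000; i++) {
    work(i); // hypothetical per-iteration workload
}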
Serial & Parallel Code
We can use the following `ifdef` to compile code conditionally, since the `_OPENMP` macro is only defined when compiling with OpenMP enabled:
#ifdef _OPENMP
// parallel code
#endif
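A fuller sketch: the same file compiles serially without OpenMP and in parallel with it (`omp_get_thread_num()` comes from omp.h):

#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

int main(void) {
#ifdef _OPENMP
    // Compiled with OpenMP (e.g. gcc -fopenmp): one line per thread.
    #pragma omp parallel
    printf("hello from thread %d\n", omp_get_thread_num());
#else
    // Compiled without OpenMP: plain serial code.
    printf("hello from the only thread\n");
#endif
    return 0;
}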