UoL CS Notes

OpenMP Barriers

COMP328 Lectures

There are implicit barriers at the end of work-sharing constructs and parallel sections.

nowait

We can use nowait to remove the implicit barrier on a directive:

#pragma omp for nowait

Additionally, schedule(static) guarantees how iterations are assigned to threads, provided that the loops have the same total number of iterations.

This is because each thread is tied to the same set of iterations in every loop. Therefore, for a given element, the operations run sequentially on one thread, which can make nowait safe.

barrier

Manually places a barrier at some point in a parallel section:

#pragma omp barrier

master

The enclosed block executes only on the master thread. This construct has no implied barrier:

#pragma omp master

NUMA Consideration

The first thread to touch a variable determines in whose memory the variable resides (the first-touch policy): pages are allocated on the NUMA node of the thread that first writes them.

flowchart LR
MEM0 <--> CPU0
CPU0 <--> CPU1
CPU1 <--> MEM1

This code would be very slow, as the array was initialised entirely on the main thread, so all of z resides in one socket's memory:

// init on thread 0
for (int i=m; i<n; i++) {
	z[i] = init(i);
}

// work on all threads
#pragma omp parallel for
for (int i=m; i<n; i++) {
	z[i] *= work(i);
}

It would be better to write the data using the same socket that will use it later:

// init on all threads
#pragma omp parallel for schedule(static)
for (int i=m; i<n; i++) {
	z[i] = init(i);
}

// work using the threads that initialised each element
#pragma omp parallel for schedule(static)
for (int i=m; i<n; i++) {
	z[i] *= work(i);
}

Thread Affinity

We can ensure our threads don’t change cores over the course of the program’s runtime by using the following environment variables:

  • OMP_DYNAMIC=false:
    • This guarantees that the number of threads you asked for is used (the runtime may not adjust the team size dynamically).
  • OMP_PROC_BIND=true:
    • Binds threads to a core.
  • OMP_PLACES=…:
    • Allows you to hand-pick which thread lives on which core, using values such as cores, sockets, or an explicit list.
    • This significantly reduces portability.
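These variables are typically set in the shell before launching the program. A sketch, assuming a hypothetical binary ./my_openmp_program and an 8-core run:

```shell
export OMP_NUM_THREADS=8      # request 8 threads
export OMP_DYNAMIC=false      # keep the thread count fixed
export OMP_PROC_BIND=true     # don't migrate threads between cores
export OMP_PLACES=cores       # one place per physical core
./my_openmp_program
```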