OpenMP Barriers
There are implicit barriers at the end of work-sharing constructs and parallel sections.
nowait
We can use nowait
to remove the implicit barrier on a directive:
#pragma omp for nowait
Additionally schedule(static)
, can guarantee the order of operations provided that the total number iteration in a for loop are the same.
This is because each thread is tied to a particular iteration. Therefore, the operations are sequential for that thread.
barrier
Manually places a barrier at some point in a parallel section:
#pragma omp barrier
master
Only executes on the master thread. This has no barriers:
#pragma omp master
NUMA Consideration
The first thread to touch a variable defines in who’s memory the variable resides.
flowchart LR
MEM0 <--> CPU0
CPU0 <--> CPU1
CPU1 <--> MEM1
This code would be very slow as the array was defined all on the main thread:
// init on thread 0
for (int i=m; i<n; i++) {
z[i] = init(i);
}
// work on all threads
#pragma omp parallel for
for (int i=m; i<n; i++) {
z[i] *= work(i);
}
It would be better to write the data using the same socket that will use it later:
// init on all threads
#pragma omp parallel for schedule(static)
for (int i=m; i<n; i++) {
z[i] = init(i);
}
// work using the threads that defined each int
#pragma omp parallel for schedule(static)
for (int i=m; i<n; i++) {
z[i] *= work(i);
}
Thread Affinity
We can ensure our threads don’t change cores over the course of the program’s runtime by using the following environment variables:
OMP_SET_DYNAMIC=false
:- This guarantees that the number of threads you asked for are used.
OMP_PROC_BIND=true
:- Binds threads to a core.
OMP_PLACES=
:- Allows you to hand-pick which thread lives on which core.
- This significantly reduces portability.