OpenMP/C++: Parallel for loop with reduction afterwards - best practice?
Given the following code...
    for (size_t i = 0; i < clusters.size(); ++i)
    {
        const std::set<int>& cluster = clusters[i];
        // ... expensive calculations ...
        for (int j : cluster)
            velocity[j] += f(j);
    }
...which I would like to run on multiple CPUs/cores. The function f does not use velocity.
A simple #pragma omp parallel for before the first loop will produce unpredictable/wrong results, because std::vector<T> velocity is modified in the inner loop: multiple threads may access, and try to modify, the same element of velocity at the same time.
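To make the failure mode concrete, this is the placement in question; a sketch of the incorrect version:

    // What NOT to do: a bare parallel for over the outer loop.
    #pragma omp parallel for
    for (size_t i = 0; i < clusters.size(); ++i) {
        const std::set<int>& cluster = clusters[i];
        for (int j : cluster)
            velocity[j] += f(j);   // data race: two threads whose clusters
                                   // share an index j update the same element
    }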
I think the first solution would be to write #pragma omp atomic before the velocity[j] += f(j); operation. But this gives me a compile error (which might have to do with the elements being of type Eigen::Vector3d, or with velocity being a class member). Also, I have read that atomic operations can be very slow compared to having a private variable for each thread and doing a reduction in the end. So that's what I would like to do, I think.
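For comparison, this is roughly what the atomic variant would look like if velocity held plain doubles; a sketch only, since omp atomic covers only simple scalar updates, which is presumably why it fails to compile for Eigen::Vector3d elements:

    // Sketch assuming std::vector<double> velocity -- omp atomic requires a
    // scalar update of the form x += expr, so it cannot apply to a Vector3d.
    #pragma omp parallel for
    for (size_t i = 0; i < clusters.size(); ++i) {
        const std::set<int>& cluster = clusters[i];
        for (int j : cluster) {
            const double fj = f(j);  // evaluate f outside the atomic update
            #pragma omp atomic
            velocity[j] += fj;       // serialized read-modify-write per element
        }
    }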
I have come up with this:
    #pragma omp parallel
    {
        // these variables are local to each thread
        std::vector<Eigen::Vector3d> velocity_local(velocity.size());
        std::fill(velocity_local.begin(), velocity_local.end(),
            Eigen::Vector3d(0, 0, 0));

        #pragma omp for
        for (size_t i = 0; i < clusters.size(); ++i)
        {
            const std::set<int>& cluster = clusters[i];
            // ... expensive calculations ...
            for (int j : cluster)
                velocity_local[j] += f(j); // save results from the previous calculations
        }

        // now each thread can save its results to the global variable
        #pragma omp critical
        {
            for (size_t i = 0; i < velocity_local.size(); ++i)
                velocity[i] += velocity_local[i];
        }
    }
Is this a good solution? Is it the best solution? (Is it even correct?)
Further thoughts: Using a reduction clause (instead of the critical section) throws a compiler error. I think this is because velocity is a class member.
I have tried to find a question with a similar problem, and this question looks like it's almost the same. But I think my case might differ, because the last step includes a for loop. Also, the question of whether this is the best approach still holds.
Edit: As requested per comment: the reduction clause...
    #pragma omp parallel reduction(+:velocity)
    for (omp_int i = 0; i < velocity_local.size(); ++i)
        velocity[i] += velocity_local[i];
...throws the following error:
    error C3028: 'ShapeMatching::velocity' : only a variable or static data member can be used in a data-sharing clause
(A similar error occurs with g++.)
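For what it's worth, the restriction itself can be worked around with a local copy when the member is a plain scalar; a minimal sketch, where the member energy, the bound n, and the function g are hypothetical stand-ins:

    double energy_local = 0.0;   // local copy: data-sharing clauses cannot
                                 // name a non-static class member directly
    #pragma omp parallel for reduction(+:energy_local)
    for (long i = 0; i < n; ++i)
        energy_local += g(i);    // g() and n are placeholders
    energy += energy_local;      // fold the result back into the member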
You're doing an array reduction. I have described this several times (e.g. reducing an array in OpenMP, and fill histograms (array reduction) in parallel with OpenMP without using a critical section). You can do this with and without a critical section.
You have already done this correctly with a critical section (in your recent edit), so let me describe how to do this without a critical section.
    std::vector<Eigen::Vector3d> velocitya;
    #pragma omp parallel
    {
        const int nthreads = omp_get_num_threads();
        const int ithread  = omp_get_thread_num();
        const int vsize    = velocity.size();
        #pragma omp single
        velocitya.resize(vsize * nthreads);
        std::fill(velocitya.begin() + vsize * ithread,
                  velocitya.begin() + vsize * (ithread + 1),
                  Eigen::Vector3d(0, 0, 0));

        #pragma omp for schedule(static)
        for (size_t i = 0; i < clusters.size(); i++) {
            const std::set<int>& cluster = clusters[i];
            // ... expensive calculations ...
            for (int j : cluster)
                velocitya[ithread * vsize + j] += f(j);
        }

        #pragma omp for schedule(static)
        for (int i = 0; i < vsize; i++) {
            for (int t = 0; t < nthreads; t++) {
                velocity[i] += velocitya[vsize * t + i];
            }
        }
    }
This method requires extra care/tuning due to false sharing, which I have not done here.
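One common mitigation, adapting the code above under the assumption of 64-byte cache lines, is to pad each thread's block so adjacent threads never write to the same line; a sketch:

    // Sketch, assuming 64-byte cache lines: leave at least one full cache
    // line of unused slack between consecutive threads' blocks so that no
    // line holds hot elements belonging to two different threads.
    constexpr std::size_t CACHE_LINE = 64;
    const std::size_t pad    = (CACHE_LINE + sizeof(Eigen::Vector3d) - 1)
                               / sizeof(Eigen::Vector3d);
    const std::size_t stride = vsize + pad;      // padded per-thread stride
    velocitya.resize(stride * nthreads);
    // ...then index with velocitya[ithread * stride + j] in the main loop
    // and velocitya[t * stride + i] in the merge loop...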
As to which method is better, you will have to test.
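A minimal timing sketch with omp_get_wtime, where run_variant is a hypothetical wrapper around either version above:

    #include <omp.h>
    #include <cstdio>

    // Hypothetical harness: run_variant stands in for either implementation.
    double time_variant(void (*run_variant)()) {
        const double t0 = omp_get_wtime();
        run_variant();
        return omp_get_wtime() - t0;   // wall-clock seconds
    }

    // usage sketch: time both, best of several runs
    // std::printf("critical: %g s\n", time_variant(run_critical_version));
    // std::printf("array:    %g s\n", time_variant(run_array_version));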