multithreading - OpenMP/C++: Parallel for loop with reduction afterwards - best practice?


Given the following code...

    for (size_t i = 0; i < clusters.size(); ++i)
    {
        const std::set<int>& cluster = clusters[i];
        // ... expensive calculations ...
        for (int j : cluster)
            velocity[j] += f(j);
    }

...which I would like to run on multiple CPUs/cores. The function f does not use velocity.

A simple #pragma omp parallel for before the first loop will produce unpredictable/wrong results, because the std::vector<T> velocity is modified in the inner loop. Multiple threads may access and (try to) modify the same element of velocity at the same time.
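For illustration, this is what the naive (and incorrect) parallelization looks like — essentially the loop above with the pragma added, using a signed loop index since MSVC's OpenMP implementation wants one:

    // Naive parallelization (data race): two threads may update velocity[j]
    // for the same j at the same time if two clusters contain the same index.
    #pragma omp parallel for
    for (int i = 0; i < (int)clusters.size(); ++i)
    {
        const std::set<int>& cluster = clusters[i];
        // ... expensive calculations ...
        for (int j : cluster)
            velocity[j] += f(j);   // unsynchronized concurrent writes
    }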

I think the first solution would be to write #pragma omp atomic before the velocity[j] += f(j); operation. But this gives me a compile error (which might have to do with the elements being of type Eigen::Vector3d, or with velocity being a class member). Also, I read that atomic operations are very slow compared to having a private variable for each thread and doing a reduction in the end. So that's what I would like to do, I think.
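For completeness, the only version of the atomic approach I could imagine compiling would update the three scalar components separately. A sketch of my own (untested), which would still have the performance concern mentioned above:

    // omp atomic only works on scalar l-values, so update the three double
    // components of the Eigen::Vector3d one at a time.
    for (int j : cluster)
    {
        const Eigen::Vector3d fj = f(j);   // evaluate f(j) once, outside the atomics
        for (int d = 0; d < 3; ++d)
        {
            double& component = velocity[j][d];
            #pragma omp atomic
            component += fj[d];
        }
    }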

I have come up with this:

    #pragma omp parallel
    {
        // these variables are local to each thread
        std::vector<Eigen::Vector3d> velocity_local(velocity.size());
        std::fill(velocity_local.begin(), velocity_local.end(), Eigen::Vector3d(0, 0, 0));

        #pragma omp for
        for (size_t i = 0; i < clusters.size(); ++i)
        {
            const std::set<int>& cluster = clusters[i];
            // ... expensive calculations ...
            for (int j : cluster)
                velocity_local[j] += f(j); // save results from the previous calculations
        }

        // now each thread can save its results to the global variable
        #pragma omp critical
        {
            for (size_t i = 0; i < velocity_local.size(); ++i)
                velocity[i] += velocity_local[i];
        }
    }

Is this a good solution? Is it the best solution? (Is it even correct?)

Further thoughts: Using a reduction clause (instead of the critical section) throws a compiler error. I think this is because velocity is a class member.

I have tried to find a question with a similar problem, and this question looks like it's almost the same. But I think my case might differ because the last step includes a for loop. Also the question remains whether the best approach still holds.

Edit: As requested per comment: The reduction clause...

    #pragma omp parallel reduction(+ : velocity)
    for (omp_int i = 0; i < velocity_local.size(); ++i)
        velocity[i] += velocity_local[i];

...throws the following error:

error C3028: 'shapematching::velocity' : only a variable or static data member can be used in a data-sharing clause

(and a similar error with g++)
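For reference, the workaround I have seen suggested elsewhere is a user-defined reduction that targets a local vector instead of the class member. A sketch of how I understand it, assuming a compiler with OpenMP 4.0 support (vec3_plus and velocity_sum are names I made up; needs <algorithm> and <functional>):

    // Element-wise '+' as a user-defined reduction over std::vector<Eigen::Vector3d>.
    #pragma omp declare reduction(vec3_plus : std::vector<Eigen::Vector3d> :      \
            std::transform(omp_out.begin(), omp_out.end(), omp_in.begin(),        \
                           omp_out.begin(), std::plus<Eigen::Vector3d>()))        \
        initializer(omp_priv = std::vector<Eigen::Vector3d>(omp_orig.size(),      \
                                                            Eigen::Vector3d::Zero()))

    // Reduce into a local vector (a class member cannot appear in the clause).
    std::vector<Eigen::Vector3d> velocity_sum(velocity.size(), Eigen::Vector3d::Zero());

    #pragma omp parallel for reduction(vec3_plus : velocity_sum)
    for (int i = 0; i < (int)clusters.size(); ++i)
    {
        const std::set<int>& cluster = clusters[i];
        // ... expensive calculations ...
        for (int j : cluster)
            velocity_sum[j] += f(j);
    }

    // fold the reduced result back into the class member
    for (size_t i = 0; i < velocity.size(); ++i)
        velocity[i] += velocity_sum[i];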

You're doing an array reduction. I have described this several times (e.g. Reducing an array in OpenMP and Fill histograms (array reduction) in parallel with OpenMP without using a critical section). It can be done with and without a critical section.

You have already done this correctly with a critical section (in your recent edit), so let me describe how to do this without a critical section.


    std::vector<Eigen::Vector3d> velocitya;
    #pragma omp parallel
    {
        const int nthreads = omp_get_num_threads();
        const int ithread  = omp_get_thread_num();
        const int vsize    = velocity.size();

        #pragma omp single
        velocitya.resize(vsize*nthreads);
        std::fill(velocitya.begin()+vsize*ithread, velocitya.begin()+vsize*(ithread+1),
                  Eigen::Vector3d(0,0,0));

        #pragma omp for schedule(static)
        for (size_t i = 0; i < clusters.size(); i++) {
            const std::set<int>& cluster = clusters[i];
            // ... expensive calculations ...
            for (int j : cluster) velocitya[ithread*vsize+j] += f(j);
        }

        #pragma omp for schedule(static)
        for (int i = 0; i < vsize; i++) {
            for (int t = 0; t < nthreads; t++) {
                velocity[i] += velocitya[vsize*t + i];
            }
        }
    }

This method requires extra care/tuning due to false sharing, which I have not done here.
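If false sharing turns out to matter, one common mitigation is to pad each thread's strip so that strips do not share cache lines. A minimal sketch of that idea (my numbers, not tested): with 24-byte Eigen::Vector3d elements and 64-byte cache lines, rounding the strip length up to a multiple of 8 elements (192 bytes) keeps strip boundaries off shared cache lines, provided the vector's data is itself cache-line aligned:

    // Pad each thread's strip to a multiple of 8 Vector3d elements (8*24 = 192
    // bytes = 3 cache lines) so no two threads write to the same cache line at
    // a strip boundary. Assumes 64-byte cache lines.
    const int stride = (vsize + 7) & ~7;
    #pragma omp single
    velocitya.resize(stride*nthreads);
    // ... then index with velocitya[ithread*stride + j] and velocitya[stride*t + i]
    //     instead of using vsize as the stride.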

As to which method is better, you will have to test.
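A minimal way to test is to wrap each variant and time it with omp_get_wtime(); a sketch, where the run_*() calls are hypothetical wrappers around the two versions discussed above:

    // Time both variants; repeat a few times with a realistic problem size
    // to average out noise.
    double t0 = omp_get_wtime();
    run_critical_version();        // critical-section version
    double t1 = omp_get_wtime();
    run_array_reduction_version(); // per-thread strip / array-reduction version
    double t2 = omp_get_wtime();
    printf("critical: %.3f s, array reduction: %.3f s\n", t1 - t0, t2 - t1);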

