C++ - OpenMP double for loop with results stored in an array
I've spent time going through other posts and still can't get this simple program to work.
    #include <iostream>
    #include <cmath>
    #include <cstdlib> // rand
    #include <omp.h>
    using namespace std;

    // somefunct1..6 and their extra arguments (...) are cut from the real program
    int main()
    {
        int threadnum = 4; // want manual control
        int steps = 100000, cumulative = 0, counter;
        int a, b, c;
        float dum1, dum2, dum3;
        float pos[10000][3] = {0};
        float non = 0;
        //rng declared

        #pragma omp parallel private(dum1,dum2,dum3,counter,a,b,c) reduction(+: non, cumulative) num_threads(threadnum)
        {
            for (int dummy = 0; dummy < (10000/threadnum); dummy++)
            {
                dum1 = 0, dum2 = 0, dum3 = 0;
                a = 0, b = 0, c = 0;
                for (counter = 0; counter < steps; counter++)
                {
                    dum1 = somefunct1() + rand();
                    dum2 = somefunct2() + rand();
                    dum3 = somefunct3(dum1, dum2, ...);
                    a += somefunct4(dum1, dum2, dum3, ...);
                    b += somefunct5(dum1, dum2, dum3, ...);
                    c += somefunct6(dum1, dum2, dum3, ...);
                    cumulative++; // count number of loops executed
                }
                pos[dummy][0] = a; // save the results of the second loop in the array
                pos[dummy][1] = b;
                pos[dummy][2] = c;
                non += pos[dummy][0]; // holds summed values
            }
        }
    }
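One point worth checking in the code above: #pragma omp parallel on its own does not divide a loop between threads. Every thread in the team executes the whole structured block, so each of the threadnum threads runs dummy = 0 .. 10000/threadnum - 1 itself, all of them write the same pos rows, and rows from 10000/threadnum upward are never written. Splitting the iterations is the job of #pragma omp for (or the combined #pragma omp parallel for). The sketch below counts writes per row under both patterns; count_writes is a hypothetical helper, and the code is written so it also compiles (and runs serially) without -fopenmp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many times each row of an n-row results array gets written
// (count_writes is a hypothetical helper for this illustration).
//   work_shared == false: the question's pattern, "omp parallel" around a
//     plain loop, so every thread runs dummy = 0 .. n/threads - 1 itself.
//   work_shared == true: "omp parallel for", which divides dummy = 0 .. n - 1
//     among the threads so that each row is visited exactly once.
std::vector<int> count_writes(int n, int threads, bool work_shared) {
    assert(n >= 0 && threads > 0);
    std::vector<int> hits(static_cast<std::size_t>(n), 0);
    if (work_shared) {
        #pragma omp parallel for num_threads(threads)
        for (int dummy = 0; dummy < n; ++dummy) {
            #pragma omp atomic
            ++hits[dummy];
        }
    } else {
        #pragma omp parallel num_threads(threads)
        {
            // No worksharing directive: the whole loop is replicated per thread.
            for (int dummy = 0; dummy < n / threads; ++dummy) {
                #pragma omp atomic
                ++hits[dummy];
            }
        }
    }
    return hits;
}
```

For count_writes(100, 4, false) with OpenMP enabled, rows 0..24 are typically hit 4 times each (if the runtime grants all 4 threads) and rows 25..99 never; without OpenMP the pragmas are ignored, so rows 0..24 are hit once and the rest still never. The work-shared version hits every row exactly once either way.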
I've cut the program down to fit here. A lot of the time when I make changes (and I've tried a lot), the inner loop does not execute the correct number of times, and cumulative ends up at 32,532,849 instead of 1 billion. The scaling is only about 2x; for code like the above it should be higher.
I want the code to split the outer 10000-iteration loop so that each thread runs a share of those iterations in parallel (dynamic scheduling would be nice), and to save the results of each run of the second loop in the results array. The second loop is made up of dependent steps and cannot be broken up. The order of the 'dummy' iterations does not matter (pos[345] could be swapped with pos[3456], as long as all 3 indices are swapped together), since I have to modify them later anyway.
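A minimal sketch of that layout, under stated assumptions: somefunct1..6 are replaced by hypothetical placeholder math, and rand() is replaced by a std::mt19937 engine created inside each outer iteration. rand() is not required to be thread-safe and may share hidden state (and, in some C libraries, a lock) between threads; a local engine avoids that and also makes each row of results independent of which thread computed it. WalkResult and run_walks are illustrative names, not from the original program.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Results of all outer iterations (hypothetical names for this sketch).
struct WalkResult {
    std::vector<std::array<float, 3>> pos;  // one (a, b, c) row per outer iteration
    long long cumulative = 0;               // total inner-loop steps executed
};

WalkResult run_walks(int walkers, int steps) {
    assert(walkers >= 0 && steps >= 0);
    WalkResult r;
    r.pos.resize(static_cast<std::size_t>(walkers));  // rows start as {0, 0, 0}
    long long cumulative = 0;  // 64-bit counter leaves headroom above INT_MAX

    // The outer iterations are independent, so OpenMP hands whole iterations
    // to threads; schedule(dynamic) gives the next one to whichever thread is free.
    #pragma omp parallel for schedule(dynamic) reduction(+ : cumulative)
    for (int w = 0; w < walkers; ++w) {
        // Engine local to this iteration: no state shared between threads, and
        // row w gets the same numbers no matter which thread computes it.
        std::mt19937 rng(12345u + static_cast<unsigned>(w));
        std::uniform_real_distribution<float> dist(0.0f, 1.0f);

        float a = 0.0f, b = 0.0f, c = 0.0f;  // declared in the loop, so private
        for (int s = 0; s < steps; ++s) {     // dependent steps stay serial
            float d1 = dist(rng);             // placeholder for somefunct1() + rand()
            float d2 = dist(rng);             // placeholder for somefunct2() + rand()
            float d3 = std::sqrt(d1 * d1 + d2 * d2);  // placeholder for somefunct3
            a += d1;
            b += d2;
            c += d3;
            ++cumulative;
        }
        // Each row is written by exactly one thread, so no locking is needed.
        std::size_t row = static_cast<std::size_t>(w);
        r.pos[row][0] = a;
        r.pos[row][1] = b;
        r.pos[row][2] = c;
    }
    r.cumulative = cumulative;
    return r;
}
```

Compiled with g++ -fopenmp, the outer loop is divided among threads; without the flag the pragma is ignored and the same code runs serially, which makes it easy to compare results. schedule(dynamic) matches the wish for dynamic assignment; if every outer iteration costs about the same, the default static schedule would likely do just as well.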
The numerous variables and initializations in the inner loop are confusing me terribly. There are a lot of random-number calls and functions/math functions in the inner loop; could overhead there be causing the problem? I'm using GNU GCC 4.9.2 on Windows.
Any help is appreciated.
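Edit: Fixed. I moved the RNG declaration inside the parallel region, just above the first loop, and added #pragma omp for to that loop. I now get 3.75x scaling with 4 threads and 5.72x scaling with 8 threads (hyperthreads). Not perfect, but I'll take it. I still think there is an issue with thread locking and synchronization.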
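    ......
    float non = 0;
    #pragma omp parallel private(dum1,dum2,dum3,counter,a,b,c) reduction(+: non, cumulative) num_threads(threadnum)
    {
        //rng declared
        #pragma omp for
        // with omp for, this bound is the total shared by all threads
        // (10000/threadnum gives 2500 iterations in total when threadnum = 4)
        for (int dummy = 0; dummy < (10000/threadnum); dummy++)
        {
            ....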
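As a rough reading of the figures in the edit (assuming the speedup S is measured against the single-thread run), parallel efficiency is E = S/p, and the Karp-Flatt metric e estimates the fraction of the work that behaves serially, synchronization overhead included:

$$
E_4 = \frac{3.75}{4} \approx 0.94, \qquad
e_4 = \frac{1/S - 1/p}{1 - 1/p} = \frac{1/3.75 - 1/4}{1 - 1/4} \approx 0.022
$$

So at 4 threads roughly 2% of the run acts as serial or overhead time. The 8-thread figure is not directly comparable, because hyperthreads share the execution units of (presumably 4) physical cores.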