How to speed up simple Fortran OpenMP code?
I have a simple Fortran program whose main component is a 4-core OpenMP portion that calculates a dot product:
    omp_num_threads=4
    ...
    do 30 k=1,lines
      co(k)=0
      si(k)=0
      co_temp=0
      si_temp=0
    !$omp parallel do private(dotprod,qcur) reduction(+:co_temp,si_temp)
      do 40 i=1,ion_count
        dotprod=(rx(k)*x(i)+ry(k)*y(i)+rz(k)*z(i))*((2*3.1415926535)/l)
        co_temp=co_temp+cos(dotprod)*26 !qcur/qavg
        si_temp=si_temp+sin(dotprod)*26 !qcur/qavg
    40 continue
    !$omp end parallel do
      co(k)=co_temp
      si(k)=si_temp
      q(k)=cmplx(co(k),-si(k))
      s(k)=s(k)+(q(k)*conjg(q(k)))
      r(k)=r(k)+q(k)
    30 continue
I'm not experienced with Fortran or optimization. I'm using xlf90_r with -qsmp=omp to compile. I only get about a 1/2 speedup when using 4 cores, whereas someone else using C has gotten a perfect 1/4 speedup doing the same computation. It takes the same amount of time whether the OMP loop is on 30 or on 40. I time around loop 30 and around the program as a whole, and the loop takes 99.x% of the time, so I'm pretty sure this bit is the bottleneck. Are there any egregious slow mistakes I've made in this portion that anyone sees?
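At a quick glance at the code, it looks like each iteration of the outer loop is independent. Make that the parallel loop, not the inner loop. The parallel region is then entered once rather than once per value of k, and no reduction is needed because co_temp and si_temp become private to each thread: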
    omp_num_threads=4
    ...
    !$omp parallel do private(dotprod,qcur,co_temp,si_temp)
    do 30 k=1,lines
      co(k)=0
      si(k)=0
      co_temp=0
      si_temp=0
      do 40 i=1,ion_count
        dotprod=(rx(k)*x(i)+ry(k)*y(i)+rz(k)*z(i))*((2*3.1415926535)/l)
        co_temp=co_temp+cos(dotprod)*26 !qcur/qavg
        si_temp=si_temp+sin(dotprod)*26 !qcur/qavg
    40 continue
      co(k)=co_temp
      si(k)=si_temp
      q(k)=cmplx(co(k),-si(k))
      s(k)=s(k)+(q(k)*conjg(q(k)))
      r(k)=r(k)+q(k)
    30 continue
    !$omp end parallel do
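If you want to try the outer-loop version outside the original program, here is a minimal self-contained sketch. The array sizes, the random input data, the value of l, and the names dp and two_pi are all assumptions made up for illustration; they do not come from the question. qcur is left out because the posted loop never uses it. The sketch runs the loop nest once serially and once with the outer loop parallelized, checks that both give the same sums, and prints the two timings. The speedup you actually see will depend on your machine and compiler.

```fortran
program outer_loop_sketch
  use omp_lib   ! provides omp_get_wtime and omp_get_max_threads
  implicit none

  ! Everything in this declaration block is a hypothetical test setup.
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: lines = 200, ion_count = 20000
  real(dp), parameter :: two_pi = 2.0_dp * 3.14159265358979_dp
  real(dp), parameter :: l = 10.0_dp
  real(dp) :: rx(lines), ry(lines), rz(lines)
  real(dp) :: x(ion_count), y(ion_count), z(ion_count)
  real(dp) :: co(lines), si(lines), co_ref(lines), si_ref(lines)
  real(dp) :: dotprod, co_temp, si_temp, t0, t1, t2
  integer :: i, k

  ! Made-up input data.
  call random_number(rx); call random_number(ry); call random_number(rz)
  call random_number(x);  call random_number(y);  call random_number(z)

  ! Serial reference run.
  t0 = omp_get_wtime()
  do k = 1, lines
    co_temp = 0.0_dp
    si_temp = 0.0_dp
    do i = 1, ion_count
      dotprod = (rx(k)*x(i) + ry(k)*y(i) + rz(k)*z(i)) * (two_pi / l)
      co_temp = co_temp + cos(dotprod) * 26
      si_temp = si_temp + sin(dotprod) * 26
    end do
    co_ref(k) = co_temp
    si_ref(k) = si_temp
  end do
  t1 = omp_get_wtime()

  ! Outer loop parallelized: each thread handles whole values of k,
  ! so the accumulators are private and no reduction clause is needed.
  !$omp parallel do private(i, dotprod, co_temp, si_temp)
  do k = 1, lines
    co_temp = 0.0_dp
    si_temp = 0.0_dp
    do i = 1, ion_count
      dotprod = (rx(k)*x(i) + ry(k)*y(i) + rz(k)*z(i)) * (two_pi / l)
      co_temp = co_temp + cos(dotprod) * 26
      si_temp = si_temp + sin(dotprod) * 26
    end do
    co(k) = co_temp
    si(k) = si_temp
  end do
  !$omp end parallel do
  t2 = omp_get_wtime()

  ! Both runs sum in the same order for each k, so they should agree.
  if (maxval(abs(co - co_ref)) > 1.0e-6_dp .or. &
      maxval(abs(si - si_ref)) > 1.0e-6_dp) then
    error stop "parallel and serial results differ"
  end if

  print *, "threads available:", omp_get_max_threads()
  print *, "serial seconds:   ", t1 - t0
  print *, "parallel seconds: ", t2 - t1
end program outer_loop_sketch
```

Compile it with OpenMP enabled, for example with -qsmp=omp for xlf90_r as in the question, or -fopenmp for gfortran. Set the thread count through the OMP_NUM_THREADS environment variable before running.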