Issue with mapping a 2D array to a 1D array in CUDA


I have written the following code in CUDA:

    #include <cstdlib>
    #include <iostream>
    using namespace std;

    __global__ void test(int *b_dev) {
        int index = blockDim.x * blockIdx.x + threadIdx.x;
        b_dev[index] = 1;
    }

    int main() {
        int **a;
        int *b_dev;
        a = (int**)malloc(sizeof(int*) * 4);
        for (int i = 0; i < 4; i++)
            a[i] = (int*)malloc(sizeof(int) * 4);
        // initialise array here to 0
        cudaMalloc((void**)&b_dev, sizeof(int) * 16);
        cudaMemcpy(b_dev, a, sizeof(int) * 16, cudaMemcpyHostToDevice);
        test<<<4, 4>>>(b_dev);
        cudaMemcpy(a, b_dev, sizeof(int) * 16, cudaMemcpyDeviceToHost);
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 4; j++)
                cout << a[i][j];
    }

I have a 2D array on the host that I flatten into a 1D array to process on the GPU. The code produces a segmentation fault when I try to print array a on the host, but when I comment out the line b_dev[index]=1 in the kernel, it prints array a initialised with zeroes. The Visual C++ debugger indicates:

CXX0030: Error: expression cannot be evaluated

Kindly guide me on this.

When you have an array of arrays allocated the way you do, you have no guarantee that the arrays are contiguous in memory. More specifically, in your example you have an int** array a, which consists of 4 int* arrays: a[0], a[1], a[2], and a[3]. Within each array a[i] (where i is the row index of the 2D array) the memory is contiguous. However, there's no guarantee that the memory for a[i] and the memory for a[i+1] are adjacent. That is, between calls to malloc, the memory being allocated can be anywhere in the free store, and whether or not consecutive blocks are contiguous is up to malloc. (As an aside, if you allocated the memory on the stack as a 2D array, or on the heap as a 1D array, it would be contiguous.)
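One common way to keep the a[i][j] syntax while guaranteeing contiguity (a variation on the heap-allocated 1D array mentioned above, not something taken from the question) is to allocate all rows as one block and point each a[i] into it. Then a[0] is the start of 16 contiguous ints and a single cudaMemcpy from a[0] (not a) moves everything. The helper names alloc_contiguous_2d and run_contiguous are hypothetical, and CUDA error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Same kernel as in the question: each thread sets one element to 1.
__global__ void test(int *b_dev) {
    int index = blockDim.x * blockIdx.x + threadIdx.x;
    b_dev[index] = 1;
}

// Hypothetical helper: allocates rows x cols ints as ONE contiguous block and
// builds an array of row pointers into it, so a[i][j] still works on the host.
int **alloc_contiguous_2d(int rows, int cols) {
    int **a = (int**)malloc(sizeof(int*) * rows);
    a[0] = (int*)calloc(rows * cols, sizeof(int));   // all the data, zeroed
    for (int i = 1; i < rows; i++)
        a[i] = a[0] + i * cols;                       // row i starts i*cols ints in
    return a;
}

// Hypothetical helper: one copy each way, starting at a[0] (the data), not a.
void run_contiguous(int **a, int rows, int cols) {
    size_t bytes = sizeof(int) * rows * cols;
    int *b_dev;
    cudaMalloc((void**)&b_dev, bytes);
    cudaMemcpy(b_dev, a[0], bytes, cudaMemcpyHostToDevice);
    test<<<rows, cols>>>(b_dev);                      // one block per row
    cudaMemcpy(a[0], b_dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(b_dev);
}

int main() {
    const int rows = 4, cols = 4;
    int **a = alloc_contiguous_2d(rows, cols);
    run_contiguous(a, rows, cols);
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) printf("%d", a[i][j]);
        printf("\n");
    }
    free(a[0]);   // the single data block
    free(a);      // the row-pointer array
    return 0;
}
```

Only two allocations need freeing here, since every a[i] points into the block owned by a[0].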

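Thus, you can't expect a single call to cudaMemcpy to copy all of the arrays. Instead, you'd have to perform multiple cudaMemcpy calls, one for each 1D array, using pointer arithmetic on the destination pointer to ensure each row is copied to the correct location. Note also that a itself holds only the 4 row pointers, so cudaMemcpy(b_dev, a, sizeof(int)*16, ...) copies those pointer values (and whatever memory follows them) rather than your integers. Copying the kernel's output back with cudaMemcpy(a, b_dev, ...) then overwrites the pointers with the 1s written by the kernel, which is why dereferencing a[i][j] in the print loop crashes. With the kernel line commented out, the same bytes are copied back unchanged, so the pointers survive and the zeroes print.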
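A minimal sketch of the per-row approach, assuming the question's 4x4 layout with each row allocated by its own malloc/calloc. The helper name run_jagged is hypothetical, and CUDA error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Same kernel as in the question: each thread sets one element to 1.
__global__ void test(int *b_dev) {
    int index = blockDim.x * blockIdx.x + threadIdx.x;
    b_dev[index] = 1;
}

// Hypothetical helper: copies a jagged rows x cols host array to a flat device
// buffer one row at a time, runs the kernel, and copies each row back.
void run_jagged(int **a, int rows, int cols) {
    int *b_dev;
    cudaMalloc((void**)&b_dev, sizeof(int) * rows * cols);
    for (int i = 0; i < rows; i++)   // row i lands at b_dev + i * cols
        cudaMemcpy(b_dev + i * cols, a[i], sizeof(int) * cols, cudaMemcpyHostToDevice);
    test<<<rows, cols>>>(b_dev);     // one block per row, one thread per column
    for (int i = 0; i < rows; i++)   // copy back into a[i], never into a itself
        cudaMemcpy(a[i], b_dev + i * cols, sizeof(int) * cols, cudaMemcpyDeviceToHost);
    cudaFree(b_dev);
}

int main() {
    const int rows = 4, cols = 4;
    int **a = (int**)malloc(sizeof(int*) * rows);
    for (int i = 0; i < rows; i++)
        a[i] = (int*)calloc(cols, sizeof(int));   // separate, zero-initialised rows
    run_jagged(a, rows, cols);
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) printf("%d", a[i][j]);
        printf("\n");
    }
    for (int i = 0; i < rows; i++) free(a[i]);
    free(a);
    return 0;
}
```

On a working CUDA device this should print four rows of 1111. The cost is one transfer per row in each direction, which is why a contiguous layout is usually preferable for larger arrays.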

When working with contiguous 2D data, you can use cudaMemcpy2D, which has the signature:

    cudaError_t cudaMemcpy2D(void* dst, size_t dpitch, const void* src, size_t spitch, size_t width, size_t height, cudaMemcpyKind kind)

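You can use it if you know the source and destination pitch lengths (the number of bytes from the start of one row to the start of the next) and take the pitch into account when using the data. However, the function assumes every row sits a fixed pitch after the previous one, starting from a single base pointer, which wouldn't be the case with separately malloc'd rows like yours.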
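A sketch of cudaMemcpy2D used the way it is intended, assuming a true 2D host array (int h[ROWS][COLS], whose rows are contiguous) and a pitched device buffer from cudaMallocPitch. The kernel name set_ones_pitched and the helper run_pitched are hypothetical; the kernel shows the pitch being taken into account when indexing. Note that width and both pitches are in bytes, while height is in rows.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define ROWS 4
#define COLS 4

// Pitched kernel: row r starts at (char*)base + r * pitch bytes, not r * COLS ints.
__global__ void set_ones_pitched(int *base, size_t pitch) {
    int row = blockIdx.x, col = threadIdx.x;
    int *row_ptr = (int*)((char*)base + row * pitch);
    row_ptr[col] = 1;
}

// Hypothetical helper: round-trips a contiguous ROWS x COLS host array
// through a pitched device allocation.
void run_pitched(int h[ROWS][COLS]) {
    int *d;
    size_t pitch;   // bytes per device row, chosen by the runtime (>= COLS * sizeof(int))
    cudaMallocPitch((void**)&d, &pitch, COLS * sizeof(int), ROWS);
    // Host rows are tightly packed, so the source pitch is just COLS * sizeof(int).
    cudaMemcpy2D(d, pitch, h, COLS * sizeof(int), COLS * sizeof(int), ROWS,
                 cudaMemcpyHostToDevice);
    set_ones_pitched<<<ROWS, COLS>>>(d, pitch);
    cudaMemcpy2D(h, COLS * sizeof(int), d, pitch, COLS * sizeof(int), ROWS,
                 cudaMemcpyDeviceToHost);
    cudaFree(d);
}

int main() {
    int h[ROWS][COLS] = {{0}};   // a true 2D array: all 16 ints are contiguous
    run_pitched(h);
    for (int i = 0; i < ROWS; i++) {
        for (int j = 0; j < COLS; j++) printf("%d", h[i][j]);
        printf("\n");
    }
    return 0;
}
```

The device pitch may be larger than the host row width because the runtime pads rows for alignment; cudaMemcpy2D handles that difference on both the upload and the download.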

Of course, the easiest solution is to choose an array dimension protocol and stick with it (e.g. either keep your memory 2D or 1D throughout; don't mix them unless you have a compelling reason to).
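A minimal sketch of sticking with the 1D protocol end to end, assuming a row-major layout where element (i, j) lives at index i * cols + j on both host and device. run_flat is a hypothetical helper name, and error checking is omitted.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Same kernel as in the question: each thread sets one element to 1.
__global__ void test(int *b_dev) {
    int index = blockDim.x * blockIdx.x + threadIdx.x;
    b_dev[index] = 1;
}

// Hypothetical helper: the matrix is one flat block on both sides,
// so a single cudaMemcpy each way moves all of it.
void run_flat(int *a, int rows, int cols) {
    size_t bytes = sizeof(int) * rows * cols;
    int *b_dev;
    cudaMalloc((void**)&b_dev, bytes);
    cudaMemcpy(b_dev, a, bytes, cudaMemcpyHostToDevice);
    test<<<rows, cols>>>(b_dev);   // thread (block i, thread j) writes index i*cols + j
    cudaMemcpy(a, b_dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(b_dev);
}

int main() {
    const int rows = 4, cols = 4;
    int *a = (int*)calloc(rows * cols, sizeof(int));   // one contiguous, zeroed block
    run_flat(a, rows, cols);
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) printf("%d", a[i * cols + j]);
        printf("\n");
    }
    free(a);
    return 0;
}
```

Because the kernel is launched with blockDim.x == cols, the question's index formula blockDim.x * blockIdx.x + threadIdx.x coincides with the host's i * cols + j mapping.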

Also, I'd be remiss if I didn't point you to the relevant CUDA documentation for cudaMemcpy.

