python - Puzzling bug on a very tiny OpenCL kernel when trying to read an image2d (using pyopencl)


While developing an OpenCL kernel that is supposed to compute some features on an image, I came across a bug that I didn't manage to solve. To figure out the problem, I built a silly, tiny kernel that still returns wrong values. Here it is:

__constant sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE |
                               CLK_ADDRESS_CLAMP_TO_EDGE |
                               CLK_FILTER_NEAREST;

__kernel void readimagetest(__read_only image2d_t img, __global float *result){
    // Each work-item reads the pixel at its local (x, y) position.
    const int2 coord = (int2)(get_local_id(0), get_local_id(1));
    int2 nbofworkers = (int2)(get_local_size(0), get_local_size(1));
    uint4 tmp = read_imageui(img, sampler, coord);
    // Store the red channel at the work-item's row-major index.
    result[coord.x + coord.y * nbofworkers.x] = (float)tmp.x;
}

As you can see, this kernel is made to work with only 1 workgroup, in which each thread copies the red channel of the image into a global buffer.
I call this kernel with 1 workgroup of size (2, 2) on an image of 6 by 6 pixels. Only the red channels contain values different from 0. These values go from 0 to 35: the upper-left pixel has its red value set to 0, its right neighbor 1, and so on, until the lower-right pixel, whose red value is 35. Here are the fragments of Python code:

def test_read_img(self):
    arr = np.array(range(0, 36), dtype=np.uint8).reshape((6, 6))
    img = np.dstack((arr, np.zeros((arr.shape[0], arr.shape[1], 3), dtype=np.uint8)))
    result = self.detector.read_img(img, (2, 2))

detector is an instance of the class that handles the OCL calls. Here is its *read_img* function:

def read_img(self, image, local_size):
    # dtype, zeros and float32 are assumed to be imported from numpy.
    cl_image = cl.Image(self.ctx,
                        self.mf.READ_ONLY | self.mf.COPY_HOST_PTR,
                        self.cl_img_format,
                        image.shape[1::-1],
                        None,
                        image)
    out_buf = cl.Buffer(self.ctx, self.mf.WRITE_ONLY,
                        size=int(local_size[0] * local_size[1] * dtype('float32').itemsize))
    self.prog.readimagetest(self.queue, local_size, local_size, cl_image, out_buf)
    result = zeros(local_size[0] * local_size[1], float32)
    cl.enqueue_copy(self.queue, result, out_buf).wait()
    return result

And here is how the variable *cl_img_format* is instantiated:

self.cl_img_format = cl.ImageFormat(cl.channel_order.RGBA,
                                    cl.channel_type.UNSIGNED_INT8)

So if everything worked fine, the result should be [0. 1. 6. 7.] instead of [0. 24. 4. 28.].
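For reference, the expected values can be worked out on the host. The following is only a minimal sketch in plain Python, not part of the original code: `simulate_readimagetest` is a hypothetical helper that mimics the kernel's indexing (`result[x + y * local_size_x] = red(x, y)`) on a nested list holding the red channel.

```python
def simulate_readimagetest(red, local_size):
    """Mimic the kernel on the host: each (x, y) work-item copies red[y][x].

    `red` is a list of rows (row-major), `local_size` is (size_x, size_y).
    This is a hypothetical stand-in for the OpenCL kernel, for checking only.
    """
    size_x, size_y = local_size
    result = [0.0] * (size_x * size_y)
    for y in range(size_y):
        for x in range(size_x):
            # Same output index as in the kernel: coord.x + coord.y * nbofworkers.x
            result[x + y * size_x] = float(red[y][x])
    return result


# Red channel of the 6x6 test image: values 0..35, row by row.
red = [[row * 6 + col for col in range(6)] for row in range(6)]
expected = simulate_readimagetest(red, (2, 2))
print(expected)  # [0.0, 1.0, 6.0, 7.0]
```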
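I tested this code on 3 different devices: 2 ATIs and 1 NVIDIA. They all returned the same wrong result. I also made a small C program that does the same stuff as the Python one and called the same kernel, and this time it returned the proper result. So the mistake is in my Python code, but I can’t see it, though it must be right under my nose. Does anyone have an idea of what is wrong?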

P.S. I'm using Win7 x64, the free EPD 7.3-2 distribution, Python 2.7.3, and I used the pyopencl installer from the website.

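OK, I found out what was wrong... stupid me. So, in case someone is as absent-minded as me, here is the solution:
As Thomas suggested, I tested the “reading part”, and it returned the expected result.
Then I read the image back with cl.enqueue_read_image. The result was wrong, but it explained the values I got the first time. I had something like: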

[[[ 0  6 12 18]
  [24 30  1  7]
  [13 19 25 31]
  [ 2  8 14 20]
  [26 32  3  9]
  [15 21 27 33]]

 [[ 4 10 16 22]
  [28 34  5 11]
  [17 23 29 35]
  [ 0  0  0  0]
  [ 0  0  0  0]
  [ 0  0  0  0]]
 …]]]

The rest being 0.
The reason is that the dstack function returns an F-ordered array. D’oh!!
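To see how an F-ordered buffer leads to exactly these numbers, here is a small host-only sketch using NumPy. It assumes that the array's raw memory is handed to the device as-is and interpreted as tightly packed row-major RGBA pixels. The F-ordered layout is forced with `np.asfortranarray`, because whether `np.dstack` itself returns such an array may depend on the NumPy version.

```python
import numpy as np

arr = np.arange(36, dtype=np.uint8).reshape((6, 6))
img = np.dstack((arr, np.zeros((6, 6, 3), dtype=np.uint8)))

# Raw bytes as they sit in memory for an F-ordered (6, 6, 4) array.
# order="A" keeps the Fortran layout because the array is F-contiguous.
raw = np.asfortranarray(img).tobytes(order="A")

# What a consumer sees if it assumes row-major (C-ordered) RGBA pixels.
seen = np.frombuffer(raw, dtype=np.uint8).reshape(img.shape)

# Red channel of the 2x2 upper-left block, in the kernel's output order.
wrong = seen[:2, :2, 0].astype(np.float32).ravel()
right = img[:2, :2, 0].astype(np.float32).ravel()

print(seen[0])  # first row of pixels, matches the dump above
print(wrong)    # [ 0. 24.  4. 28.]
print(right)    # [0. 1. 6. 7.]
```

The first 36 bytes of the F-ordered buffer hold the whole red channel in column-major order, which is why the values 0, 6, 12, 18, 24, 30, 1, 7, ... show up packed into the first pixels and the rest of the image is 0.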
So I fixed the problem by copying the result of the dstack function:

img = np.dstack((arr, np.zeros((arr.shape[0], arr.shape[1], 3), dtype=np.uint8))).copy() 

The default value of the order argument of the copy method (ndarray.copy) is 'C'.
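A more defensive variant is to check the memory layout before handing the array to OpenCL. The sketch below is only an illustration: `to_c_ordered` is a hypothetical helper name, and the F-ordered input is again forced with `np.asfortranarray` to stand in for the problematic dstack result.

```python
import numpy as np


def to_c_ordered(image):
    """Return `image` itself if it is already C-contiguous, else a C-ordered copy.

    Hypothetical helper; np.ascontiguousarray only copies when needed.
    """
    return np.ascontiguousarray(image)


arr = np.arange(36, dtype=np.uint8).reshape((6, 6))
img = np.dstack((arr, np.zeros((6, 6, 3), dtype=np.uint8)))
img_f = np.asfortranarray(img)  # simulate the F-ordered array

print(img_f.flags['C_CONTIGUOUS'])  # False

fixed_by_copy = img_f.copy()          # ndarray.copy defaults to order='C'
fixed_by_helper = to_c_ordered(img_f)

print(fixed_by_copy.flags['C_CONTIGUOUS'])    # True
print(fixed_by_helper.flags['C_CONTIGUOUS'])  # True
```

Both variants leave the pixel values unchanged and only rearrange the bytes in memory, so the first two pixels are stored as 0 0 0 0 1 0 0 0, as a row-major RGBA image requires.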

