python - Puzzling bug on very tiny OpenCL kernel when trying to read an image2d (using pyopencl) -

- February 15, 2014

While developing an OpenCL kernel that is supposed to compute features on an image, I came across a bug that I didn't manage to solve. To figure out the problem, I built a silly, tiny kernel that still returns wrong values. Here it is:

__constant sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE |
                               CLK_ADDRESS_CLAMP_TO_EDGE |
                               CLK_FILTER_NEAREST;

__kernel void readimagetest(__read_only image2d_t img, __global float *result){
    // Pixel coordinate of this work-item: x is the column, y is the row.
    const int2 coord = (int2)(get_local_id(0), get_local_id(1));
    int2 nbofworkers = (int2)(get_local_size(0), get_local_size(1));
    uint4 tmp = read_imageui(img, sampler, coord);
    // Store the red channel, one float per work-item, in row-major order.
    result[coord.x + coord.y * nbofworkers.x] = (float)tmp.x;
}

As you can see, the kernel is made to work with only 1 workgroup, and each thread copies the red channel of the image into a global buffer.
I call the kernel with 1 workgroup of size (2, 2) on an image of 6 by 6 pixels. Only the red channels contain a value different from 0. These values go from 0 to 35, with the upper-left corner pixel having its red value set to 0, its right neighbor 1, and so on, until the lower-right corner pixel, whose red value is 35. Here are fragments of the Python code:

def test_read_img(self):
    arr = np.array(range(0, 36), dtype=np.uint8).reshape((6, 6))
    img = np.dstack((arr, np.zeros((arr.shape[0], arr.shape[1], 3), dtype=np.uint8)))
    result = self.detector.read_img(img, (2, 2))

detector is an instance of the class that handles the OpenCL calls. Here is its *read_img* function:

def read_img(self, image, local_size):
    # Create a 2D image on the device, initialised from the host array.
    # The shape is passed as (width, height).
    cl_image = cl.Image(self.ctx,
                        self.mf.READ_ONLY | self.mf.COPY_HOST_PTR,
                        self.cl_img_format,
                        image.shape[1::-1],
                        None,
                        image)
    # One float per work-item of the single workgroup.
    out_buf = cl.Buffer(self.ctx, self.mf.WRITE_ONLY,
                        size=int(local_size[0] * local_size[1] * np.dtype('float32').itemsize))
    # Global size equals local size, so exactly one workgroup runs.
    self.prog.readimagetest(self.queue, local_size, local_size, cl_image, out_buf)
    result = np.zeros(local_size[0] * local_size[1], np.float32)
    cl.enqueue_copy(self.queue, result, out_buf).wait()
    return result

And this is how the variable *cl_img_format* is instantiated:

self.cl_img_format = cl.ImageFormat(cl.channel_order.RGBA,
                                    cl.channel_type.UNSIGNED_INT8)

So if this worked fine, the result should be [0. 1. 6. 7.] instead of [0. 24. 4. 28.].
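The expected values can be checked on the host without any OpenCL device. The following is a minimal sketch, assuming a plain NumPy stand-in for the kernel: `emulate_readimagetest` is a hypothetical helper, not part of the original code, and the image is built directly as a C-ordered array rather than with dstack.

```python
import numpy as np

# Same test image as in the post: red channel 0..35, other channels 0.
arr = np.arange(36, dtype=np.uint8).reshape((6, 6))
img = np.zeros((6, 6, 4), dtype=np.uint8)  # allocated directly, so C-ordered
img[:, :, 0] = arr


def emulate_readimagetest(image, local_size):
    """Hypothetical host-side stand-in for the kernel's indexing."""
    width, height = local_size
    result = np.zeros(width * height, dtype=np.float32)
    for y in range(height):
        for x in range(width):
            # coord.x selects the column and coord.y the row of the pixel;
            # channel 0 is the red channel read by the kernel as tmp.x.
            result[x + y * width] = image[y, x, 0]
    return result


expected = emulate_readimagetest(img, (2, 2))
print(expected)
```

With a (2, 2) workgroup, the four work-items read pixels (0, 0), (1, 0), (0, 1) and (1, 1), which hold the red values 0, 1, 6 and 7.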
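I tested this code on 3 different devices: 2 ATIs and 1 NVIDIA. They all returned the same false result. I also made a small C program that does the same stuff as the Python one and calls the same kernel, and this time it returned the proper result. So the mistake is in my Python code, but I can't see it, though it must be right under my nose. Does anyone have an idea of what is wrong?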

P.S. I'm using Win7 x64, the free EPD 7.3-2 distribution, Python 2.7.3, and I used the pyopencl installer from the website.

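OK, I found what was wrong... stupid me. So, in case someone is as absent-minded as me, here is the solution:
As Thomas suggested, I tested the "reading part", and it returned the expected result.
I then read the image back with cl.enqueue_read_image. The result was wrong, but it explained the values I got the first time. I had something like: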

[[[ 0  6 12 18]
  [24 30  1  7]
  [13 19 25 31]
  [ 2  8 14 20]
  [26 32  3  9]
  [15 21 27 33]]

 [[ 4 10 16 22]
  [28 34  5 11]
  [17 23 29 35]
  [ 0  0  0  0]
  [ 0  0  0  0]
  [ 0  0  0  0]]
 …]]]

The rest is 0.
The reason is that the dstack function returned an F-ordered array. D'oh!!
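To see why an F-ordered host array yields exactly these numbers, the sketch below reinterprets the raw bytes of such an array as row-major RGBA pixels, which is presumably how the data reaches the device when the array is handed over as the host buffer. The explicit `np.asfortranarray` call is an assumption used to force the layout described here, since whether dstack itself returns F order may depend on the NumPy version.

```python
import numpy as np

arr = np.arange(36, dtype=np.uint8).reshape((6, 6))
rgba = np.dstack((arr, np.zeros((6, 6, 3), dtype=np.uint8)))

# Assumption: force Fortran (column-major) order to mimic the reported layout.
f_img = np.asfortranarray(rgba)

# Elements in the order they occur in memory.
raw = f_img.ravel(order='K')

# Reinterpret the same bytes as C-ordered (row, column, channel) pixels.
as_device_sees = raw.reshape((6, 6, 4))

# Red channel of the 2x2 block read by the (2, 2) workgroup, flattened in
# the same x + y * width order as the kernel's result buffer.
red = as_device_sees[:2, :2, 0].ravel()
print(red)
print(as_device_sees[0])
```

The four red values come out as 0, 24, 4 and 28, the wrong result from the question, and the first reinterpreted row matches the first block of the dump above.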
I fixed the problem by copying the result of the dstack function:

img = np.dstack((arr, np.zeros((arr.shape[0], arr.shape[1], 3), dtype=np.uint8))).copy()

The default value of the order argument of the copy method is 'C'.
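A possible alternative, shown here only as a sketch, is to normalise the layout just before the array is handed to OpenCL. `np.ascontiguousarray` returns a C-ordered array and only copies when the input is not already C-contiguous. The helper name `ensure_c_order` is hypothetical, and `np.asfortranarray` is again used only to simulate the problematic input.

```python
import numpy as np


def ensure_c_order(image):
    """Hypothetical helper: return a C-contiguous view or copy of image."""
    return np.ascontiguousarray(image)


arr = np.arange(36, dtype=np.uint8).reshape((6, 6))
rgba = np.dstack((arr, np.zeros((6, 6, 3), dtype=np.uint8)))

# Simulated problematic input: same values, column-major memory layout.
f_img = np.asfortranarray(rgba)
fixed = ensure_c_order(f_img)

print(f_img.flags['C_CONTIGUOUS'], fixed.flags['C_CONTIGUOUS'])

# Bytes at the offsets of the red channel of pixels (0,0), (1,0), (0,1), (1,1)
# for a 6-pixel-wide RGBA image with 4 bytes per pixel and 24 bytes per row.
memory = fixed.ravel(order='K')
print(memory[[0, 4, 24, 28]])
```

Such a check could sit at the top of a function like read_img, so that callers do not need to know how the array was built.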
