Python - Puzzling bug on a very tiny OpenCL kernel when trying to read an image2d (using PyOpenCL)
While developing an OpenCL kernel that is supposed to compute some features on an image, I came across a bug that I didn't manage to solve. To figure out the problem, I built a silly, tiny kernel that still returns wrong values. Here it is:
    __constant sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE |
                                   CLK_ADDRESS_CLAMP_TO_EDGE |
                                   CLK_FILTER_NEAREST;

    __kernel void readimagetest(__read_only image2d_t img, __global float *result){
        const int2 coord = (int2)(get_local_id(0), get_local_id(1));
        int2 nbofworkers = (int2)(get_local_size(0), get_local_size(1));
        uint4 tmp = read_imageui(img, sampler, coord);
        result[coord.x + coord.y * nbofworkers.x] = (float)tmp.x;
    }
As you can see, the kernel is made to work with only 1 workgroup, and each thread copies the red channel of one pixel of the image into a global buffer.
I call the kernel with 1 workgroup of size (2, 2) on an image of 6 x 6 pixels. Only the red channel contains values different from 0. These values go from 0 to 35: the upper left pixel has its red value set to 0, its right neighbor 1, and so on, until the lower right pixel, whose red value is 35. Here are the fragments of Python code:
    def test_read_img(self):
        arr = np.array(range(0, 36), dtype=np.uint8).reshape((6, 6))
        img = np.dstack((arr, np.zeros((arr.shape[0], arr.shape[1], 3), dtype=np.uint8)))
        result = self.detector.read_img(img, (2, 2))
The detector is an instance of the class that handles the OpenCL calls. Here is its *read_img* function:
    def read_img(self, image, local_size):
        # image.shape[1::-1] turns (rows, columns, channels) into (width, height)
        cl_image = cl.Image(self.ctx, self.mf.READ_ONLY | self.mf.COPY_HOST_PTR,
                            self.cl_img_format, image.shape[1::-1], None, image)
        out_buf = cl.Buffer(self.ctx, self.mf.WRITE_ONLY,
                            size=int(local_size[0] * local_size[1] * np.dtype('float32').itemsize))
        # One workgroup only: the global size equals the local size
        self.prog.readimagetest(self.queue, local_size, local_size, cl_image, out_buf)
        result = np.zeros(local_size[0] * local_size[1], np.float32)
        cl.enqueue_copy(self.queue, result, out_buf).wait()
        return result
And this is how the variable *cl_img_format* is instantiated:
    self.cl_img_format = cl.ImageFormat(cl.channel_order.RGBA, cl.channel_type.UNSIGNED_INT8)
So if it worked fine, the result should be [0. 1. 6. 7.]; instead I get [0. 24. 4. 28.].
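To make the expected values concrete, here is a minimal host-side sketch in plain NumPy that mimics what the kernel should do. The function `emulate_kernel` is a hypothetical stand-in written for illustration; it is not part of the original code and does not call OpenCL at all.

```python
import numpy as np

def emulate_kernel(img, local_size):
    """Hypothetical host-side stand-in for the kernel: copy the red channel
    of the top-left block of pixels into a flat float32 buffer."""
    width, height = local_size
    result = np.zeros(width * height, dtype=np.float32)
    for y in range(height):
        for x in range(width):
            # img is indexed (row, column, channel); channel 0 is red.
            # Same output index as the kernel: x + y * local_size_x
            result[x + y * width] = img[y, x, 0]
    return result

arr = np.arange(36, dtype=np.uint8).reshape((6, 6))
img = np.dstack((arr, np.zeros((6, 6, 3), dtype=np.uint8)))
expected = emulate_kernel(img, (2, 2))
print(expected)  # [0. 1. 6. 7.]
```

Because this indexes the array through NumPy rather than through its raw memory, it gives [0. 1. 6. 7.] whatever the memory layout of `img` is.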
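I tested the code on 3 different devices: 2 ATIs and 1 NVIDIA. They all returned the same wrong result. I also made a small C program that does the same stuff as the Python code and calls the same kernel, and this time it returned the proper result. So the mistake is in the Python code, but I can't see it, though it must be right under my nose. Does anyone have an idea of what is wrong?
P.S. I'm using Win7 x64, the free EPD 7.3-2 distribution, Python 2.7.3, and I installed PyOpenCL with an installer downloaded from a website.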
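OK, I found out what was wrong... stupid me. So, in case someone is as absent-minded as me, here is the solution:
As Thomas suggested, I tested the "reading part", and it returned the expected result.
I then read the image back with cl.enqueue_read_image. The result was wrong, but it explained the values I got the first time. I had something like: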
    [[[ 0  6 12 18]
      [24 30  1  7]
      [13 19 25 31]
      [ 2  8 14 20]
      [26 32  3  9]
      [15 21 27 33]]

     [[ 4 10 16 22]
      [28 34  5 11]
      [17 23 29 35]
      [ 0  0  0  0]
      [ 0  0  0  0]
      [ 0  0  0  0]]

     …]]]
The rest is 0.
The reason is that the dstack function returned an F-ordered (column-major) array. D'oh!!
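The scrambled values can be reproduced without OpenCL. The sketch below is only an illustration under two assumptions: the Fortran layout is forced with `np.asfortranarray` (whether `np.dstack` returns such an array may depend on the NumPy version), and the host-pointer copy is assumed to read the buffer in plain memory order.

```python
import numpy as np

arr = np.arange(36, dtype=np.uint8).reshape((6, 6))
rgba = np.zeros((6, 6, 4), dtype=np.uint8)
rgba[:, :, 0] = arr  # red channel holds 0..35, other channels stay 0

# Assumption: force column-major layout to mimic the array described above.
img_f = np.asfortranarray(rgba)

# Elements in the order they sit in memory, which is what a raw copy sees.
raw = img_f.ravel(order='K')

# Reinterpret those bytes as a row-major (row, column, channel) image.
seen = raw.reshape((6, 6, 4))
wrong = seen[:2, :2, 0].ravel()

print(seen[0])  # first row of pixels, matching the dump above
print(wrong)    # [ 0 24  4 28]
```

The red values of the top-left 2 x 2 block come out as 0, 24, 4 and 28, which is exactly the wrong result returned by the kernel.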
I fixed the problem by copying the result of the dstack function:
img = np.dstack((arr, np.zeros((arr.shape[0], arr.shape[1], 3), dtype=np.uint8))).copy()
The default value of the order argument of the array's copy method is 'C', so the copy is stored in row-major order.
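An alternative that avoids an unnecessary copy is to normalize the layout just before handing the array to OpenCL. This is only a sketch: `as_c_image` is a hypothetical helper name, not something from the original code or from PyOpenCL.

```python
import numpy as np

def as_c_image(image):
    """Hypothetical helper: return `image` as a C-contiguous (row-major)
    array, copying only when the layout is not already row-major."""
    return np.ascontiguousarray(image)

# An F-ordered RGBA test image with the red channel set to 0..35
rgba = np.zeros((6, 6, 4), dtype=np.uint8)
rgba[:, :, 0] = np.arange(36, dtype=np.uint8).reshape((6, 6))
img_f = np.asfortranarray(rgba)

fixed = as_c_image(img_f)
print(fixed.flags['C_CONTIGUOUS'])   # True
print(fixed.ravel()[[0, 4, 24, 28]])  # red of the top-left 2x2 block in memory order
```

With a row-major layout, the red values of the top-left 2 x 2 block sit at flat offsets 0, 4, 24 and 28 and hold 0, 1, 6 and 7, as expected.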