Causes for CL_INVALID_WORK_GROUP_SIZE

When I change the work-group size from 16 to 32 or anything bigger, I get a CL_INVALID_WORK_GROUP_SIZE error. matrix_size is 64.

  localWorkSize[0] = groupsize;
  localWorkSize[1] = localWorkSize[0];
  globalWorkSize[0] = matrix_size;
  globalWorkSize[1] = globalWorkSize[0];

First I checked the documentation for clEnqueueNDRangeKernel, which lists four (five) different causes of CL_INVALID_WORK_GROUP_SIZE, but I think none of them apply. Please check my conclusions. (I hope you don't mind my Q&A style.)


Q CL_INVALID_WORK_GROUP_SIZE if local_work_size is specified and number of work-items specified by global_work_size is not evenly divisible by size of work-group given by local_work_size

A 64 % 32 = 0

Q or does not match the work-group size specified for kernel using the __attribute__((reqd_work_group_size(X, Y, Z))) qualifier in program source.

A As far as I understand the documentation, this does not apply: I did not use __attribute__.

Q CL_INVALID_WORK_GROUP_SIZE if local_work_size is specified and the total number of work-items in the work-group computed as local_work_size[0] *... local_work_size[work_dim - 1] is greater than the value specified by CL_DEVICE_MAX_WORK_GROUP_SIZE in the table of OpenCL Device Queries for clGetDeviceInfo.

A I queried clGetDeviceInfo and CL_DEVICE_MAX_WORK_GROUP_SIZE is 512, 512, 64

Q CL_INVALID_WORK_GROUP_SIZE if local_work_size is NULL and the __attribute__((reqd_work_group_size(X, Y, Z))) qualifier is used to declare the work-group size for kernel in the program source.

A local_work_size is not NULL.

Q CL_INVALID_WORK_ITEM_SIZE if the number of work-items specified in any of local_work_size[0], ... local_work_size[work_dim - 1] is greater than the corresponding values specified by CL_DEVICE_MAX_WORK_ITEM_SIZES[0], .... CL_DEVICE_MAX_WORK_ITEM_SIZES[work_dim - 1].

A 32 < 512


I hope I haven't overlooked something. Please tell me if you have an idea what could cause the CL_INVALID_WORK_GROUP_SIZE, or if you find an error in my conclusions.

Thanks for taking the time to read all this :)


CL_DEVICE_MAX_WORK_GROUP_SIZE should return a single size_t value (for example 512, but I don't know what it'd be on your system). This is the maximum number of work-items in a work-group, not the maximum in each dimension. So in your case you are trying to make a 2D work-group with 32*32 = 1024 work-items, and presumably CL_DEVICE_MAX_WORK_GROUP_SIZE is less than 1024 on your system.

See the OpenCL 1.1 spec, table 4.3, page 37, the definition of CL_DEVICE_MAX_WORK_GROUP_SIZE:

Maximum number of work-items in a work-group executing a kernel using the data parallel execution model.
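
To make the distinction concrete, here is a minimal sketch in plain C of the three size checks discussed in the question. It does not call OpenCL; `check_local_size` and the `wg_check` codes are hypothetical names made up for this illustration. The limits are assumed to come from clGetDeviceInfo: `max_item_sizes` from CL_DEVICE_MAX_WORK_ITEM_SIZES (one value per dimension) and `max_group_size` from CL_DEVICE_MAX_WORK_GROUP_SIZE (a single value).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch; these are not OpenCL constants. */
typedef enum {
    WG_OK = 0,
    WG_NOT_DIVISIBLE,      /* global size is not a multiple of the local size   */
    WG_ITEM_SIZE_EXCEEDED, /* one dimension exceeds its per-dimension limit     */
    WG_TOTAL_EXCEEDED      /* product of all dimensions exceeds the total limit */
} wg_check;

/* Checks a local work size against the rules quoted in the question.
 * max_item_sizes: per-dimension limits (CL_DEVICE_MAX_WORK_ITEM_SIZES).
 * max_group_size: limit on the product  (CL_DEVICE_MAX_WORK_GROUP_SIZE). */
wg_check check_local_size(size_t work_dim,
                          const size_t *global,
                          const size_t *local,
                          const size_t *max_item_sizes,
                          size_t max_group_size)
{
    assert(work_dim >= 1);

    size_t total = 1;
    for (size_t d = 0; d < work_dim; ++d) {
        if (local[d] == 0 || global[d] % local[d] != 0)
            return WG_NOT_DIVISIBLE;
        if (local[d] > max_item_sizes[d])
            return WG_ITEM_SIZE_EXCEEDED;
        total *= local[d]; /* work-items per work-group so far */
    }
    return total > max_group_size ? WG_TOTAL_EXCEEDED : WG_OK;
}
```

With a 64x64 global size, per-dimension limits of 512, 512, 64 and an assumed total limit of 512, a 16x16 work-group passes (256 work-items), while 32x32 fails on the total (1024 work-items) even though 32 is well below the per-dimension limit.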


I had the same problem when I tried to run my kernel on a CPU. I couldn't set a work-group size larger than 128, even though CL_DEVICE_MAX_WORK_GROUP_SIZE returned 1024.
After a little searching to find out where the 128 came from, it turned out that CL_KERNEL_WORK_GROUP_SIZE gave the proper value.
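
One way to act on this is to treat the smaller of the two limits as the real ceiling when choosing a square work-group. The sketch below is plain C and only illustrates the arithmetic; `largest_square_local_size` is a hypothetical helper, not an OpenCL function. It assumes `device_max` was read with clGetDeviceInfo (CL_DEVICE_MAX_WORK_GROUP_SIZE) and `kernel_max` with clGetKernelWorkGroupInfo (CL_KERNEL_WORK_GROUP_SIZE) for the kernel and device in question.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the largest n such that n divides global_size and n * n does not
 * exceed the smaller of the device-wide and kernel-specific limits.
 * Falls back to 1 if no larger n fits. Hypothetical helper for this sketch. */
size_t largest_square_local_size(size_t global_size,
                                 size_t device_max,
                                 size_t kernel_max)
{
    assert(global_size >= 1);

    /* The kernel-specific limit can be lower than the device-wide one. */
    size_t limit = device_max < kernel_max ? device_max : kernel_max;

    size_t best = 1;
    /* n <= limit / n is the overflow-safe form of n * n <= limit. */
    for (size_t n = 1; n <= global_size && n <= limit / n; ++n) {
        if (global_size % n == 0)
            best = n;
    }
    return best;
}
```

For matrix_size 64 with a device limit of 1024 and a kernel limit of 128, this picks 8 (8 * 8 = 64 work-items), because 16 * 16 = 256 already exceeds the kernel limit.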
