Why can't I copy data in a struct to an OpenCL cl_mem buffer correctly?

OK, so I have isolated this down to a very specific problem.

I was under the impression that you could pass OpenCL any type of data in an array buffer (ints, chars, your own custom structs) as long as it was all plain data and didn't contain pointers to heap objects that the GPU won't be able to retrieve.

Now, I've tried this, and I think it works for a big array of ints but fails for my array of structs. Specifically:

cl_mem log_buffer = clCreateBuffer(context, CL_MEM_READ_WRITE, 
  num_elements * sizeof(int), NULL, NULL);

int* error_codes_in = (int*)malloc(num_elements * sizeof(int));

for (i = 0; i < num_elements; i++) {
  error_codes_in[i] = i;
}

error = clEnqueueWriteBuffer(command_queue, log_buffer, CL_TRUE,
  0, num_elements * sizeof(int), error_codes_in, 0, NULL, NULL);

This works fine: I get an array of numbers on the GPU and can manipulate them successfully, in parallel.

However, when I am using my own custom struct:

typedef struct {
  float position[2];
  float velocity[2];
  float radius;
  float resultant_force[2];
} ocl_element_2d_t;

(also defined in the kernel, as)

const char* kernel_string = 
  "typedef struct { float position[2]; float velocity[2]; float radius; float resultant_force[2]; } ocl_element_2d_t;"...

and I use the same/very similar code to write to the GPU version of my struct array:

cl_mem gpu_buffer = clCreateBuffer(context, CL_MEM_READ_WRITE,
  num_elements * sizeof(ocl_element_2d_t), NULL, NULL);

error = clEnqueueWriteBuffer(command_queue, (cl_mem)gpu_buffer, CL_TRUE,
  0, num_elements * sizeof(ocl_element_2d_t), host_buffer, 0, NULL, NULL);

I get blank values on the GPU, and occasionally garbage (three or four values in 350), for all of the float values inside the struct. Both return values are CL_SUCCESS.

Any suggestions as to where I'm going wrong? My only thought is that the GPU compiler produces a struct in memory with different gaps, and since the copy method ignores the internal structure of the items and just copies a contiguous block of RAM, you end up with mismatches and possibly out-of-phase items. Is it possible that my OS is 64-bit (OS X Lion) on an i7 (quad core), and my GPU is running 32-bit, and this is the problem? It's an ATI Radeon HD 5750, which has no double precision support, and claims to have a 128-bit bus (which may or may not be relevant; I don't know precisely what this stuff means).
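One way to test the "different gaps" theory on the host side is to check the struct's size and member offsets directly. The sketch below is only an illustration: the function name host_layout_is_tightly_packed is made up for this example, and it says nothing about the device compiler. Since every member here is a float, you would normally expect seven tightly packed floats with no padding on either side, but that is worth verifying rather than assuming.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
  float position[2];
  float velocity[2];
  float radius;
  float resultant_force[2];
} ocl_element_2d_t;

/* Hypothetical helper: returns 1 if the host compiler laid the struct out
 * as seven consecutive floats with no padding, 0 otherwise. */
int host_layout_is_tightly_packed(void) {
  return offsetof(ocl_element_2d_t, position) == 0
      && offsetof(ocl_element_2d_t, velocity) == 2 * sizeof(float)
      && offsetof(ocl_element_2d_t, radius) == 4 * sizeof(float)
      && offsetof(ocl_element_2d_t, resultant_force) == 5 * sizeof(float)
      && sizeof(ocl_element_2d_t) == 7 * sizeof(float);
}
```

If this returns 1, the host side is not adding gaps. To compare against the device, a kernel could write sizeof(ocl_element_2d_t) into an int buffer and the host could read it back.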

Is there a correct way to do this? Am I going to have to go all FORTRAN and have 7 different arrays, each with their own kernel argument, for the different properties in the struct?


All credit to @0A0D for being suspicious of my selective code samples. The problem was indeed in my failure to initialise the structs correctly.

My excuse is simply that I'm used to working with struct pointers, not structs, and so writing

ocl_element_2d_t element = host_buffer[i];
element.position[0] = 1.2;
element.position[1] = 5.7;

was the standard way to add properties to an object. Having had a quick google of structs, I came across a very basic C tutorial, http://www.asic-world.com/scripting/structs_c.html, which pointed out that

struct_instance = other_struct_instance;

copies the whole struct by value (embedded arrays included); it does not create a reference to the original.

Thus, when I tested the output from the local struct variable, the value I was expecting was there, but it never reached the array in host_buffer.
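To make the difference concrete, here is a minimal sketch of the buggy pattern next to one possible fix. The two function names are invented for this example; the fix shown (taking a pointer to the array slot) is one option, and writing host_buffer[i].position[0] = ... directly would work just as well.

```c
#include <assert.h>

typedef struct {
  float position[2];
  float velocity[2];
  float radius;
  float resultant_force[2];
} ocl_element_2d_t;

/* Buggy pattern: 'element' is a by-value copy of host_buffer[i], so the
 * writes below change the local copy only and are lost on return. */
void set_position_on_copy(ocl_element_2d_t *host_buffer, int i) {
  ocl_element_2d_t element = host_buffer[i];
  element.position[0] = 1.2f;
  element.position[1] = 5.7f;
  (void)element; /* silence "set but not used" warnings */
}

/* One possible fix: point at the array slot and write through the pointer,
 * so the values land in host_buffer itself. */
void set_position_in_place(ocl_element_2d_t *host_buffer, int i) {
  ocl_element_2d_t *element = &host_buffer[i];
  element->position[0] = 1.2f;
  element->position[1] = 5.7f;
}
```

With the first version, host_buffer still holds whatever it held before, which is what then gets copied to the GPU by clEnqueueWriteBuffer.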

There are probably two lessons here:

  1. Make sure you post all the relevant code when asking a StackOverflow question - including all initialisation - so that all possible problems can be considered.
  2. When using a library, especially one as complicated as OpenCL, don't assume its developers will have made silly mistakes - they are almost certainly your own!


I'm not sure how your compiler aligns your 'float' structure, but using gcc you can try:

#pragma pack(1)

to have it aligned without gaps.

To undo this packing use:

#pragma pack()

Also you might try to just rearrange the members, like this:

typedef struct {
  float position[2];
  float velocity[2];
  float resultant_force[2];
  float radius;
} ocl_element_2d_t;
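If you want to see whether packing or reordering actually changes anything for this particular struct, a quick size comparison helps. This is only a sketch: the three type names and the function all_layouts_same_size are made up for the example, and it assumes a compiler that honours #pragma pack as gcc does. Because every member is a float, I would expect all three variants to come out the same size, in which case padding is not the culprit.

```c
#include <assert.h>
#include <stddef.h>

/* Original member order, default alignment. */
typedef struct {
  float position[2];
  float velocity[2];
  float radius;
  float resultant_force[2];
} element_default_t;

/* Same members, packed without gaps. */
#pragma pack(1)
typedef struct {
  float position[2];
  float velocity[2];
  float radius;
  float resultant_force[2];
} element_packed_t;
#pragma pack()

/* Rearranged so the lone float comes last. */
typedef struct {
  float position[2];
  float velocity[2];
  float resultant_force[2];
  float radius;
} element_reordered_t;

/* Hypothetical helper: returns 1 when all three variants have equal size. */
int all_layouts_same_size(void) {
  return sizeof(element_default_t) == sizeof(element_packed_t)
      && sizeof(element_default_t) == sizeof(element_reordered_t);
}
```

Structs that mix member types (say, a char next to a double) are where packing and reordering start to matter.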