Combine several OpenCL buffers into a single large buffer
I have a 2D array that I have split up into several 1D arrays, and I have made those 1D arrays into OpenCL buffers. Sometimes I need a kernel function to take the entire 2D array, but since its size is determined at runtime I cannot simply declare one kernel argument per 1D array (and there can be over 1000 of them). I am hoping there is some way to take the 1D array of OpenCL buffers and combine them into one large buffer that holds all the data, then send that to my kernel. Right now the only way I can see of doing this is to read the data from the 1D buffers back to my program, arrange it into one giant 1D array, and write that as a new buffer back to my compute device. This seems like it will be extremely slow. Is there any other way?
Here are a couple of ideas (though I admit they are not ideal).
Instead of copying the buffers back to your program and then building new buffers from them, you can use clEnqueueCopyBuffer() (or clEnqueueCopyBufferRect(), depending on your situation) to copy data from one buffer to another. I believe (but I wouldn't swear to it) that how this copy is performed is implementation dependent, but it seems that a buffer that resides in device memory could be copied to another buffer in device memory without the need to cross the bus back to host memory.
Of course (if I understand correctly), copying is not really what you wanted anyway. How about using clCreateSubBuffer()? It can make a new buffer that simply points to a sub-section of an existing buffer, without actually making a copy of its own. To do this (from my understanding of what you've described), you would need to make the large 2D buffer, then create a number of lightweight 1D sub-buffers that point to regions of this memory.
In this way, you can pass the buffer that represents the whole 2D array when necessary, but just pass one or more 1D sub-buffers when that is all that is required.
I tested clCreateSubBuffer (including releasing the sub-buffer) and saw that it was slower than copying, though better than creating and releasing full buffers, but ... :( System: OpenCL 1.1, AMD-APP-SDK-v2.5 (684.212), FULL_PROFILE, Radeon 5870