OpenCL - iteratively updating GPU-resident buffer?
I need to have an OpenCL kernel iteratively update a buffer and return the results. To clarify:
- Send the initial buffer contents to the kernel
- Kernel/worker updates each element in the buffer
- Host code reads the results - HOPEFULLY asynchronously, though I'm not sure how to do this without blocking the kernel.
- Kernel runs again, again updating each element, but the new value depends on the previous value.
- Repeat for some fixed number of iterations.
So far, I've been able to fake this by providing an input and output buffer, copying the output back to the input when the kernel finishes executing, and restarting the kernel. This seems like a huge waste of time and abuse of limited memory bandwidth as the buffer is quite large (~1GB).
Any suggestions/examples? I'm pretty new at OpenCL so this may have a very simple answer.
If it matters, I'm using Cloo/OpenCL.NET on an NVidia GTX460 and two GTX295s.
I recommend creating a cl_mem buffer on the device, copying the data into it, and iterating with the kernel. Use the same memory to store the results; that will be easier for you, as your kernel will have just one parameter.
Then you only need to copy the data to the cl_mem once and run the kernel. After that, read the data back from the device and run the kernel again.
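The loop described above can be sketched in plain C. This is only a host-side stand-in and makes no OpenCL calls: `device_buf` plays the role of the cl_mem buffer, `update_in_place` plays the role of the kernel, and the update rule and all function names are made up for illustration. It also assumes each element's new value depends only on that element's own previous value, which is what makes updating in place safe.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the kernel: each element is updated from its own previous
   value only, so writing the result back in place is safe.
   The rule 0.5 * x + 1 is an arbitrary example. */
static void update_in_place(float *buf, size_t n)
{
    for (size_t i = 0; i < n; ++i)   /* one loop pass ~ one work-item */
        buf[i] = 0.5f * buf[i] + 1.0f;
}

/* Hypothetical host loop. The initial data is copied into device_buf once;
   after that only results travel back to the host, once per iteration. */
void run_in_place(const float *initial, size_t n, int iterations,
                  float *device_buf, float *host_result)
{
    assert(iterations >= 0);
    memcpy(device_buf, initial, n * sizeof(float));          /* upload once */
    for (int it = 0; it < iterations; ++it) {
        update_in_place(device_buf, n);                       /* run kernel  */
        memcpy(host_result, device_buf, n * sizeof(float));   /* read back   */
    }
}
```

In real host code the three steps would correspond to one clEnqueueWriteBuffer before the loop, then clEnqueueNDRangeKernel and clEnqueueReadBuffer inside it.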
If you don't mind a read-back occasionally containing some data from the next iteration, you can improve performance considerably by using events and an out-of-order command queue (CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE). That way the kernel can keep running while you copy the data back.
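If mixed data is not acceptable, one possible alternative (not something proposed in this thread) is to alternate between two device buffers, so that the kernel for one iteration never writes the buffer holding the previous iteration's result. The sketch below is again plain C with no OpenCL calls; `step`, `run_ping_pong`, and the update rule are hypothetical. To make the point visible, each read-back is deliberately done after the next step has already run, and it still returns the older iteration's values intact.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a kernel that reads src and writes dst (arbitrary rule). */
static void step(const float *src, float *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = 0.5f * src[i] + 1.0f;
}

/* Hypothetical host loop over two "device" buffers a and b; a holds the
   initial data. snapshots must have room for iterations * n floats, and
   row k receives the result of iteration k. */
void run_ping_pong(float *a, float *b, size_t n, int iterations,
                   float *snapshots)
{
    assert(iterations >= 0);
    float *src = a;
    float *dst = b;
    for (int it = 0; it < iterations; ++it) {
        step(src, dst, n);   /* iteration `it` only reads src, writes dst */
        if (it > 0) {
            /* src still holds the untouched result of iteration it - 1,
               even though the step for iteration `it` has already run. */
            memcpy(snapshots + (size_t)(it - 1) * n, src, n * sizeof(float));
        }
        float *tmp = src;    /* swap roles instead of copying the data */
        src = dst;
        dst = tmp;
    }
    if (iterations > 0)      /* after the last swap, src is the final result */
        memcpy(snapshots + (size_t)(iterations - 1) * n, src,
               n * sizeof(float));
}
```

With real OpenCL the swap would mean passing the two cl_mem handles to clSetKernelArg in alternating order, issuing clEnqueueReadBuffer with blocking_read set to CL_FALSE, and using event wait lists so that a buffer is not overwritten until its read has completed. The cost is a second buffer of the same size.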
You can write your initial data to the device and change its content with your kernel. As soon as the kernel is finished with its iteration you can read the same memory buffer back and restart the kernel for its next iteration. The data can stay on the OpenCL device. There is no need to send it again to the device.
As far as I know, there is no way for the host to synchronize with a kernel while it is running. You can only start the kernel and wait for it to finish, then read back the result and start it again. An asynchronous read would be dangerous, because reading the buffer while the kernel is still writing to it could give you inconsistent results.