Is it possible to span an OpenCL kernel to run concurrently on CPU and GPU?

2023-01-06 22:52

Let's assume that I have a computer which has a multicore processor and a GPU. I would like to write an OpenCL program which runs on all cores of the platform. Is this possible, or do I need to choose a single device on which to run the kernel?

In theory, yes: the OpenCL API allows it. But the platform/implementation must support it, and I don't think most OpenCL implementations do.

To do it, get the cl_device_id of the CPU device and the GPU device, and create a context with those two devices, using clCreateContext.

No, you can't automatically span a single kernel launch across both the CPU and the GPU; each launch runs on one device or the other.

You can use both, but it involves manually creating and managing two command queues (one for each device).

See this thread: http://devforums.amd.com/devforum/messageview.cfm?catid=390&threadid=124591&messid=1072238&parentid=0&FTVAR_FORUMVIEWTMP=Single

One context can only belong to one platform. If your multi-device code needs to work across platforms (for example, Intel's OpenCL platform for the CPU and NVIDIA's for the GPU), then you need separate contexts.

However, if the GPU and CPU happen to be in the same platform, then yes, you can use one context.

If you are using multiple devices on the same platform (two identical GPUs, or two GPUs from the same manufacturer), then you can share the context, as long as the devices all belong to that one platform (for example, because they were returned by a single clGetDeviceIDs call).

EDIT: I should add that a GPU+CPU context doesn't give you any automatically managed CPU+GPU execution. Typically, it is best practice to let the driver allocate a memory buffer that can be DMA'd by the GPU for maximum performance. When the CPU and GPU are in the same context, you can share those buffers across the two devices.

You still have to split the workload up yourself. My favorite load-balancing technique uses events. Every n work items, attach an event object to a command (or enqueue a marker), then wait for the event that you set n work items ago (the prior one). If you didn't have to wait, increase n on that device; if you did have to wait, decrease n. This limits the queue depth, and n will hover around the depth that just keeps the device busy. You need to limit the depth anyway to avoid starving GUI rendering. Just keep n commands in each command queue (with a separate n for the CPU and for the GPU) and the work will divide itself between the devices.
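The adjustment rule in that answer can be sketched in plain C, independent of the OpenCL calls around it. This is only an illustration under assumptions: `depth_ctl`, `depth_ctl_update`, and the min/max bounds are hypothetical names and parameters, not part of OpenCL or of the answer. The caller is assumed to work out for itself whether it had to block on the older event (for instance by checking the event's status before waiting on it).

```c
#include <assert.h>

/* Hypothetical per-device state; keep one instance per command queue. */
typedef struct {
    int depth;      /* current n: how many commands to keep in flight */
    int min_depth;  /* assumed lower bound, at least 1 */
    int max_depth;  /* assumed upper bound, protects GUI responsiveness */
} depth_ctl;

/*
 * Adjust n after waiting on the event recorded n commands ago.
 * had_to_wait: nonzero if the host actually blocked on that event,
 *              zero if it had already completed.
 * Returns the new depth.
 */
static int depth_ctl_update(depth_ctl *c, int had_to_wait)
{
    assert(c->min_depth >= 1 && c->min_depth <= c->max_depth);

    if (had_to_wait) {
        /* Device still busy: the queue is deep enough, so back off. */
        if (c->depth > c->min_depth)
            c->depth--;
    } else {
        /* Event already done: the device may be running dry, so go deeper. */
        if (c->depth < c->max_depth)
            c->depth++;
    }
    return c->depth;
}
```

With one `depth_ctl` for the CPU queue and one for the GPU queue, each device pulls work at its own pace, which is what makes the split adapt to their relative speeds.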

You cannot span a single kernel launch across multiple devices. But if the code you are running does not depend on other results (e.g. processing 16 kB blocks of data, each of which needs heavy processing), you can launch the same kernel on both the GPU and the CPU, putting some blocks on the GPU and some on the CPU.

That should improve performance.

You can do that by creating a single cl_context shared by the CPU and GPU, plus two command queues.

This is not applicable to all kernels. Sometimes the kernel code has to operate on all the input data at once, and the work cannot be separated into parts or chunks.
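As a rough sketch of the static split described in this answer, the helper below divides a run of independent blocks into one contiguous range per device. The names `block_range` and `split_blocks` and the `gpu_share_percent` knob are hypothetical, introduced only for illustration; a real program would tune the share by measurement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of the contiguous blocks given to one device. */
typedef struct {
    size_t first_block;  /* index of the first block in the range */
    size_t num_blocks;   /* number of blocks in the range */
} block_range;

/*
 * Split total_blocks independent blocks between GPU and CPU.
 * gpu_share_percent (0..100) is an assumed tuning knob: the GPU gets
 * that percentage, rounded down, and the CPU gets the remainder.
 */
static void split_blocks(size_t total_blocks, unsigned gpu_share_percent,
                         block_range *gpu, block_range *cpu)
{
    assert(gpu_share_percent <= 100);

    size_t gpu_blocks = total_blocks * gpu_share_percent / 100;

    gpu->first_block = 0;
    gpu->num_blocks  = gpu_blocks;
    cpu->first_block = gpu_blocks;
    cpu->num_blocks  = total_blocks - gpu_blocks;
}
```

Each range would then become its own kernel enqueue on that device's command queue, with the range start passed as a global work offset or a buffer offset.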
