
Is there some kind of incompatibility with Boost::thread() and Nvidia CUDA?

I'm developing a generic streaming CUDA kernel execution framework that allows parallel data copy and execution on the GPU.

Currently I'm calling the CUDA kernels through a C++ wrapper function, so I can call the kernels from a .cpp file (not .cu), like this:

//kernels.cu:

//kernel definition
__global__ void kernelCall_kernel(  dataRow* in,  dataRow* out,  void* additionalData){
    //Do something
}

//kernel handler, so I can compile this .cu and link it with the main project and call it within a .cpp file
//kernel handler, so I can compile this .cu, link it with the main project and call it from a .cpp file
extern "C" void kernelCall( dataRow* in,  dataRow* out,  void* additionalData){
    int blocksize = 256;
    dim3 dimBlock(blocksize);
    dim3 dimGrid(ceil(tableSize/(float)blocksize)); //tableSize is visible here (a global)
    kernelCall_kernel<<<dimGrid,dimBlock>>>(in, out, additionalData);
}

If I call the handler as a normal function, the data printed is right.

//streamProcessing.cpp
//allocations and definitions of data omitted

//copy data to GPU
cudaMemcpy(data_d,data_h,tableSize,cudaMemcpyHostToDevice);
//call:
kernelCall(data_d,result_d,NULL);
//copy data back
cudaMemcpy(result_h,result_d,resultSize,cudaMemcpyDeviceToHost);
//show result:
printTable(result_h,resultSize);// this just iterates over the data and prints it

But to allow parallel copy and execution of data on the GPU, I need to create a thread, so I call the handler from a new boost::thread:

//allocations, definitions of data, copy of data to GPU omitted
//call:
boost::thread* kernelThreadOwner = new boost::thread(kernelCall, data_d, result_d, NULL);
kernelThreadOwner->join();
//copy data back and print omitted

I just get garbage when printing the result at the end.

Currently I'm using just one thread, for testing purposes, so there should not be much difference between calling the function directly and creating a thread. I have no clue why calling the function directly gives the right result while calling it from a thread does not. Is this a problem with CUDA and Boost? Am I missing something? Thanks in advance.


The problem is that (pre CUDA 4.0) CUDA contexts are tied to the thread in which they were created. When you use two threads, you have two contexts: the context the main thread allocates and reads from, and the context inside which the worker thread runs the kernel, are not the same. Memory allocations are not portable between contexts; they are effectively separate memory spaces on the same GPU.

If you want to use threads in this way, you either need to refactor things so that one thread only "talks" to the GPU, and communicates with the parent via CPU memory, or use the CUDA context migration API, which allows a context to be moved from one thread to another (via cuCtxPushCurrent and cuCtxPopCurrent). Be aware that context migration isn't free and involves latency, so if you plan to migrate contexts frequently, you might find it more efficient to change to a different design which preserves context-thread affinity.
