
CUDA and templates: specialization declaration needed?

I have a templated wrapper function that calls a kernel (__global__) defined in a .cu file, like this:

template<typename T, class M>
__global__ void compute_kernel(T* input, T* output, int n) {
    M m;
    // compute stuff using m
}

template<typename T, class M>
void compute(T* input, T* output, int n) {
    // ... compute blocks, threads, etc.
    compute_kernel<T,M> <<<dim_grid, dim_block>>>(input, output, n);
    // ...
}

and a header file, to be included in host code, that contains only the declaration:

template<typename T, class M> 
void compute(T* input, T* output, int n);

However, when calling compute() from the host with arbitrary template parameters, the build fails with undefined reference to 'void compute(...)', and the code only links if I add explicit instantiations at the end of the .cu file:

template void
compute<int, Method1<int> >(int* input, int* output, int n);

template void
compute<float, Method1<float> >(float* input, float* output, int n);

template void
compute<int, Method2<int> >(int* input, int* output, int n);

template void
compute<float, Method2<float> >(float* input, float* output, int n);

So, is it necessary to explicitly instantiate every templated function in order to make it callable from the host? (That's quite a drawback.)

Thanks for your comments!


This is a C++ FAQ, not limited to CUDA.

If the template implementation lives in a .cpp or .cu file, then when the compiler processes that translation unit it cannot possibly know which combinations of template parameters other translation units will need, so it emits code for none of them, and you get undefined-reference errors at link time.

You can either put the implementation in a header file (in which case any file including it must itself be compiled as CUDA, since the implementation contains kernel launches), or explicitly instantiate every required combination in the .cu file. If you have many combinations, you can use a macro to generate the instantiations.
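To illustrate the macro approach, here is a minimal host-only C++ sketch. The names Method1/Method2 mirror the question; the body of compute() is a hypothetical serial stand-in for the real wrapper, whose definition and instantiation lines would live in the .cu file compiled by nvcc:

```cpp
#include <cassert>

// Per-element "methods" standing in for the question's Method1/Method2.
template <typename T> struct Method1 { T operator()(T x) const { return x + 1; } };
template <typename T> struct Method2 { T operator()(T x) const { return x * 2; } };

// Host-only stand-in for the templated wrapper; in the real code this body
// lives in a .cu file and launches compute_kernel<T, M><<<grid, block>>>(...).
template <typename T, class M>
void compute(T* input, T* output, int n) {
    M m;
    for (int i = 0; i < n; ++i)
        output[i] = m(input[i]);
}

// One macro invocation per required permutation; in the question's setup
// these lines would sit at the end of the .cu file, forcing the compiler
// to emit a definition for each combination in that translation unit.
#define INSTANTIATE_COMPUTE(T, M) \
    template void compute<T, M<T> >(T* input, T* output, int n);

INSTANTIATE_COMPUTE(int,   Method1)
INSTANTIATE_COMPUTE(float, Method1)
INSTANTIATE_COMPUTE(int,   Method2)
INSTANTIATE_COMPUTE(float, Method2)
```

The header then keeps only the declaration of compute(), and any host translation unit that includes it links against exactly these four emitted instantiations.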

