CUDA: Reasons for using preprocessing variables to specify the problem size

2023-03-18 00:39

I'm writing CUDA code in MATLAB MEX files. In CUDA examples on the internet, and even in NVIDIA's manuals, you often see preprocessing variables (i.e. preprocessor macros) used to specify the problem size, e.g. the vector length for a vector addition. I wrote my program the same way, with preprocessing variables specifying the problem size. And I have to admit I like it, since you can access them everywhere in your code, e.g. as loop limits, without having to pass them explicitly as function arguments.

But I ran into the following problem: I wanted to benchmark the program for several different problem sizes, so I have to recompile the code every time, passing the preprocessing variable to the compiler. That's not a real problem; I've already written the benchmark and it works. But in hindsight I wonder why I chose this approach instead of simply taking the problem size as user input at runtime. So I'm looking for reasons why one might want to use preprocessing variables instead of simply passing the problem size to the program.

Thanks!

When you compile problem-size constants into the kernel, the compiler can make certain classes of optimizations that it can't make if the sizes are only known at runtime. Full loop unrolling is an obvious example.
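To make the unrolling point concrete, here is a minimal sketch. The macro `ROW_LEN`, its default of 8, and all function names are made up for illustration; whether the compiler actually unrolls depends on the compiler version and flags, so treat this as an illustration rather than a guarantee.

```cuda
#include <cuda_runtime.h>

// Hypothetical compile-time size; override at build time,
// e.g. `nvcc -DROW_LEN=16 file.cu`.
#ifndef ROW_LEN
#define ROW_LEN 8
#endif

// Sums one row of ROW_LEN floats. The trip count is a compile-time
// constant, so the compiler is free to unroll the loop completely.
__host__ __device__ inline float rowSumFixed(const float* row)
{
    float acc = 0.0f;
    for (int k = 0; k < ROW_LEN; ++k)
        acc += row[k];
    return acc;
}

// Same computation with the length supplied at runtime. The trip count
// is unknown when the kernel is compiled, so full unrolling is not possible.
__host__ __device__ inline float rowSumRuntime(const float* row, int len)
{
    float acc = 0.0f;
    for (int k = 0; k < len; ++k)
        acc += row[k];
    return acc;
}

// One thread per row, size baked in at compile time.
__global__ void sumRowsFixed(const float* in, float* out, int rows)
{
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r < rows)
        out[r] = rowSumFixed(in + r * ROW_LEN);
}

// One thread per row, size passed as a kernel argument.
__global__ void sumRowsRuntime(const float* in, float* out, int rows, int len)
{
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r < rows)
        out[r] = rowSumRuntime(in + r * len, len);
}
```

Both kernels compute the same result; the difference only shows up in the generated code, which you can compare by dumping the PTX with `nvcc -ptx`.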

In other cases, for instance shared memory array sizes, the code is a lot clearer if the sizes are compiled in; otherwise you have to pass the total shared memory size at kernel launch time and split that memory yourself into the shared arrays you need. That works fine, but the code is much clearer if you can just use static declarations, which require compile-time sizes.
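The two shared memory styles could look roughly like this. `TILE`, the kernel names and the launcher functions are hypothetical, and staging the inputs through shared memory is pointless for a plain vector add; it is only there to show the declarations. The sketch also assumes `n` is a multiple of the tile size.

```cuda
#include <cuda_runtime.h>

// Hypothetical compile-time block size; override with `nvcc -DTILE=...`.
#ifndef TILE
#define TILE 128
#endif

// Static version: each shared array is declared with its compile-time size.
__global__ void addStatic(const float* a, const float* b, float* c)
{
    __shared__ float sa[TILE];
    __shared__ float sb[TILE];
    int t = threadIdx.x;
    int i = blockIdx.x * TILE + t;
    sa[t] = a[i];
    sb[t] = b[i];
    __syncthreads();
    c[i] = sa[t] + sb[t];
}

// Dynamic version: one buffer whose size is given at launch time,
// split by hand into the two arrays.
__global__ void addDynamic(const float* a, const float* b, float* c, int tile)
{
    extern __shared__ float smem[];
    float* sa = smem;          // first `tile` floats
    float* sb = smem + tile;   // next `tile` floats
    int t = threadIdx.x;
    int i = blockIdx.x * tile + t;
    sa[t] = a[i];
    sb[t] = b[i];
    __syncthreads();
    c[i] = sa[t] + sb[t];
}

// Assumes n is a multiple of TILE.
void launchStatic(const float* a, const float* b, float* c, int n)
{
    addStatic<<<n / TILE, TILE>>>(a, b, c);
}

// Assumes n is a multiple of tile. The third launch parameter is the
// total dynamic shared memory size in bytes (two arrays of `tile` floats).
void launchDynamic(const float* a, const float* b, float* c, int n, int tile)
{
    size_t bytes = 2 * tile * sizeof(float);
    addDynamic<<<n / tile, tile, bytes>>>(a, b, c, tile);
}
```

The dynamic version lets you pick the tile size at runtime, at the cost of the pointer arithmetic and the extra launch argument.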

The main reason is that the problem size is usually intimately linked to the GPU architecture, e.g. the number of threads per block, number of blocks, amount of shared memory per block, number of registers per thread, etc. These numbers are typically carefully hand-tuned to get maximum use of the available resources, and you can't easily change the problem size dynamically while still maintaining optimum performance.
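As a rough illustration of how a tuned size ends up baked into the code, the sketch below fixes the block size at compile time and asks the runtime how many such blocks fit on one multiprocessor. `BLOCK_SIZE`, its value of 256, and the function names are assumptions for the example, not tuned values for any particular GPU.

```cuda
#include <cuda_runtime.h>

// Hypothetical hand-tuned block size; override with `nvcc -DBLOCK_SIZE=...`.
#ifndef BLOCK_SIZE
#define BLOCK_SIZE 256
#endif

// __launch_bounds__ tells the compiler the largest block size the kernel
// will be launched with, so it can budget registers for that size.
__global__ void __launch_bounds__(BLOCK_SIZE)
scale(float* data, float factor, int n)
{
    int i = blockIdx.x * BLOCK_SIZE + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

// Number of blocks needed to cover n elements (rounds up).
inline int gridFor(int n)
{
    return (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
}

// Asks the runtime how many blocks of BLOCK_SIZE threads of `scale` can be
// resident on one multiprocessor of the current device. Returns -1 on error.
int residentBlocksPerSM()
{
    int blocks = 0;
    cudaError_t err = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &blocks, scale, BLOCK_SIZE, 0 /* no dynamic shared memory */);
    return err == cudaSuccess ? blocks : -1;
}
```

A launch would then look like `scale<<<gridFor(n), BLOCK_SIZE>>>(d_data, 2.0f, n);`. Changing `BLOCK_SIZE` means recompiling, which is exactly the trade-off described above.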
