
CUDA - what if I choose too many blocks?

I'm still struggling with these unknown-size matrices, which may vary from 10 to 20,000 in each dimension.

I'm looking at the CUDA SDK and wondering: what if I choose a number of blocks that is too high?

Something like a grid of 9999 x 9999 blocks in the X and Y dimensions: if my hardware's SMs can't hold all these blocks at once, will the kernel have problems, or will performance simply collapse?

I don't know how to dimension the blocks/threads for something that may vary so much. I'm thinking of using the MAXIMUM number of blocks my hardware supports and then making the threads inside them work across the whole matrix. Is this the right way?


Thread blocks do not have a one-to-one mapping with the cores. Blocks are scheduled onto the multiprocessors as they become available, which means you can request as many as you want (up to a hardware limit). Requesting a huge number of blocks would just slow the system down as it loads and unloads do-nothing thread blocks onto the cores.

You can specify the dimensions of the grid and blocks at run time.

Edit: Here are the limits on the dimensions of the grid and the blocks, from the documentation.

If you choose an excessively large grid, you waste some cycles while the "dead" blocks get retired (typically only on the order of a few tens of microseconds, even for the maximum grid size on a "full size" Fermi or GT200 card). It isn't a huge penalty.

But the grid dimensions should always be computable a priori. Usually there is a known relationship between the problem size and a quantifiable unit of data-parallel work - one thread per data point, or one block per matrix column, or whatever - which allows the required grid dimensions to be calculated at runtime.

An alternative strategy is to use a fixed number of blocks (usually something like 4-8 per multiprocessor is enough) and have each block/thread process multiple units of parallel work, so each block becomes "persistent". If there are significant fixed setup costs per thread, this can be a good way to amortize them across more work per thread.
