GPU programming - transfer bottlenecks

2022-12-22 18:26 Q&A author:

I would like my GPU to do some calculations for me, so I am interested in measuring the speed of 'texture' upload and download, because my 'textures' are the data that the GPU should crunch.

I know that transfer from main memory to GPU memory is the preferred way to go, so I expect such an application to be efficient only if there is a lot of data to process and few results to read back.

Anyway, is there any such benchmark application? I mean one for measuring main memory <-> GPU transfer throughput.

EDIT (question clarification):

Once there was an application which you started and which gave out 2 numbers:

MB/s transfer rate between main memory and graphics card memory, from main TO graphics, texture upload
MB/s transfer rate between main memory and graphics card memory, from graphics TO main, texture download

I would just like to get my hands on that again.

YET ANOTHER EDIT (found something):

Here http://www.benchmarkhq.ru/english.html?/be_mm.html (search for TexBench) is an app that measures the throughput ONE WAY...

To measure host to device memory bandwidth, you can use the bandwidthTest sample from the CUDA SDK (download from the CUDA site).

First: the difference between global (GPU) memory and texture memory is the cache. Textures have one; global memory does not.

Second: the transfer rate from a host to a (GPU) device is the same for textures and for global memory.

Third: the transfer rate from a host to a (GPU) device varies with GPU generation and is determined by the PCI Express bus and the size of your data.

See, for example: http://www.accelereyes.com/wiki/index.php?title=GPU_Memory_Transfer

You can use the CUDA profiler to tell you the time spent in CUDA functions, including memory transfer time. You can write a very simple transfer test case and measure that. This would be better in my opinion, as you measure your particular test cases.

Look up CUDA_PROFILE and how to use it. http://www.drdobbs.com/cpp/209601096?pgno=2

Your question is a bit difficult to understand. Do you want to measure transfer between host and GPU (the texture cache is not really relevant then), or texture reads from within a kernel?
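
A very simple transfer test case of the kind suggested above might look roughly like the sketch below. It is not the SDK's bandwidthTest, only an illustration that prints the two numbers the question asks for (upload and download rate) at a few buffer sizes. The helper name copy_rate_mbs, the buffer sizes, and the repetition count are assumptions made for this example; pinned host memory is also an assumption, and pageable memory would normally give lower rates.

```cuda
// Illustrative host<->device transfer test (a sketch, not the SDK sample).
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            std::fprintf(stderr, "%s failed: %s\n", #call,         \
                         cudaGetErrorString(err_));                \
            std::exit(EXIT_FAILURE);                               \
        }                                                          \
    } while (0)

// Hypothetical helper: time `reps` copies of `bytes` bytes with CUDA events
// and return the average rate in MB/s (1 MB = 2^20 bytes).
static double copy_rate_mbs(void* dst, const void* src, size_t bytes,
                            cudaMemcpyKind kind, int reps) {
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    CHECK(cudaMemcpy(dst, src, bytes, kind));  // warm-up copy, not timed

    CHECK(cudaEventRecord(start));
    for (int i = 0; i < reps; ++i)
        CHECK(cudaMemcpy(dst, src, bytes, kind));
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));

    const double megabytes = (double)bytes * reps / (1024.0 * 1024.0);
    return megabytes / (ms / 1000.0);
}

int main() {
    const int reps = 20;  // arbitrary repetition count (assumption)
    // Arbitrary buffer sizes (assumption): 64 KiB, 1 MiB, 16 MiB, 64 MiB.
    const size_t sizes[] = {64u << 10, 1u << 20, 16u << 20, 64u << 20};

    std::printf("%10s %16s %16s\n", "size KiB", "upload MB/s", "download MB/s");
    for (size_t bytes : sizes) {
        void* host = nullptr;
        void* dev = nullptr;
        CHECK(cudaMallocHost(&host, bytes));  // pinned (page-locked) host buffer
        CHECK(cudaMalloc(&dev, bytes));       // device (global memory) buffer
        std::memset(host, 0, bytes);

        const double up = copy_rate_mbs(dev, host, bytes,
                                        cudaMemcpyHostToDevice, reps);
        const double down = copy_rate_mbs(host, dev, bytes,
                                          cudaMemcpyDeviceToHost, reps);
        std::printf("%10zu %16.1f %16.1f\n", bytes / 1024, up, down);

        CHECK(cudaFree(dev));
        CHECK(cudaFreeHost(host));
    }
    return 0;
}
```

Assuming the file is saved as transfer_test.cu (a made-up name), it should build with something like nvcc -O2 transfer_test.cu -o transfer_test. The small sizes are there because per-copy overhead tends to dominate short transfers, so the measured rate usually rises with buffer size before levelling off.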
