GPU programming - transfer bottlenecks
I would like my GPU to do some of my calculations, so I am interested in measuring the speed of 'texture' upload and download, because my 'textures' are the data that the GPU should crunch.
I know that transferring from main memory to GPU memory is the preferred direction, so I expect such an application to be efficient only if there is a lot of data to process and few results to read back.
Anyway, is there any such benchmark application? I mean one for measuring main memory <-> GPU transfer throughput...
EDIT (question clarification):
There once was an application which, when started, printed 2 numbers:
MB/s transfer rate from main memory TO graphics card memory (texture upload)
MB/s transfer rate from graphics card memory TO main memory (texture download)
I would just like to get my hands on that again.
YET ANOTHER EDIT (found something):
Here http://www.benchmarkhq.ru/english.html?/be_mm.html (search for TexBench) is an app that measures the throughput ONE WAY...
To measure host to device memory bandwidth, you can use the bandwidthTest
sample from the CUDA SDK (download from the CUDA site).
First: the difference between global (GPU) memory and texture memory is defined by the cache. Texture reads go through it, global memory reads do not.
Second: the transfer rate from a host to a (GPU) device is the same for textures and for global memory.
Third: the transfer rate from a host to a (GPU) device varies with the GPU generation and is determined by the PCI Express bus and the size of your data.
See, for example: http://www.accelereyes.com/wiki/index.php?title=GPU_Memory_Transfer
You can use the CUDA profiler to tell you the time spent in CUDA functions, including memory transfer time. You can write a very simple transfer test case and measure that. This would be better in my opinion, as you measure your particular test cases.
Look up CUDA_PROFILE and how to use it: http://www.drdobbs.com/cpp/209601096?pgno=2
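A very simple transfer test case of the kind suggested above might look like the following. This is only a minimal sketch, not the bandwidthTest sample and not the profiler: it times plain `cudaMemcpy` calls with CUDA events instead. The helper names (`mb_per_second`, `time_copy`), the buffer sizes, the repetition count, and the use of ordinary pageable `malloc` memory are all assumptions made for illustration, and 1 MB is taken to mean 10^6 bytes.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            std::fprintf(stderr, "%s failed: %s\n", #call,        \
                         cudaGetErrorString(err_));               \
            std::exit(EXIT_FAILURE);                              \
        }                                                         \
    } while (0)

// Hypothetical helper: bytes moved in `ms` milliseconds -> MB/s (1 MB = 10^6 bytes).
static double mb_per_second(size_t bytes, float ms) {
    return (bytes / 1.0e6) / (ms / 1.0e3);
}

// Hypothetical helper: average time in ms of one cudaMemcpy in the given direction.
static float time_copy(void* dst, const void* src, size_t bytes,
                       cudaMemcpyKind kind, int reps) {
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));
    CHECK(cudaMemcpy(dst, src, bytes, kind));  // warm-up copy, not timed
    CHECK(cudaEventRecord(start, 0));
    for (int i = 0; i < reps; ++i)
        CHECK(cudaMemcpy(dst, src, bytes, kind));
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));         // wait until all copies are done
    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return ms / reps;
}

int main() {
    // Assumed sizes: the rate depends on transfer size, so try a few.
    const size_t sizes[] = {64u << 10, 1u << 20, 32u << 20};
    const int reps = 10;
    for (size_t bytes : sizes) {
        void* host = std::malloc(bytes);       // pageable host memory
        if (host == nullptr) return EXIT_FAILURE;
        std::memset(host, 0, bytes);
        void* dev = nullptr;
        CHECK(cudaMalloc(&dev, bytes));
        float up = time_copy(dev, host, bytes, cudaMemcpyHostToDevice, reps);
        float down = time_copy(host, dev, bytes, cudaMemcpyDeviceToHost, reps);
        std::printf("%10zu bytes: upload %.1f MB/s, download %.1f MB/s\n",
                    bytes, mb_per_second(bytes, up), mb_per_second(bytes, down));
        CHECK(cudaFree(dev));
        std::free(host);
    }
    return 0;
}
```

Assuming the file is saved as `transfer.cu`, it should build with `nvcc transfer.cu -o transfer`. It prints one upload and one download figure per buffer size, which are the two numbers the question asks for; replacing the sizes with those of your own data measures your particular case.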
Your question is a bit difficult to understand: do you want to measure transfers between host and GPU (the texture cache is not really relevant then) or texture reads from within a kernel?