CUDA: can different threads read from the same memory location simultaneously?
I am writing a CUDA program for the NVIDIA Tesla C2050 where each thread has to read a string of characters in an ordered fashion from position 0 to n-1. The string size is small so it can easily fit in constant, shared, or texture memory.
My question is: would different threads access the string simultaneously, or would the accesses be serialized? This seems likely to affect the running time of my program.
The answer is "yes and no": it depends on the type of memory.
As @Jawad commented, texture memory is cached, but I'm not totally sure whether simultaneous reads of the same location are serviced in parallel or serialized through the cache.
Constant memory, on the other hand, is broadcast when the threads in a half-warp read from the same location, but accesses are serialized when they read from multiple locations. This type of memory is also cached.
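To illustrate the broadcast case, here is a minimal sketch (the names `c_str` and `scan_constant` are mine, not from the question): since every thread walks the string from position 0 to n-1 in the same order, at each step the whole half-warp reads one constant-memory address, which is the broadcast pattern rather than the serialized one.

```cuda
#include <cuda_runtime.h>

// Small string placed in cached constant memory (hypothetical names).
__constant__ char c_str[64];

__global__ void scan_constant(int n, int *out)
{
    int sum = 0;
    // All threads read positions 0..n-1 in lockstep, so each iteration
    // hits a single address for the whole half-warp -> one broadcast,
    // not n serialized transactions.
    for (int i = 0; i < n; ++i)
        sum += c_str[i];
    out[blockIdx.x * blockDim.x + threadIdx.x] = sum;
}

// Host side (sketch): cudaMemcpyToSymbol(c_str, h_str, len) copies the
// string into the constant bank before launching the kernel.
```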
Finally, shared memory reads are serialized when threads access different addresses in the same memory bank (a bank conflict), but a read of the *same* address by several threads is broadcast to all of them in a single memory transaction.
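A shared-memory version of the same access pattern might look like the sketch below (again with names I made up). Because every thread reads the *same* shared-memory address on each loop iteration, this triggers the broadcast mechanism, not a bank conflict:

```cuda
// Hypothetical kernel: stage the string in shared memory, then scan it.
__global__ void scan_shared(const char *g_str, int n, int *out)
{
    __shared__ char s_str[64];            // one staged copy per block

    if (threadIdx.x < n)
        s_str[threadIdx.x] = g_str[threadIdx.x];  // load string once
    __syncthreads();

    int sum = 0;
    for (int i = 0; i < n; ++i)
        sum += s_str[i];  // every thread reads the SAME address each
                          // step: broadcast, not a bank conflict
    out[blockIdx.x * blockDim.x + threadIdx.x] = sum;
}
```

A bank conflict would arise only if threads read *different* addresses that map to the same bank (e.g. strided accesses), which is not the case here.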
It also depends on the compute capability of your graphics card. I recommend you take a look at the NVIDIA CUDA C Programming Guide (v3.2, Appendix G, sections G.3 Compute Capability 1.x and G.4 Compute Capability 2.x).
Hope this helps.
Map the memory as a texture and the driver will cache the reads automatically; if more than one thread tries to read the same global position, only one request goes out to global memory.
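A sketch of that approach, using the texture-reference API that was current in the CUDA 3.2 era of this question (it is deprecated in modern CUDA in favor of texture objects; kernel and variable names are mine):

```cuda
// 1D texture reference over the string in global memory.
texture<char, 1, cudaReadModeElementType> t_str;

__global__ void scan_texture(int n, int *out)
{
    int sum = 0;
    // Each fetch goes through the texture cache, so repeated reads of
    // the same position by many threads hit the cache, not DRAM.
    for (int i = 0; i < n; ++i)
        sum += tex1Dfetch(t_str, i);
    out[blockIdx.x * blockDim.x + threadIdx.x] = sum;
}

// Host side (sketch): copy the string to global memory, then bind it:
//   char *d_str;
//   cudaMalloc(&d_str, n);
//   cudaMemcpy(d_str, h_str, n, cudaMemcpyHostToDevice);
//   cudaBindTexture(NULL, t_str, d_str, n);
```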