Pointer arithmetic with shared memory
I don't understand what exactly happens in the following lines:
unsigned char *membershipChanged = (unsigned char *)sharedMemory;
and
float *clusters = (float *)(sharedMemory + blockDim.x);
I assume that in the first line sharedMemory is effectively renamed to membershipChanged, but why would you add blockDim.x to the sharedMemory pointer? Where does the resulting address point?
sharedMemory was declared with extern __shared__ char sharedMemory[];
I found the code in a CUDA k-means implementation:
void find_nearest_cluster(int numCoords,
                          int numObjs,
                          int numClusters,
                          float *objects,        // [numCoords][numObjs]
                          float *deviceClusters, // [numCoords][numClusters]
                          int *membership,       // [numObjs]
                          int *intermediates)
{
    extern __shared__ char sharedMemory[];

    // The type chosen for membershipChanged must be large enough to support
    // reductions! There are blockDim.x elements, one for each thread in the
    // block.
    unsigned char *membershipChanged = (unsigned char *)sharedMemory;
    float *clusters = (float *)(sharedMemory + blockDim.x);

    membershipChanged[threadIdx.x] = 0;

    // BEWARE: We can overrun our shared memory here if there are too many
    // clusters or too many coordinates!
    for (int i = threadIdx.x; i < numClusters; i += blockDim.x) {
        for (int j = 0; j < numCoords; j++) {
            clusters[numClusters * j + i] = deviceClusters[numClusters * j + i];
        }
    }
    .....
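sharedMemory + blockDim.x points blockDim.x bytes past the base of the shared memory region. Because sharedMemory is declared as a char array, pointer arithmetic on it advances one byte per element.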
The reason you might do something like this is to suballocate within shared memory. The launch site of the kernel that contains find_nearest_cluster dynamically allocates some amount of shared storage for the kernel. The code implies that two logically different arrays reside in the storage pointed to by sharedMemory: membershipChanged, which occupies the first blockDim.x bytes (one unsigned char per thread), and clusters, which starts immediately after it. The pointer arithmetic is simply a means of getting a pointer to the second array.
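The suballocation described above can be sketched roughly as follows. This is a minimal illustration, not the k-means code itself: the names suballocDemo, runSuballocDemo, flags and values are hypothetical, and it assumes the thread count is a multiple of 4 so that the float array starts on a 4-byte boundary. The third launch parameter is the dynamic shared memory size in bytes, which has to cover both arrays.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: carves one dynamic shared-memory buffer into two arrays.
__global__ void suballocDemo(int numFloats, float *out)
{
    extern __shared__ char sharedMemory[];

    // First blockDim.x bytes: one unsigned char per thread.
    unsigned char *flags = (unsigned char *)sharedMemory;
    // Everything after that: the float array. Arithmetic on a char pointer
    // counts bytes, so this skips exactly the flags array.
    // Assumption: blockDim.x is a multiple of 4 (float alignment).
    float *values = (float *)(sharedMemory + blockDim.x);

    flags[threadIdx.x] = 1;
    for (int i = threadIdx.x; i < numFloats; i += blockDim.x) {
        values[i] = 2.0f * i;
    }
    __syncthreads();

    // Thread 0 reads both arrays back to show that they do not overlap.
    if (threadIdx.x == 0) {
        float sum = 0.0f;
        for (int i = 0; i < (int)blockDim.x; i++) sum += flags[i];
        for (int i = 0; i < numFloats; i++) sum += values[i];
        out[0] = sum;
    }
}

// Hypothetical host helper: launches one block and returns the kernel's sum.
float runSuballocDemo(int threads, int numFloats)
{
    float *dOut = nullptr;
    cudaMalloc(&dOut, sizeof(float));

    // Dynamic shared memory must hold both arrays back to back.
    size_t sharedBytes = threads * sizeof(unsigned char)
                       + numFloats * sizeof(float);
    suballocDemo<<<1, threads, sharedBytes>>>(numFloats, dOut);

    float hOut = -1.0f;
    cudaMemcpy(&hOut, dOut, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return hOut;
}
```

If the launch passes fewer shared bytes than the two arrays need, writes to the second array run past the allocation, which is the overrun the BEWARE comment in the question's code warns about.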