Usage of global vs. constant memory in CUDA
Hey there, I have the following piece of code:
#if USE_CONST == 1
__constant__ double PNT[ SIZE ];
#else
__device__ double *PNT;
#endif
and a bit later I have:
#if USE_CONST == 0
cudaMalloc((void **)&PNT, sizeof(double)*SIZE);
cudaMemcpy(PNT, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);
#else
cudaMemcpyToSymbol(PNT, point, sizeof(double)*SIZE);
#endif
whereas point
is somewhere defined in the code before. When working with USE_CONST=1
everything works as expected, but when working without it, than it 开发者_如何学Cdoesn't. I access the array in my kernel-function via
PNT[ index ]
Where's the problem between the both variants? Thanks!
The correct usage of cudaMemcpyToSymbol prior to CUDA 4.0 is:
cudaMemcpyToSymbol("PNT", point, sizeof(double)*SIZE)
or alternatively:
double *cpnt;
cudaGetSymbolAddress((void **)&cpnt, "PNT");
cudaMemcpy(cpnt, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);
which might be a bit faster if you are planning to access the symbol from the host API more than once.
EDIT: misunderstood the question. For the global memory version, do something similar to the second version for constant memory
double *gpnt;
cudaGetSymbolAddress((void **)&gpnt, "PNT");
cudaMemcpy(gpnt, point, sizeof(double)*SIZE. cudaMemcpyHostToDevice););
Although this is an old question I add this for future googlers:
The problem is here:
cudaMalloc((void **)&PNT, sizeof(double)*SIZE);
cudaMemcpy(PNT, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);
The cudaMalloc
writes to the host version of PNT
which is actually a device variable that must not be accessed from host. So correct would be to allocate memory, copy the address to the device symbol and copy the memory to the the memory pointed to by that symbol:
void* memPtr;
cudaMalloc(&memPtr, sizeof(double)*SIZE);
cudaMemcpyToSymbol(PNT, &memPtr, sizeof(memPtr));
// In other places you'll need an additional:
// cudaMemcpyFromSymbol(&memPtr, PNT, sizeof(memPtr));
cudaMemcpy(memPtr, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);
Easier would be:
#if USE_CONST == 1
__constant__ double PNT[ SIZE ];
#else
__device__ double PNT[ SIZE ];
#endif
// No #if required anymore:
cudaMemcpyToSymbol(PNT, point, sizeof(double)*SIZE);
精彩评论