Usage of global vs. constant memory in CUDA

Hey there, I have the following piece of code:

#if USE_CONST == 1
    __constant__ double PNT[ SIZE ];    
#else
    __device__ double *PNT;
#endif

and a bit later I have:

#if USE_CONST == 0
    cudaMalloc((void **)&PNT, sizeof(double)*SIZE);
    cudaMemcpy(PNT, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);
#else
    cudaMemcpyToSymbol(PNT, point, sizeof(double)*SIZE);
#endif

where point is defined earlier in the code. With USE_CONST=1 everything works as expected, but with USE_CONST=0 it doesn't. I access the array in my kernel function via

PNT[ index ]

What is the difference between the two variants? Thanks!


The correct usage of cudaMemcpyToSymbol prior to CUDA 4.0 is:

cudaMemcpyToSymbol("PNT", point, sizeof(double)*SIZE);

or alternatively:

double *cpnt;
cudaGetSymbolAddress((void **)&cpnt, "PNT");
cudaMemcpy(cpnt, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);

which might be a bit faster if you are planning to access the symbol from the host API more than once.
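For readers on newer toolkits: the string form of the symbol name was deprecated in CUDA 4.1 and removed in CUDA 5.0, so the symbol itself is passed instead. Below is a minimal sketch of the cached-address variant under that assumption. The value of SIZE, the kernel readPnt, and the helper roundTripConstant are hypothetical names made up for this illustration and are not part of the original question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define SIZE 8  // hypothetical size, chosen only for this sketch

__constant__ double PNT[SIZE];

// Each thread copies one element of the constant array to global memory.
__global__ void readPnt(double *out)
{
    int i = threadIdx.x;
    if (i < SIZE) out[i] = PNT[i];
}

// Uploads test data through a cached symbol address, reads it back via the
// kernel, and returns true if the round trip reproduces the input.
bool roundTripConstant()
{
    double point[SIZE], result[SIZE];
    for (int i = 0; i < SIZE; ++i) point[i] = 1.5 * i;

    // Resolve the symbol once; the pointer can be reused for later copies.
    double *cpnt = nullptr;
    if (cudaGetSymbolAddress((void **)&cpnt, PNT) != cudaSuccess) return false;
    if (cudaMemcpy(cpnt, point, sizeof(point), cudaMemcpyHostToDevice) != cudaSuccess)
        return false;

    double *out = nullptr;
    if (cudaMalloc((void **)&out, sizeof(result)) != cudaSuccess) return false;
    readPnt<<<1, SIZE>>>(out);
    cudaError_t err = cudaMemcpy(result, out, sizeof(result), cudaMemcpyDeviceToHost);
    cudaFree(out);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < SIZE; ++i)
        if (result[i] != point[i]) return false;
    return true;
}

int main()
{
    printf("%s\n", roundTripConstant() ? "round trip ok" : "round trip failed");
    return 0;
}
```

The pointer returned by cudaGetSymbolAddress stays valid for the lifetime of the module, so it only needs to be looked up once.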

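EDIT: I misunderstood the question. For the global memory version, do something similar to the second version for constant memory. Note that this only works as written if PNT is declared as a __device__ array of SIZE doubles; for a __device__ double * the symbol address refers to the pointer variable itself: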

double *gpnt;
cudaGetSymbolAddress((void **)&gpnt, "PNT");
cudaMemcpy(gpnt, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);


Although this is an old question, I am adding this for future readers:

The problem is here:

cudaMalloc((void **)&PNT, sizeof(double)*SIZE);
cudaMemcpy(PNT, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);

cudaMalloc writes to the host-side copy of PNT, but PNT is a device variable that must not be accessed directly from the host. The correct approach is to allocate the memory, copy the resulting address to the device symbol, and then copy the data to the memory that symbol points to:

void* memPtr;
cudaMalloc(&memPtr, sizeof(double)*SIZE);
cudaMemcpyToSymbol(PNT, &memPtr, sizeof(memPtr));
// In other places you'll need an additional:
// cudaMemcpyFromSymbol(&memPtr, PNT, sizeof(memPtr));
cudaMemcpy(memPtr, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);
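Put together as a complete program, the pointer variant might look like the sketch below. The value of SIZE, the kernel sumPnt, and the helper pointerVariant are hypothetical and exist only for this illustration; error handling is reduced to a single success flag.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define SIZE 8  // hypothetical size, chosen only for this sketch

__device__ double *PNT;  // device-side pointer, set from the host via the symbol API

// A single thread sums the array that PNT points to.
__global__ void sumPnt(double *out)
{
    double s = 0.0;
    for (int i = 0; i < SIZE; ++i) s += PNT[i];
    *out = s;
}

// Allocates the buffer, stores its address in PNT, uploads the data,
// and writes the sum computed on the device to *sumOut.
bool pointerVariant(double *sumOut)
{
    double point[SIZE];
    for (int i = 0; i < SIZE; ++i) point[i] = i + 1;  // 1, 2, ..., SIZE

    double *memPtr = nullptr;
    double *out = nullptr;
    bool ok = cudaMalloc((void **)&memPtr, sizeof(point)) == cudaSuccess
           && cudaMalloc((void **)&out, sizeof(double)) == cudaSuccess
           // Copy the address (not the data) into the device pointer variable.
           && cudaMemcpyToSymbol(PNT, &memPtr, sizeof(memPtr)) == cudaSuccess
           // Copy the data into the buffer that PNT now points to.
           && cudaMemcpy(memPtr, point, sizeof(point), cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        sumPnt<<<1, 1>>>(out);
        ok = cudaMemcpy(sumOut, out, sizeof(double), cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(out);     // cudaFree(nullptr) is a no-op, so this is safe on failure paths
    cudaFree(memPtr);
    return ok;
}

int main()
{
    double sum = 0.0;
    if (!pointerVariant(&sum)) { printf("CUDA call failed\n"); return 1; }
    printf("sum = %f\n", sum);
    return 0;
}
```

The host keeps memPtr around for the cudaMemcpy and cudaFree calls, which is why only the address needs to go through cudaMemcpyToSymbol.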

An easier approach is to declare a fixed-size array in both cases:

#if USE_CONST == 1
    __constant__ double PNT[ SIZE ];    
#else
    __device__ double PNT[ SIZE ];
#endif

// No #if required anymore:
cudaMemcpyToSymbol(PNT, point, sizeof(double)*SIZE);
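As a rough, self-contained sketch of this unified version: the same host code and kernel work whether PNT lives in constant or global memory. The default for USE_CONST, the value of SIZE, the kernel scalePnt, and the helper uploadAndScale are assumptions made for this illustration only.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#ifndef USE_CONST
#define USE_CONST 0  // assumed default; override with -DUSE_CONST=1
#endif
#define SIZE 8       // hypothetical size, chosen only for this sketch

#if USE_CONST == 1
__constant__ double PNT[SIZE];
#else
__device__ double PNT[SIZE];
#endif

// The kernel is identical for both memory spaces.
__global__ void scalePnt(double *out, double factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < SIZE) out[i] = factor * PNT[i];
}

// Uploads `point` to PNT, runs the kernel, and copies the scaled values back.
bool uploadAndScale(const double *point, double factor, double *result)
{
    // One call covers both the __constant__ and the __device__ declaration.
    if (cudaMemcpyToSymbol(PNT, point, sizeof(double) * SIZE) != cudaSuccess)
        return false;

    double *out = nullptr;
    if (cudaMalloc((void **)&out, sizeof(double) * SIZE) != cudaSuccess)
        return false;
    scalePnt<<<1, SIZE>>>(out, factor);
    cudaError_t err = cudaMemcpy(result, out, sizeof(double) * SIZE,
                                 cudaMemcpyDeviceToHost);
    cudaFree(out);
    return err == cudaSuccess;
}

int main()
{
    double point[SIZE], result[SIZE];
    for (int i = 0; i < SIZE; ++i) point[i] = i;
    if (!uploadAndScale(point, 2.0, result)) { printf("CUDA call failed\n"); return 1; }
    printf("result[3] = %f\n", result[3]);
    return 0;
}
```

The trade-off is that SIZE must be known at compile time, whereas the pointer variant allows the buffer size to be chosen at run time.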