CUDA device pointer manipulation
I've used:
float *devptr;
//...
cudaMalloc(&devptr, sizeofarray);
cudaMemcpy(devptr, hostptr, sizeofarray, cudaMemcpyHostToDevice);
in CUDA C to allocate and populate an array. Now I'm trying to run a CUDA kernel, e.g.:
__global__ void kernelname(float *ptr)
{
//...
}
on that array, but starting at an offset. In C/C++ it would be something like this:
kernelname<<<dimGrid, dimBlock>>>(devptr+offset);
However, this doesn't seem to work.
Is there a way to do this without passing the offset to the kernel as a separate argument and applying it inside the kernel code? Any ideas on how to do this?
Pointer arithmetic works just fine in CUDA. You can add an offset to a device pointer in host code and it will work correctly (remember that the offset is counted in words/elements of the pointed-to type, not in bytes).
EDIT: A simple working example:
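#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    const int na = 5, nb = 4;
    float a[na] = { 1.2f, 3.4f, 5.6f, 7.8f, 9.0f };
    float *_a, b[nb];
    size_t sza = size_t(na) * sizeof(float);
    size_t szb = size_t(nb) * sizeof(float);

    cudaFree(0);                                     // force context creation
    cudaMalloc((void **)&_a, sza);                   // device array of na floats
    cudaMemcpy(_a, a, sza, cudaMemcpyHostToDevice);
    // _a + 1 is an element offset: the copy starts at the second float
    cudaMemcpy(b, _a + 1, szb, cudaMemcpyDeviceToHost);

    for (int i = 0; i < nb; i++)
        printf("%d %f\n", i, b[i]);

    cudaFree(_a);
    cudaDeviceReset();                               // replaces the deprecated cudaThreadExit()
    return 0;
}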
Here, you can see that a word/element offset has been applied to the device pointer in the second cudaMemcpy call, so the copy starts from the second word, not the first.
Pointer arithmetic does work in host-side code; it is used fairly often in the example code provided by NVIDIA.
"Linear memory exists on the device in a 40-bit address space, so separately allocated entities can reference one another via pointers, for example, in a binary tree."
Source: CUDA C Programming Guide, http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
And the NVIDIA Performance Primitives (NPP) documentation gives a perfect example of pointer arithmetic:
"4.5.1 Select-Channel Source-Image Pointer This is a pointer to the channel-of-interest within the first pixel of the source image. E.g. if pSrc is the pointer to the first pixel inside the ROI of a three channel image. Using the appropriate select-channel copy primitive one could copy the second channel of this source image into the first channel of a destination image given by pDst by offsetting the pointer by one: nppiCopy_8u_C3CR(pSrc + 1, nSrcStep, pDst, nDstStep, oSizeROI);"
*Note: this works without multiplying by the number of bytes per data element because the compiler is aware of the data type of the pointer, and calculates the address accordingly.
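The scaling described in the note can be checked on the host with ordinary C++ pointers. The following is only a minimal sketch: `byte_distance` is a hypothetical helper introduced here for illustration (it is not part of CUDA or NPP), and it assumes `p + n` stays inside the same array or one element past its end.

```cpp
#include <cstddef>

// Hypothetical helper: how many bytes separate p and p + n.
// The compiler scales n by sizeof(T) when it evaluates p + n,
// so the result is n * sizeof(T), not n.
template <typename T>
std::ptrdiff_t byte_distance(const T *p, std::ptrdiff_t n)
{
    const T *q = p + n;  // element offset, as in devptr + offset
    return reinterpret_cast<const char *>(q) -
           reinterpret_cast<const char *>(p);
}
```

For a `float` array, `byte_distance(a, 1)` equals `sizeof(float)`, which is why `devptr + offset` needs no manual multiplication by the element size.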
In C and C++, pointer arithmetic can be written as above or with the notation &ptr[offset]. The & matters: it yields the device memory address of the element rather than its value, and reading the value itself (ptr[offset]) from host-side code does not work on device memory. With either notation the size of the data type is handled automatically, and the offset is specified as a number of data elements rather than bytes.
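The equivalence of the two notations can be illustrated with a small host-only sketch. The helper names `offset_by_addition` and `offset_by_index` are hypothetical and exist only for this comparison; neither function reads the pointed-to element, which is the property that makes the same expressions usable on a device pointer in host code.

```cpp
#include <cstddef>

// Hypothetical helper: offset written as pointer addition.
template <typename T>
T *offset_by_addition(T *ptr, std::size_t offset)
{
    return ptr + offset;
}

// Hypothetical helper: offset written as address-of-element.
// Only the address is computed; the element's value is never read.
template <typename T>
T *offset_by_index(T *ptr, std::size_t offset)
{
    return &ptr[offset];
}
```

Both helpers return the same address for the same arguments, so a launch such as `kernelname<<<dimGrid, dimBlock>>>(&devptr[offset])` would receive the same pointer as one written with `devptr + offset`.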