CUDA statically allocating data on device
I've been trying to allocate a variable that can be accessed by every kernel function. My attempt is the code below, but it won't compile because dArray isn't visible inside the kernel. In C++ you would declare the variable at file scope, or declare it static, so that it can be accessed from every scope throughout the program.
__global__ void StoreThreadNumber()
{
    dArray[threadIdx.x] = threadIdx.x;
}

int main( int argc, char** argv)
{
    unsigned __int8 Array[16] = { 0 };
    unsigned __int8 dArray[16];

    for( __int8 Position = 0; Position < 16; Position++)
        cout << Array[Position] << " ";
    cout << endl;

    cudaMalloc((void**) dArray, 16*sizeof(__int8));
    cudaMemcpy( dArray, Array, 16*sizeof(__int8), cudaMemcpyHostToDevice);

    StoreThreadNumber<<<1, 16>>>();

    cudaMemcpy( Array, dArray, 16*sizeof(__int8), cudaMemcpyDeviceToHost);

    for( __int8 Position = 0; Position < 16; Position++)
        cout << Array[Position] << " ";
    cout << endl;

    cudaFree(dArray);
}
You can have global variables in CUDA, declared with the __device__ or __constant__ qualifier. So, for example, if you initialize a __constant__ pointer variable to hold a device pointer (such as one returned by cudaMalloc()) using cudaMemcpyToSymbol(), you can then access that memory via the __constant__ variable:
__constant__ int* dArrayPtr;

__global__ void StoreThreadNumber()
{
    dArrayPtr[threadIdx.x] = threadIdx.x;
}
Just make sure you correctly initialize dArrayPtr from your host code before you run the kernel.
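For what it's worth, the host-side initialization could look roughly like the sketch below. It keeps the dArrayPtr and StoreThreadNumber names from the snippet above; the runStoreThreadNumber helper, the use of int elements, and the error handling are assumptions added for illustration, not part of the original answer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Pointer kept in constant memory; it holds the address of a device buffer.
__constant__ int* dArrayPtr;

__global__ void StoreThreadNumber()
{
    dArrayPtr[threadIdx.x] = threadIdx.x;
}

// Hypothetical helper: fills `host` (16 ints) with the thread indices.
// Returns true if every CUDA call succeeded.
bool runStoreThreadNumber(int* host)
{
    const int count = 16;
    int* dArray = nullptr;

    // Note the address-of operator: cudaMalloc writes the device address into dArray.
    if (cudaMalloc((void**)&dArray, count * sizeof(int)) != cudaSuccess)
        return false;

    // Copy the pointer value itself (not the data it points to) into the symbol.
    cudaError_t err = cudaMemcpyToSymbol(dArrayPtr, &dArray, sizeof(dArray));

    if (err == cudaSuccess) {
        StoreThreadNumber<<<1, count>>>();
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(host, dArray, count * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dArray);
    return err == cudaSuccess;
}

int main()
{
    int Array[16] = { 0 };
    if (!runStoreThreadNumber(Array)) {
        printf("CUDA call failed\n");
        return 1;
    }
    for (int i = 0; i < 16; i++)
        printf("%d ", Array[i]);
    printf("\n");
    return 0;
}
```

If the size is fixed at compile time, a __device__ array declared at file scope (read back with cudaMemcpyFromSymbol()) is another option that avoids the cudaMalloc() call entirely.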
You can't. You have to pass a pointer to dArray to the kernel.
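A minimal sketch of that approach, reworking the code from the question: the kernel takes the device pointer as a parameter, and cudaMalloc() receives the address of the pointer rather than a host array. Here unsigned char stands in for unsigned __int8, and the storeThreadNumbers helper is a hypothetical name used only for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// The device buffer is passed in explicitly instead of being looked up by name.
__global__ void StoreThreadNumber(unsigned char* dArray)
{
    dArray[threadIdx.x] = (unsigned char)threadIdx.x;
}

// Hypothetical helper: launches one block of `count` threads and copies the
// result into `host`. `count` must not exceed the per-block thread limit.
bool storeThreadNumbers(unsigned char* host, int count)
{
    unsigned char* dArray = nullptr;

    // dArray is a pointer; cudaMalloc fills it with a device address.
    if (cudaMalloc((void**)&dArray, count * sizeof(unsigned char)) != cudaSuccess)
        return false;

    StoreThreadNumber<<<1, count>>>(dArray);
    cudaError_t err = cudaGetLastError();

    if (err == cudaSuccess)
        err = cudaMemcpy(host, dArray, count * sizeof(unsigned char),
                         cudaMemcpyDeviceToHost);

    cudaFree(dArray);
    return err == cudaSuccess;
}

int main()
{
    unsigned char Array[16] = { 0 };
    if (!storeThreadNumbers(Array, 16)) {
        printf("CUDA call failed\n");
        return 1;
    }
    // Print as integers; streaming an 8-bit type directly would print characters.
    for (int Position = 0; Position < 16; Position++)
        printf("%d ", (int)Array[Position]);
    printf("\n");
    return 0;
}
```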
I had the same problem, having to pass a lot of global data to the GPU. I ended up wrapping it all up in a struct and passing around a pointer to it.
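One way this might look is sketched below. The DeviceGlobals struct, its fields, and the runWithStruct helper are hypothetical names invented for illustration; the answer does not say what its struct contained. The struct is filled in on the host, copied to device memory, and the kernel receives a single pointer to it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical bundle of "global" data shared by kernels.
struct DeviceGlobals {
    int* data;    // device buffer
    int  count;   // number of elements in data
    int  offset;  // example of an extra setting carried along
};

__global__ void StoreThreadNumber(const DeviceGlobals* g)
{
    int i = threadIdx.x;
    if (i < g->count)
        g->data[i] = i + g->offset;
}

// Hypothetical helper: builds the struct, copies it to the device, runs the
// kernel, and copies the buffer back into `host`.
bool runWithStruct(int* host, int count, int offset)
{
    DeviceGlobals hostCopy = { nullptr, count, offset };
    DeviceGlobals* dGlobals = nullptr;

    cudaError_t err = cudaMalloc((void**)&hostCopy.data, count * sizeof(int));
    if (err == cudaSuccess)
        err = cudaMalloc((void**)&dGlobals, sizeof(DeviceGlobals));
    if (err == cudaSuccess)
        err = cudaMemcpy(dGlobals, &hostCopy, sizeof(DeviceGlobals),
                         cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        StoreThreadNumber<<<1, count>>>(dGlobals);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(host, hostCopy.data, count * sizeof(int),
                         cudaMemcpyDeviceToHost);

    // cudaFree on a null pointer is a no-op, so this is safe after a failure.
    cudaFree(dGlobals);
    cudaFree(hostCopy.data);
    return err == cudaSuccess;
}

int main()
{
    int Array[16] = { 0 };
    if (!runWithStruct(Array, 16, 100)) {
        printf("CUDA call failed\n");
        return 1;
    }
    for (int i = 0; i < 16; i++)
        printf("%d ", Array[i]);
    printf("\n");
    return 0;
}
```

Passing the struct by value as a kernel argument would also work for small structs and saves the second allocation; the pointer version above simply follows the description in this answer.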