
PyCUDA Memory Addressing: Memory offset?

I've got a large chunk of generated data, A[i,j,k], on the device, but I only need one slice of it, A[i,:,:]. In regular CUDA this could easily be accomplished with some pointer arithmetic.

Can the same thing be done within PyCUDA? I.e.

cuda.memcpy_dtoh(h_iA, d_A + (i*stride))

Obviously this is completely wrong since there's no size information (unless it's inferred from the dest shape), but hopefully you get the idea?


The PyCUDA GPUArray class supports slicing of 1D arrays, but not higher-dimensional slices that require a stride (although that is coming). You can, however, get access to the underlying pointer of a multidimensional GPUArray from its gpudata attribute, which is a pycuda.driver.DeviceAllocation, and the size information from its dtype.itemsize attribute. You can then do the same sort of pointer arithmetic you had in mind to get something that the driver memcpy functions will accept.

It isn't very Pythonic, but it does work (or at least it did when I was doing a lot of PyCUDA + MPI hacking last year).
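
Something like this minimal sketch, for example (the shape, names, and slice index are made up for illustration; it assumes a contiguous row-major (C-order) array, and relies on memcpy_dtoh inferring the copy size from the destination array):

    import numpy as np
    import pycuda.autoinit
    import pycuda.driver as cuda
    import pycuda.gpuarray as gpuarray

    # Hypothetical dimensions and data
    ni, nj, nk = 4, 8, 16
    d_A = gpuarray.to_gpu(
        np.arange(ni * nj * nk, dtype=np.float32).reshape(ni, nj, nk))

    i = 2                                         # we want A[i, :, :]
    h_slice = np.empty((nj, nk), dtype=d_A.dtype)

    # Byte offset of A[i, 0, 0] in a contiguous row-major layout
    offset = i * nj * nk * d_A.dtype.itemsize

    # int(d_A.gpudata) yields the raw device address; memcpy_dtoh
    # copies h_slice.nbytes bytes starting at that address + offset
    cuda.memcpy_dtoh(h_slice, int(d_A.gpudata) + offset)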


It's unlikely that this is implemented in PyCUDA.

I can think of the following solutions:

  1. Copy the entire array A back to host memory and build a NumPy array from the slice of interest.
  2. Write a kernel that reads the matrix and produces the desired slice (see the sketch after this list).
  3. Rearrange the generated data so that a slice can be read off with pointer arithmetic alone.
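
As an illustration of option 2, here is a minimal sketch (the kernel name, shape, slice index, and launch configuration are all made up; it assumes a contiguous row-major float32 array):

    import numpy as np
    import pycuda.autoinit
    import pycuda.driver as cuda
    import pycuda.gpuarray as gpuarray
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __global__ void extract_slice(float *out, const float *A,
                                  int i, int nj, int nk)
    {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx < nj * nk)
            out[idx] = A[i * nj * nk + idx];   // row-major A[i, j, k]
    }
    """)
    extract_slice = mod.get_function("extract_slice")

    ni, nj, nk = 4, 8, 16                       # hypothetical shape
    d_A = gpuarray.to_gpu(np.random.rand(ni, nj, nk).astype(np.float32))
    d_slice = gpuarray.empty((nj, nk), dtype=np.float32)

    n = nj * nk
    block = 128
    grid = (n + block - 1) // block
    extract_slice(d_slice.gpudata, d_A.gpudata,
                  np.int32(2), np.int32(nj), np.int32(nk),
                  block=(block, 1, 1), grid=(grid, 1))

    h_slice = d_slice.get()                     # A[2, :, :] on the host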