开发者

PyCUDA: Querying Device Status (Memory specifically)

PyCUDA's documentation mentions Driver Interface calls in passing, but I'm a bit think and can't see how to get information such as 'SHARED_SIZE_BYTES' out of my code.

Can anyone point me to any examples of querying the device in this way?

Is it possible to / How do I check the device state (eg between malloc/memcpy and kernel launch) to implem开发者_如何转开发ent some machine-dynamic operations? (I want to be able to deal with devices that support multiple kernels in a 'friendly' way.


Just for anyone else coming across this, spending half an hour with the CUDA API in one hand, and the PyCUDA documentation in another does wonders. Its much simpler than my initial experiments indicated.

Runtime Kernel Info

Incoming lazy lazy code

...
kernel=mod.get_function("foo")
meminfo(kernel)
...
def meminfo(kernel):
    shared=kernel.shared_size_bytes
    regs=kernel.num_regs
    local=kernel.local_size_bytes
    const=kernel.const_size_bytes
    mbpt=kernel.max_threads_per_block
    print("=MEM=\nLocal:%d,\nShared:%d,\nRegisters:%d,\nConst:%d,\nMax Threads/B:%d" % (local,shared,regs,const,mbpt))

Example Output

=MEM=
Local:24,
Shared:64,
Registers:18,
Const:0,
Max Threads/B:512    

Static Device Info

Incoming lazy lazy code

import pycuda.autoinit
import pycuda.driver as cuda

(free,total)=cuda.mem_get_info()
print("Global memory occupancy:%f%% free"%(free*100/total))

for devicenum in range(cuda.Device.count()):
    device=cuda.Device(devicenum)
    attrs=device.get_attributes()

    #Beyond this point is just pretty printing
    print("\n===Attributes for device %d"%devicenum)
    for (key,value) in attrs.iteritems():
        print("%s:%s"%(str(key),str(value)))

Example Output

Global memory occupancy:70.000000% free

===Attributes for device 0
MAX_THREADS_PER_BLOCK:512
MAX_BLOCK_DIM_X:512
MAX_BLOCK_DIM_Y:512
MAX_BLOCK_DIM_Z:64
MAX_GRID_DIM_X:65535
MAX_GRID_DIM_Y:65535
MAX_GRID_DIM_Z:1
MAX_SHARED_MEMORY_PER_BLOCK:16384
TOTAL_CONSTANT_MEMORY:65536
WARP_SIZE:32
MAX_PITCH:2147483647
MAX_REGISTERS_PER_BLOCK:8192
CLOCK_RATE:1500000
TEXTURE_ALIGNMENT:256
GPU_OVERLAP:1
MULTIPROCESSOR_COUNT:14
KERNEL_EXEC_TIMEOUT:1
INTEGRATED:0
CAN_MAP_HOST_MEMORY:1
COMPUTE_MODE:DEFAULT
MAXIMUM_TEXTURE1D_WIDTH:8192
MAXIMUM_TEXTURE2D_WIDTH:65536
MAXIMUM_TEXTURE2D_HEIGHT:32768
MAXIMUM_TEXTURE3D_WIDTH:2048
MAXIMUM_TEXTURE3D_HEIGHT:2048
MAXIMUM_TEXTURE3D_DEPTH:2048
MAXIMUM_TEXTURE2D_ARRAY_WIDTH:8192
MAXIMUM_TEXTURE2D_ARRAY_HEIGHT:8192
MAXIMUM_TEXTURE2D_ARRAY_NUMSLICES:512
SURFACE_ALIGNMENT:256
CONCURRENT_KERNELS:0
ECC_ENABLED:0
PCI_BUS_ID:1
PCI_DEVICE_ID:0
TCC_DRIVER:0
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜