Is there a way to get a view into a python array.array()?

Posted 2023-02-26 18:56

I'm generating many largish 'random' files (~500MB) in which the contents are the output of repeated calls to random.randint(...). I'd like to preallocate a large buffer, write longs to that buffer, and periodically flush that buffer to disk. I am currently using array.array() but I can't see a way to create a view into this buffer. I need to do this so that I can feed the part of the buffer with valid data into hashlib.update(...) and to write the valid part of the buffer to the file. I could use the slice operator but AFAICT that creates a copy of the buffer, which isn't what I want.

Is there a way to do this that I'm not seeing?

Update:

I went with numpy as user42005 and hgomersall suggested. Unfortunately this didn't give me the speedups I was looking for. My dirt-simple C program generates ~700MB of data in 11s, while my python equivalent using numpy takes around 700s! It's hard to believe that that's the difference in performance between the two (I'm more likely to believe that I made a naive mistake somewhere...)

I guess you could use numpy: http://www.numpy.org - the fundamental array type in numpy at least supports no-copy views.

Numpy is incredibly flexible and powerful when it comes to views into arrays whilst minimising copies. For example:

import numpy
a = numpy.random.randint(0, 10, size=10)
b = a[3:10]  # basic slicing returns a view of a, not a copy

b is now a view of the original array that was created.
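One way to convince yourself that b shares memory with a, rather than copying it, is a quick check like the sketch below. It uses numpy.arange instead of random data so the result is predictable, and the index-list example at the end is my own addition for contrast.

```python
import numpy

a = numpy.arange(10)   # deterministic data instead of random, for clarity
b = a[3:10]            # basic slice: a view onto a's buffer

# A write through the view is visible in the original array.
b[0] = 99
print(a[3])                        # 99

# b does not own its data; its base is the array it was sliced from.
print(b.base is a)                 # True
print(numpy.shares_memory(a, b))   # True

# By contrast, indexing with a list of positions produces a copy.
c = a[[3, 4, 5]]
print(numpy.shares_memory(a, c))   # False
```

Only basic slicing (start:stop:step) is guaranteed to give a view, so it is worth checking with numpy.shares_memory when in doubt.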

Numpy arrays allow all manner of access directly to the data buffers, and can be trivially typecast. For example:

a = numpy.random.randint(0, 10, size=10)
b = numpy.frombuffer(a.data, dtype='int8')

b is now a view into the same memory with the data reinterpreted as 8-bit integers. The data itself remains unchanged, so if the array's integer type is 64-bit (the default on most platforms), each element now shows up as eight 8-bit ints. The buffer objects you get from a.data are standard Python buffer objects (memoryview in Python 3) and so can be used in all the places that are defined to work with buffers.
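For the use case in the question (hash the valid part of a preallocated buffer and write it out), a minimal sketch might look like the following. The helper digest_and_write, the count n_valid, the buffer size of 1024 and the choice of SHA-256 are all assumptions for illustration. The second half uses the built-in memoryview on an array.array, which is not something the answer above mentions, but it gives a no-copy slice in the same way.

```python
import array
import hashlib
import io

import numpy


def digest_and_write(view, out):
    """Hash a contiguous buffer object and write it to a binary stream."""
    h = hashlib.sha256()
    h.update(view)    # reads directly from the buffer, no intermediate bytes object
    out.write(view)   # binary streams accept buffer objects as well
    return h.hexdigest()


n_valid = 100  # hypothetical number of valid entries in the larger buffer

# NumPy: preallocate, fill part of the buffer, pass a view of the valid part.
np_buf = numpy.zeros(1024, dtype=numpy.int64)
np_buf[:n_valid] = numpy.arange(n_valid)
np_out = io.BytesIO()
np_digest = digest_and_write(np_buf[:n_valid], np_out)

# Standard library only: slicing a memoryview of an array.array is also a view.
arr_buf = array.array('q', [0]) * 1024            # 1024 signed 64-bit ints
arr_buf[:n_valid] = array.array('q', range(n_valid))
arr_out = io.BytesIO()
arr_digest = digest_and_write(memoryview(arr_buf)[:n_valid], arr_out)

print(np_digest == arr_digest)     # True: both views expose the same bytes
print(len(np_out.getvalue()))      # 800
```

io.BytesIO stands in for a file opened in binary mode here; a real file object opened with 'wb' should accept the same views.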

The same is true for multi-dimensional arrays. However, you have to bear in mind how the data lies in memory. For example:

a = numpy.random.randint(0, 10, size=(10, 10))
b = numpy.frombuffer(a[3,:].data, dtype='int8')

will work, but

b = numpy.frombuffer(a[:,3].data, dtype='int8')

raises an error about being unable to get a single-segment buffer for discontiguous arrays (the exact wording depends on the numpy version). This problem is not obvious because simply assigning that same view to a variable using

b = a[:,3]

returns a perfectly adequate numpy array. However, it is not contiguous in memory: it is a view that picks one element out of each row of a, so consecutive elements of b sit a full row apart in the underlying buffer. You can get info about the array using the flags attribute on an array:

a[:,3].flags

which returns (among other things) both C_CONTIGUOUS (C order, row major) and F_CONTIGUOUS (Fortran order, column major) as False, but

a[3,:].flags

returns them both as True. Both can be true here because the result is 1D; for a 2D array with more than one row and more than one column, at most one of them can be true.
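If you do need the bytes of a non-contiguous view, one option is to copy it into a contiguous array first. This costs a copy, so it only makes sense for the parts that really need a flat buffer. The sketch below assumes a 10x10 int64 array and uses numpy.ascontiguousarray, which the answer above does not mention.

```python
import numpy

a = numpy.arange(100, dtype=numpy.int64).reshape(10, 10)

row = a[3, :]   # one row: elements are adjacent in memory
col = a[:, 3]   # one column: elements are a full row apart

print(row.flags['C_CONTIGUOUS'], row.flags['F_CONTIGUOUS'])   # True True
print(col.flags['C_CONTIGUOUS'], col.flags['F_CONTIGUOUS'])   # False False
print(col.strides)                                            # (80,)

# Copy the column into its own contiguous block, then reinterpret the bytes.
col_copy = numpy.ascontiguousarray(col)
b = numpy.frombuffer(col_copy.data, dtype='int8')

print(b.size)                             # 80
print(numpy.shares_memory(a, col_copy))   # False: this is a copy, not a view
```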
