
Numpy.mean, amin, amax, std huge returns

I am struggling with large numpy arrays. Here is the scenario: I am working with 300 MB - 950 MB images and using GDAL to read them as numpy arrays. Reading in an array uses exactly as much memory as one would expect, i.e. 250 MB for a 250 MB image, and so on.

My problem occurs when I use numpy to get the mean, min, max, or standard deviation. In main() I open the image and read the array (type ndarray). I then call the following function on the 2D array to get the standard deviation:

def get_array_std(input_array):
    array_standard_deviation = numpy.std(input_array, copy=False)
    return array_standard_deviation

Here I constantly get memory errors (on a 6 GB machine). From the documentation it looks like numpy returns an ndarray with the same shape and dtype as my input, thereby doubling the in-memory size.

Using:

print type(array_standard_deviation)

Returns:

numpy.float64

Additionally, using:

print array_standard_deviation

Returns a float standard deviation, as one would expect. Is numpy reading the array in again to perform this calculation? Would I be better off iterating through the array and performing the calculations manually? How about working with a flattened array?

I have tried placing each statistics call (numpy.amin(), numpy.amax(), numpy.std(), numpy.mean()) into its own function so that the large array would go out of scope, but no luck there. I have also tried casting the return value to another type, but no joy.
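As a concrete version of the "iterate through the array manually" idea above, here is a minimal sketch of a chunked calculation (assuming the 2D array from GDAL is already in memory): it walks the flattened array in fixed-size chunks, accumulating min, max and sum in one pass and the squared deviations from the mean in a second pass, so it never allocates a full-size temporary.

import numpy as np

def chunked_stats(arr, chunksize=1000000):
    # Min, max, mean and (population) std of a large array, computed chunk
    # by chunk so that no array-sized temporary is created.
    flat = arr.ravel()            # a view, not a copy, for a contiguous array
    n = flat.size

    total = 0.0
    amin = np.inf
    amax = -np.inf
    for start in range(0, n, chunksize):
        chunk = flat[start:start + chunksize]
        total += chunk.sum()
        amin = min(amin, chunk.min())
        amax = max(amax, chunk.max())
    mean = total / n

    # Second pass: sum of squared deviations from the mean.
    sq_dev = 0.0
    for start in range(0, n, chunksize):
        chunk = flat[start:start + chunksize]
        sq_dev += np.sum((chunk - mean) ** 2)

    return amin, amax, mean, np.sqrt(sq_dev / n)

The chunk size is arbitrary; anything large enough to keep the Python-level loop overhead negligible will do.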


Numpy does a "naive" reduce operation for std, which is quite memory-inefficient. Look here for a better implementation: http://luispedro.org/software/ncreduce
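To make that concrete, this is roughly what the naive formulation looks like (an illustration only, not numpy's actual source): each expression materializes a temporary the same size as the input, which is where the extra memory goes.

import numpy as np

def naive_std(a):
    # `a - a.mean()` allocates a full-size temporary, and squaring it
    # allocates another, so peak memory is a small multiple of the array size.
    deviations = a - a.mean()
    return np.sqrt(np.mean(deviations ** 2))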


Don't know if this is helpful, but does using the array method resolve the issue? i.e.

input_array.std()

instead of

numpy.std(input_array)

The problem you describe doesn't make a whole lot of sense to me; I work with large arrays often but don't encounter errors with simple tasks like these. Is there anything else you're doing that might end up passing copies of the arrays instead of references?


Are you sure this is a problem with all of the statistics functions you're trying, or is it just np.std?

I've tried the following method to reproduce this:

  1. Start ipython -cs 0, import numpy as np
  2. q = rand(5600,16000), giving me a nice large test array.
  3. Watch memory usage externally during np.mean(q), np.amin(q), np.amax(q), np.std(q)

Of these, np.std is significantly slower: most of the functions take 0.2 seconds on my computer, whereas std takes 2.3 seconds. I can't reproduce the exact memory error you see, but my memory usage stays mostly constant while running everything except std; it doubles when I run std, and then drops back to the initial amount.
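If you want to watch this from inside Python rather than externally, here is one rough way to do it (a sketch assuming a Unix system; resource.getrusage reports the process's peak resident set size, in kilobytes on Linux and bytes on macOS):

import resource
import numpy as np

def peak_rss_mb():
    # Peak resident set size so far (assumes ru_maxrss is in KB, as on Linux).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

q = np.random.rand(5600, 16000)   # roughly 680 MB of float64

# The peak only ever grows, so run np.std last to see its extra allocation.
for fn in (np.mean, np.amin, np.amax, np.std):
    before = peak_rss_mb()
    fn(q)
    print(fn.__name__, 'grew peak RSS by about', peak_rss_mb() - before, 'MB')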

I've written the following modified std, which operates on chunks of a given number of elements (I'm using 100000):

def chunked_std(A, chunksize):
    # Standard deviation of A computed chunk by chunk, so no full-size
    # temporary array is allocated.
    Aflat = A.ravel()
    Amean = A.mean()
    Alen = len(Aflat)

    # Chunk boundaries: 0, chunksize, 2*chunksize, ..., Alen
    bounds = np.concatenate((np.arange(0, Alen, chunksize), [Alen]))

    # Accumulate the sum of squared deviations one chunk at a time.
    sq_dev = sum(np.sum(np.abs(Aflat[x:y] - Amean) ** 2)
                 for x, y in zip(bounds[:-1], bounds[1:]))
    return np.sqrt(sq_dev / Alen)

This seems to reduce memory usage significantly, while also being about twice as fast as normal np.std for me. There are probably more elegant ways of writing such a function, but this seems to work.
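For example, a quick sanity check against np.std, using a random test array like the one above:

import numpy as np

q = np.random.rand(5600, 16000)

# The two results should agree to within floating-point rounding.
print(np.std(q))
print(chunked_std(q, 100000))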
