
Improving memory usage in an array-wide filter to avoid block-processing

I am implementing some satellite image filters, starting with one known as the Enhanced Lee filter. The images are easily 5000x5000 pixels and larger. My current implementation runs out of memory computing the filters on those large arrays (note that the moving average and moving stddev filters can be run in one shot). The main difficulty is the number of arrays that must be kept in memory in order to return the final filtered array. In a separate question I asked for help on refining a block-processing function, but my question here is: is there a way of improving this code so that I don't need to use block-processing at all?
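(For scale: a single 5000x5000 float32 array is already about 100 MB, so holding the four full-size arrays counted in the comments below, plus the temporaries that expressions like Ic * (1 - W) create behind the scenes, adds up very quickly.)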

    def moving_average(Ic, filtsize):
        Im = numpy.empty(Ic.shape, dtype='Float32')
        scipy.ndimage.filters.uniform_filter(Ic, filtsize, output=Im)
        return Im

    def moving_stddev(Ic, filtsize):
        Im = numpy.empty(Ic.shape, dtype='Float32')
        scipy.ndimage.filters.uniform_filter(Ic, filtsize, output=Im)
        S = numpy.empty(Ic.shape, dtype='Float32')
        scipy.ndimage.filters.uniform_filter(((Ic-Im) ** 2), filtsize, output=S)
        return numpy.sqrt(S)

    def enh_lee(Ic, filtsize, nlooks, dfactor):
        # Implementation based on PCI Geomatica's FELEE function documentation
        Ci = moving_stddev(Ic, filtsize) / moving_average(Ic, filtsize) #1st array in memory
        Cu = numpy.sqrt(1 / nlooks) #scalar
        Cmax = numpy.sqrt(1 + (2 * nlooks)) #scalar
        W = numpy.exp(-dfactor * (Ci - Cu) / (Cmax - Ci)) #2nd array in memory
        Im = moving_average(Ic, filtsize) #3rd array in memory
        If = Im * W + Ic * (1 - W) #4th array in memory
        W = None # Back to 3 arrays in memory
        return numpy.select([Ci <= Cu, (Cu < Ci) * (Ci < Cmax), Ci >= Cmax], [Im, If, Ic])

where nlooks and dfactor are scalars and Ic is the unfiltered array.

EDIT: Based on your suggestions (I am also looking at numexpr), my improved code for enh_lee is as follows, but it is still not enough to get past the last step without running out of memory:

def enh_lee(Ic, filtsize, nlooks, dfactor):
    Im = moving_average(Ic, filtsize)
    Ci = moving_stddev(Ic, filtsize)
    Ci /= Im
    Cu = numpy.sqrt(1 / nlooks)
    Cmax = numpy.sqrt(1 + (2 * nlooks))

    W = Ci
    W -= Cu
    W /= Cmax - Ci
    W *= -dfactor
    numpy.exp(W, W)

    If = 1
    If -= W
    If *= Ic
    If += Im * W
    W = None
    return numpy.select([Ci <= Cu, (Cu < Ci) & (Ci < Cmax), Ci >= Cmax], [Im, If, Ic])


There are several (memory usage) optimizations you can make here... A few tricks to keep in mind are:

  1. Most numpy functions take an out parameter that can be used to specify the output array instead of returning a copy. E.g. np.sqrt(x, x) will take the square root of an array in-place.
  2. x += 1 uses half the memory that x = x + 1 does, as the latter makes a temporary copy. When it's possible, try to split calculations up into *=, +=, /=, etc. When it's not, use numexpr, as @eumiro suggested. (Or just use numexpr regardless... It's quite handy in many cases.) See the short sketch just below.
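A minimal sketch of both tricks, using a throwaway array (the array here is just a placeholder, not from your code):

    import numpy

    x = numpy.random.rand(1000, 1000).astype(numpy.float32)

    # Trick 1: direct the result into an existing array instead of allocating a new one
    numpy.sqrt(x, out=x)   # in-place square root, no extra full-size copy

    # Trick 2: in-place arithmetic avoids the temporary array that `x = 2 * x + 1` would create
    x *= 2
    x += 1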

So, first off, here's your original function's performance with a 10000x10000 array of random data and a filtsize of 3:

[Image: Memory Usage Profile of Original Function]

What's interesting are the big spikes at the end. These occur during your numpy.select(...) bit. There are plenty of places where you're inadvertently creating additional temporary arrays, but they're mostly irrelevant, as they're overwhelmed by what goes on during the call to select.
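To put rough numbers on it: each of the three boolean conditions passed to numpy.select is itself a full 10000x10000 array (about 100 MB at one byte per element), and select then builds its output, plus its own temporaries, on top of those, so that single line touches several extra full-size arrays at once.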


At any rate, if we replace your original (clean and compact) code with this rather verbose version, you can significantly optimize your memory usage:

import numpy
import scipy.ndimage

def main(x=None):
    if x is None:
        ni, nj = 10000, 10000
        x = numpy.arange(ni*nj, dtype=numpy.float32).reshape(ni,nj)
    filtsize = 3
    nlooks = 10.0
    dfactor = 10.0
    x = enh_lee(x, filtsize, nlooks, dfactor)
    return x

def moving_average(Ic, filtsize):
    Im = numpy.empty(Ic.shape, dtype='Float32')
    scipy.ndimage.filters.uniform_filter(Ic, filtsize, output=Im)
    return Im

def moving_stddev(Ic, filtsize):
    Im = numpy.empty(Ic.shape, dtype='Float32')
    scipy.ndimage.filters.uniform_filter(Ic, filtsize, output=Im)
    Im *= -1                      # Im currently holds the moving average; build (Ic - mean) ** 2 in-place
    Im += Ic
    Im **= 2
    scipy.ndimage.filters.uniform_filter(Im, filtsize, output=Im)
    return numpy.sqrt(Im, Im)     # in-place square root

def enh_lee(Ic, filtsize, nlooks, dfactor):
    # Implementation based on PCI Geomatica's FELEE function documentation
    Ci = moving_stddev(Ic, filtsize) 
    Im = moving_average(Ic, filtsize) 
    Ci /= Im

    Cu = numpy.sqrt(1 / nlooks).astype(numpy.float32) 
    Cmax = numpy.sqrt(1 + (2 * nlooks)).astype(numpy.float32) 

    W = Ci.copy()                 # work on a copy so Ci stays intact for the masks below
    W -= Cu
    W *= -dfactor
    W /= Cmax - Ci
    W = numpy.exp(W, W)

    If = Im * W                   # start building Im * W + Ic * (1 - W)
    W *= -1                       # reuse W to hold (1 - W) ...
    W += 1
    W *= Ic                       # ... then Ic * (1 - W)
    If += W
    del W

    # Replace the call to numpy.select
    out = If
    filter = Ci <= Cu
    numpy.putmask(out, filter, Im)
    del Im

    filter = Ci >= Cmax
    numpy.putmask(out, filter, Ic)
    return out

if __name__ == '__main__':
    main()

Here's the resulting memory profile for this code:

[Image: Memory Usage Profile of Numpy-based Optimized Version]

So, we've greatly reduced memory usage, but the code is somewhat less readable (in my opinion).

In an earlier revision of this code the call to numpy.select was replaced by two numpy.where calls, and those showed up as the last few peaks in the profile. If numpy.where took an out parameter, we could shave roughly another 300 MB off the peak memory usage, but unfortunately it doesn't, and I don't know of a more memory-efficient way to do it with where.

Instead, as in the code above, we can use numpy.putmask to replace the call to numpy.select and do the operation in-place (thanks to @eumiro for mentioning this in an entirely different question).
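For anyone unfamiliar with it, here is a tiny standalone illustration of putmask's in-place behaviour (my own example, not taken from the code above):

    import numpy

    a = numpy.array([1.0, 2.0, 3.0, 4.0], dtype=numpy.float32)
    numpy.putmask(a, a >= 3, 0.0)   # overwrites elements of `a` where the mask is True
    print(a)                        # -> [ 1.  2.  0.  0.]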

If we optimize things with numexpr, we get considerably cleaner code (compared to the pure-numpy version above, not the original). Because numexpr evaluates an expression in small chunks, it also avoids allocating full-size temporaries for the intermediate results. You could probably whittle the memory usage down a bit further in this version... I'm not terribly familiar with numexpr, beyond having used it a few times.

import numpy
import scipy.ndimage
import numexpr as ne

def main(x=None):
    if x is None:
        ni, nj = 10000, 10000
        x = numpy.arange(ni*nj, dtype=numpy.float32).reshape(ni,nj)
    filtsize = 3
    nlooks = 10.0
    dfactor = 10.0
    x = enh_lee(x, filtsize, nlooks, dfactor)
    return x

def moving_average(Ic, filtsize):
    Im = numpy.empty(Ic.shape, dtype='Float32')
    scipy.ndimage.filters.uniform_filter(Ic, filtsize, output=Im)
    return Im

def moving_stddev(Ic, filtsize):
    Im = numpy.empty(Ic.shape, dtype='Float32')
    scipy.ndimage.filters.uniform_filter(Ic, filtsize, output=Im)
    Im = ne.evaluate('((Ic-Im) ** 2)')
    scipy.ndimage.filters.uniform_filter(Im, filtsize, output=Im)
    return ne.evaluate('sqrt(Im)')

def enh_lee(Ic, filtsize, nlooks, dfactor):
    # Implementation based on PCI Geomatica's FELEE function documentation
    Ci = moving_stddev(Ic, filtsize) 
    Im = moving_average(Ic, filtsize) 
    Ci /= Im

    Cu = numpy.sqrt(1 / nlooks).astype(numpy.float32) 
    Cmax = numpy.sqrt(1 + (2 * nlooks)).astype(numpy.float32) 

    W = ne.evaluate('exp(-dfactor * (Ci - Cu) / (Cmax - Ci))') 
    If = ne.evaluate('Im * W + Ic * (1 - W)') 
    del W

    out = ne.evaluate('where(Ci <= Cu, Im, If)') 
    del Im
    del If

    out = ne.evaluate('where(Ci >= Cmax, Ic, out)')
    return out

if __name__ == '__main__':
    main()

And here's the memory-usage profile for the numexpr version: (Note that the execution time has been more than halved compared to the original!)

[Image: Memory Usage Profile of Numexpr-based Optimized Version]

The largest memory usage is still during the calls to where (which replace the call to select). However, the peak memory usage has been significantly cut. The easiest way to further reduce it would be to find some way to have the selection operate in-place on one of the arrays. It would be fairly easy to do this with Cython (nested loops would be rather slow in pure Python, and any sort of boolean indexing in numpy will create an additional copy). You may be better off simply chunking the input array as you've been doing, though...

Just as a side note, the updated versions produce the same output as the original code. There was a typo in the original numpy-based code...


You are counting your arrays in memory, but in reality there are more of them. Have a look at numexpr; the text on its title page will explain what is going on, and the package could help you solve your problem.


As an example of how to reduce memory usage, let's have a look at your moving_stddev() function. Expressions like

((Ic-Im) ** 2)

will create temporary arrays, but they can be avoided:

def moving_stddev(Ic, filtsize):
    Im = numpy.empty(Ic.shape, dtype='Float32')
    scipy.ndimage.filters.uniform_filter(Ic, filtsize, output=Im)
    Im -= Ic
    Im **= 2
    scipy.ndimage.filters.uniform_filter(Im, filtsize, output=Im)
    numpy.sqrt(Im, Im)
    return Im

Note that numpy.sqrt() and scipy.ndimage.filters.uniform_filter() work fine when given the same array as input and output. Be careful with such in-place constructs, though, because they can sometimes have unexpected side effects.
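As a quick sanity check (my own sketch, not part of the original answer), the in-place version can be compared against the straightforward formula on a small random array:

    import numpy
    import scipy.ndimage

    Ic = numpy.random.rand(50, 50).astype(numpy.float32)
    Im = scipy.ndimage.filters.uniform_filter(Ic, 3)
    expected = numpy.sqrt(scipy.ndimage.filters.uniform_filter((Ic - Im) ** 2, 3))
    print(numpy.allclose(moving_stddev(Ic, 3), expected, atol=1e-5))   # should print True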

There is a nice answer by Joe Kington detailing more ways to save memory.
