Improving memory usage in an array-wide filter to avoid block-processing
I am implementing some satellite image filters, starting with one known as the Enhanced Lee filter. The images are easily 5000x5000 pixels or more, and my current implementation runs out of memory when computing the filters on arrays that large (note that the moving-average and moving-stddev filters can each be run in one shot). The main difficulty is the number of arrays that must be kept in memory in order to return the final filtered array. In this question I asked for help on refining a block-processing function, but my question here is: is there a way of improving this code so that I don't need to use block processing at all?
import numpy
import scipy.ndimage

def moving_average(Ic, filtsize):
    Im = numpy.empty(Ic.shape, dtype='float32')
    scipy.ndimage.filters.uniform_filter(Ic, filtsize, output=Im)
    return Im

def moving_stddev(Ic, filtsize):
    Im = numpy.empty(Ic.shape, dtype='float32')
    scipy.ndimage.filters.uniform_filter(Ic, filtsize, output=Im)
    S = numpy.empty(Ic.shape, dtype='float32')
    scipy.ndimage.filters.uniform_filter(((Ic - Im) ** 2), filtsize, output=S)
    return numpy.sqrt(S)
def enh_lee(Ic, filtsize, nlooks, dfactor):
    # Implementation based on PCI Geomatica's FELEE function documentation
    Ci = moving_stddev(Ic, filtsize) / moving_average(Ic, filtsize) #1st array in memory
    Cu = numpy.sqrt(1 / nlooks) #scalar
    Cmax = numpy.sqrt(1 + (2 * nlooks)) #scalar
    W = numpy.exp(-dfactor * (Ci - Cu) / (Cmax - Ci)) #2nd array in memory
    Im = moving_average(Ic, filtsize) #3rd array in memory
    If = Im * W + Ic * (1 - W) #4th array in memory
    W = None # Back to 3 arrays in memory
    return numpy.select([Ci <= Cu, (Cu < Ci) * (Ci < Cmax), Ci >= Cmax], [Im, If, Ic])
where nlooks and dfactor are scalars and Ic is the unfiltered array.
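For scale, a quick back-of-the-envelope calculation (plain arithmetic, not a measurement) of what each image-sized array costs:

# per-array footprint at the low end of the image sizes involved:
# 5000 * 5000 pixels * 4 bytes (float32) = ~95 MiB per array
# so keeping 4 image-sized arrays alive at once already needs ~380 MiB,
# before counting the temporaries created by expressions such as
# Im * W + Ic * (1 - W)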
EDIT: Based on your suggestions (I am also looking at numexpr), my improved code for enh_lee is as follows, but it is still not enough to get past the last step without running out of memory:
def enh_lee(Ic, filtsize, nlooks, dfactor):
    Im = moving_average(Ic, filtsize)
    Ci = moving_stddev(Ic, filtsize)
    Ci /= Im
    Cu = numpy.sqrt(1 / nlooks)
    Cmax = numpy.sqrt(1 + (2 * nlooks))
    W = Ci.copy()  # note: a plain "W = Ci" would alias Ci, and the in-place
                   # ops below would corrupt the conditions in numpy.select
    W -= Cu
    W /= Cmax - Ci
    W *= -dfactor
    numpy.exp(W, W)
    If = 1 - W     # "If = 1; If -= W" rebinds If to a fresh array anyway
    If *= Ic
    If += Im * W
    W = None
    return numpy.select([Ci <= Cu, (Cu < Ci) & (Ci < Cmax), Ci >= Cmax], [Im, If, Ic])
There are several (memory usage) optimizations you can make here... A few tricks to keep in mind (a short sketch illustrating the first two follows this list):

- Most numpy functions take an out parameter that can be used to specify the output array instead of returning a copy. E.g. np.sqrt(x, x) will take the square root of an array in-place.
- x += 1 uses half the memory that x = x + 1 does, as the latter makes a temporary copy. When it's possible, try to split calculations up into *=, +=, /=, etc.
- When it's not possible, use numexpr, as @eumiro suggested. (Or just use numexpr regardless... It's quite handy in many cases.)
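As a minimal sketch of the first two tricks (the array here is made up purely for illustration):

import numpy as np

x = np.random.random((1000, 1000))

# out parameter: write the result into an existing array, no new allocation
np.sqrt(x, x)      # same as np.sqrt(x, out=x): in-place square root

# augmented assignment modifies x in-place...
x += 1
# ...while this allocates a temporary for x + 1 and then rebinds x to it
x = x + 1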
So, first off, here's your original function's performance with a 10000x10000 array of random data and a filtsize of 3:
[Image: memory usage profile of the original function]
What's interesting are the big spikes at the end. These occur during your numpy.select(...) bit. There are plenty of places where you're inadvertently creating additional temporary arrays, but they're mostly irrelevant, as they're overwhelmed by what goes on during the call to select.
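To see why select dominates, here is a rough accounting of what that single line allocates (plain arithmetic assuming the 10000x10000 float32 test array, not a measured profile):

# numpy.select([Ci <= Cu, (Cu < Ci) * (Ci < Cmax), Ci >= Cmax], [Im, If, Ic])
# Ci <= Cu                 -> one boolean temporary,    ~95 MiB
# (Cu < Ci) * (Ci < Cmax)  -> up to three temporaries,  ~285 MiB transient
# Ci >= Cmax               -> one boolean temporary,    ~95 MiB
# select's own output      -> one float32 array,        ~380 MiB
# all on top of Ci, Im, If and Ic, which are still alive at this point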
At any rate, if we replace your original (clean and compact) code with this rather verbose version, you can significantly optimize your memory usage:
import numpy
import scipy.ndimage

def main(x=None):
    if x is None:
        ni, nj = 10000, 10000
        x = numpy.arange(ni * nj, dtype=numpy.float32).reshape(ni, nj)
    filtsize = 3
    nlooks = 10.0
    dfactor = 10.0
    x = enh_lee(x, filtsize, nlooks, dfactor)
    return x

def moving_average(Ic, filtsize):
    Im = numpy.empty(Ic.shape, dtype='float32')
    scipy.ndimage.filters.uniform_filter(Ic, filtsize, output=Im)
    return Im

def moving_stddev(Ic, filtsize):
    Im = numpy.empty(Ic.shape, dtype='float32')
    scipy.ndimage.filters.uniform_filter(Ic, filtsize, output=Im)
    Im *= -1
    Im += Ic
    Im **= 2    # Im now holds (Ic - mean)**2, built without any temporaries
    scipy.ndimage.filters.uniform_filter(Im, filtsize, output=Im)
    return numpy.sqrt(Im, Im)   # in-place square root

def enh_lee(Ic, filtsize, nlooks, dfactor):
    # Implementation based on PCI Geomatica's FELEE function documentation
    Ci = moving_stddev(Ic, filtsize)
    Im = moving_average(Ic, filtsize)
    Ci /= Im
    Cu = numpy.sqrt(1 / nlooks).astype(numpy.float32)
    Cmax = numpy.sqrt(1 + (2 * nlooks)).astype(numpy.float32)
    W = Ci.copy()   # copy, so the in-place ops below don't clobber Ci
    W -= Cu
    W *= -dfactor
    W /= Cmax - Ci
    W = numpy.exp(W, W)
    If = Im * W     # build If = Im * W + Ic * (1 - W) piecewise, reusing W
    W *= -1
    W += 1
    W *= Ic
    If += W
    del W
    # Replace the call to numpy.select with in-place numpy.putmask calls
    out = If
    mask = Ci <= Cu   # "mask" rather than "filter", to avoid shadowing the builtin
    numpy.putmask(out, mask, Im)
    del Im
    mask = Ci >= Cmax
    numpy.putmask(out, mask, Ic)
    return out

if __name__ == '__main__':
    main()
Here's the resulting memory profile for this code:
[Image: memory usage profile of the numpy-based optimized version]
So, we've greatly reduced memory usage, but the code is somewhat less readable (i.m.o.).

An earlier revision of this answer used two numpy.where calls in place of the putmask calls; those produced the last three peaks in the profile, and because numpy.where doesn't take an out parameter, I knew of no way to shave off the extra ~300Mb or so of peak memory usage they cost.

Instead, we can use numpy.putmask to replace the call to numpy.select and do the operation in-place, as the code above does. (Thanks to @eumiro for mentioning this in an entirely different question.)
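As a tiny, self-contained illustration of putmask's in-place behavior (the values here are mine, purely for demonstration):

import numpy

a = numpy.array([1.0, 2.0, 3.0, 4.0], dtype=numpy.float32)
numpy.putmask(a, a > 2.5, 0.0)   # modifies a in-place; no output array is allocated
print(a)                         # [ 1.  2.  0.  0.]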
If we optimize things with numexpr, we get considerably cleaner code (compared to the pure-numpy version above, not the original). You could probably whittle the memory usage down a bit in this version... I'm not terribly familiar with numexpr, beyond having used it a few times.
import numpy
import scipy.ndimage
import numexpr as ne

def main(x=None):
    if x is None:
        ni, nj = 10000, 10000
        x = numpy.arange(ni * nj, dtype=numpy.float32).reshape(ni, nj)
    filtsize = 3
    nlooks = 10.0
    dfactor = 10.0
    x = enh_lee(x, filtsize, nlooks, dfactor)
    return x

def moving_average(Ic, filtsize):
    Im = numpy.empty(Ic.shape, dtype='float32')
    scipy.ndimage.filters.uniform_filter(Ic, filtsize, output=Im)
    return Im

def moving_stddev(Ic, filtsize):
    Im = numpy.empty(Ic.shape, dtype='float32')
    scipy.ndimage.filters.uniform_filter(Ic, filtsize, output=Im)
    Im = ne.evaluate('((Ic - Im) ** 2)')
    scipy.ndimage.filters.uniform_filter(Im, filtsize, output=Im)
    return ne.evaluate('sqrt(Im)')

def enh_lee(Ic, filtsize, nlooks, dfactor):
    # Implementation based on PCI Geomatica's FELEE function documentation
    Ci = moving_stddev(Ic, filtsize)
    Im = moving_average(Ic, filtsize)
    Ci /= Im
    Cu = numpy.sqrt(1 / nlooks).astype(numpy.float32)
    Cmax = numpy.sqrt(1 + (2 * nlooks)).astype(numpy.float32)
    W = ne.evaluate('exp(-dfactor * (Ci - Cu) / (Cmax - Ci))')
    If = ne.evaluate('Im * W + Ic * (1 - W)')
    del W
    # numexpr's where evaluates the whole selection in a single blocked pass
    out = ne.evaluate('where(Ci <= Cu, Im, If)')
    del Im
    del If
    out = ne.evaluate('where(Ci >= Cmax, Ic, out)')
    return out

if __name__ == '__main__':
    main()
And here's the memory-usage profile for the numexpr version: (Note that the execution time has been more than halved compared to the original!)
[Image: memory usage profile of the numexpr-based optimized version]
The largest remaining memory usage is still during the calls to where (which replace the call to select). However, the peak memory usage has been significantly cut. The easiest way to reduce it further would be to find some way to have the selection operate in-place on one of the existing arrays. It would be fairly easy to do this with cython (nested loops would be rather slow in pure python, and any sort of boolean indexing in numpy will create an additional copy). You may be better off simply chunking the input array as you've been doing, though...
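One other avenue that might be worth a try before resorting to cython or chunking: numexpr's evaluate accepts an out argument, so the selection could plausibly be written back into an existing array. This is only a sketch; I haven't verified that writing into one of the expression's own input operands is safe across numexpr versions, so treat it as an assumption to test:

    # hypothetical replacement for the tail of enh_lee above: reuse If as the
    # output buffer instead of allocating a fresh "out" array each time
    ne.evaluate('where(Ci <= Cu, Im, If)', out=If)
    ne.evaluate('where(Ci >= Cmax, Ic, If)', out=If)
    return If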
Just as a side note, the updated versions produce the same output as the original code. There was a typo in the original numpy-based code...
You are counting your arrays in memory, but in reality there are more of them. Have a look at numexpr; the text on its title page explains what is actually going on, and the package could help you solve your problem.
As an example of how to reduce memory usage, let's have a look at your moving_stddev() function. Expressions like ((Ic - Im) ** 2) create temporary arrays, but they can be avoided:
def moving_stddev(Ic, filtsize):
    Im = numpy.empty(Ic.shape, dtype='float32')
    scipy.ndimage.filters.uniform_filter(Ic, filtsize, output=Im)
    Im -= Ic
    Im **= 2    # (Im - Ic)**2 equals (Ic - Im)**2, so the flipped sign is harmless
    scipy.ndimage.filters.uniform_filter(Im, filtsize, output=Im)
    numpy.sqrt(Im, Im)
    return Im
Note that numpy.sqrt() and scipy.ndimage.filters.uniform_filter() both work fine with the same array as input and output. Be careful with these constructs, though, because they sometimes have unexpected side effects.
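For instance, here is a minimal example (my own illustration, not from the code above) of the kind of side effect to watch for: in-place operations mutate every name bound to the same array.

import numpy

a = numpy.arange(4, dtype=numpy.float32)
b = a        # b is another name for the same data, not a copy
b += 1       # the in-place add mutates the shared data...
print(a)     # ...so a is now [ 1.  2.  3.  4.] as well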
There is a nice answer by Joe Kington detailing more ways to save memory.