How to normalize a NumPy array to within a certain range?
After doing some processing on an audio or image array, it needs to be normalized within a r开发者_运维百科ange before it can be written back to a file. This can be done like so:
# Normalize audio channels to between -1.0 and +1.0
audio[:,0] = audio[:,0]/abs(audio[:,0]).max()
audio[:,1] = audio[:,1]/abs(audio[:,1]).max()
# Normalize image to between 0 and 255
image = image/(image.max()/255.0)
Is there a less verbose, convenience function way to do this? matplotlib.colors.Normalize()
doesn't seem to be related.
# Normalize audio channels to between -1.0 and +1.0
audio /= np.max(np.abs(audio),axis=0)
# Normalize image to between 0 and 255
image *= (255.0/image.max())
Using /=
and *=
allows you to eliminate an intermediate temporary array, thus saving some memory. Multiplication is less expensive than division, so
image *= 255.0/image.max() # Uses 1 division and image.size multiplications
is marginally faster than
image /= image.max()/255.0 # Uses 1+image.size divisions
Since we are using basic numpy methods here, I think this is about as efficient a solution in numpy as can be.
In-place operations do not change the dtype of the container array. Since the desired normalized values are floats, the audio
and image
arrays need to have floating-point point dtype before the in-place operations are performed.
If they are not already of floating-point dtype, you'll need to convert them using astype
. For example,
image = image.astype('float64')
If the array contains both positive and negative data, I'd go with:
import numpy as np
a = np.random.rand(3,2)
# Normalised [0,1]
b = (a - np.min(a))/np.ptp(a)
# Normalised [0,255] as integer: don't forget the parenthesis before astype(int)
c = (255*(a - np.min(a))/np.ptp(a)).astype(int)
# Normalised [-1,1]
d = 2.*(a - np.min(a))/np.ptp(a)-1
If the array contains nan
, one solution could be to just remove them as:
def nan_ptp(a):
return np.ptp(a[np.isfinite(a)])
b = (a - np.nanmin(a))/nan_ptp(a)
However, depending on the context you might want to treat nan
differently. E.g. interpolate the value, replacing in with e.g. 0, or raise an error.
Finally, worth mentioning even if it's not OP's question, standardization:
e = (a - np.mean(a)) / np.std(a)
You can also rescale using sklearn
. The advantages are that you can adjust normalize the standard deviation, in addition to mean-centering the data, and that you can do this on either axis, by features, or by records.
from sklearn.preprocessing import scale
X = scale( X, axis=0, with_mean=True, with_std=True, copy=True )
The keyword arguments axis
, with_mean
, with_std
are self explanatory, and are shown in their default state. The argument copy
performs the operation in-place if it is set to False
. Documentation here.
You are trying to min-max scale the values of audio
between -1 and +1 and image
between 0 and 255.
Using sklearn.preprocessing.minmax_scale
, should easily solve your problem.
e.g.:
audio_scaled = minmax_scale(audio, feature_range=(-1,1))
and
shape = image.shape
image_scaled = minmax_scale(image.ravel(), feature_range=(0,255)).reshape(shape)
note: Not to be confused with the operation that scales the norm (length) of a vector to a certain value (usually 1), which is also commonly referred to as normalization.
You can use the "i" (as in idiv, imul..) version, and it doesn't look half bad:
image /= (image.max()/255.0)
For the other case you can write a function to normalize an n-dimensional array by colums:
def normalize_columns(arr):
rows, cols = arr.shape
for col in xrange(cols):
arr[:,col] /= abs(arr[:,col]).max()
This answer to a similar question solved the problem for me with
np.interp(a, (a.min(), a.max()), (-1, +1))
A simple solution is using the scalers offered by the sklearn.preprocessing library.
scaler = sk.MinMaxScaler(feature_range=(0, 250))
scaler = scaler.fit(X)
X_scaled = scaler.transform(X)
# Checking reconstruction
X_rec = scaler.inverse_transform(X_scaled)
The error X_rec-X will be zero. You can adjust the feature_range for your needs, or even use a standart scaler sk.StandardScaler()
I tried following this, and got the error
TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''
The numpy
array I was trying to normalize was an integer
array. It seems they deprecated type casting in versions > 1.10
, and you have to use numpy.true_divide()
to resolve that.
arr = np.array(img)
arr = np.true_divide(arr,[255.0],out=None)
img
was an PIL.Image
object.
精彩评论