Replacing values greater than a limit in a numpy array
I have an n x m array and a maximum value for each column. What's the best way to replace the values that reach or exceed their column's maximum, other than checking each element one by one?
For example:
import numpy as np

def check_limits(bad_array, maxs):
    good_array = np.copy(bad_array)
    for i_line in range(bad_array.shape[0]):
        for i_column in range(bad_array.shape[1]):
            # Values at or above the column limit become limit - 1.
            if good_array[i_line][i_column] >= maxs[i_column]:
                good_array[i_line][i_column] = maxs[i_column] - 1
    return good_array
Is there any way to do this faster and more concisely?
Use putmask:
import numpy as np
a = np.array([[ 0,  1,  2,  3],
              [ 4,  5,  6,  7],
              [ 8,  9, 10, 11]])
m = np.array([7, 6, 5, 4])
# This is what you need:
np.putmask(a, a >= m, m - 1)
# a is now (putmask modifies it in place):
# array([[0, 1, 2, 3],
#        [4, 5, 4, 3],
#        [6, 5, 4, 3]])
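Since putmask overwrites a in place, here is a minimal sketch of a variant that leaves the input untouched, in case you still need the original values. It swaps in np.where instead of putmask; the names mask and limited are only for illustration.

```python
import numpy as np

a = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])
m = np.array([7, 6, 5, 4])

# a >= m broadcasts m across the rows, giving a boolean mask with a's shape.
mask = a >= m

# np.where builds a new array: m - 1 where the mask is True, a elsewhere.
# Unlike np.putmask, it leaves `a` itself unchanged.
limited = np.where(mask, m - 1, a)

print(limited)
# [[0 1 2 3]
#  [4 5 4 3]
#  [6 5 4 3]]
```

The trade-off is one extra array allocation, which is usually fine unless the array is very large.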
Another way is to use the clip function:
Using eumiro's example:
bad_array = np.array([[ 0,  1,  2,  3],
                      [ 4,  5,  6,  7],
                      [ 8,  9, 10, 11]])
maxs = np.array([7, 6, 5, 4])
good_array = bad_array.clip(max=maxs-1)
Or, to write the result into an existing array good_array of the same shape:
bad_array.clip(max=maxs-1, out=good_array)
You can also specify a lower limit by adding the min= argument.
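A rough sketch of clipping with both bounds follows. The per-column lower limits in mins are made up for the example and are not part of the question. It also shows a caveat: clip(max=maxs-1) matches the >= check from the question only for integer arrays, because a float strictly between maxs-1 and maxs gets clipped too.

```python
import numpy as np

bad_array = np.array([[0, 1, 2, 3],
                      [4, 5, 6, 7],
                      [8, 9, 10, 11]])
maxs = np.array([7, 6, 5, 4])
mins = np.array([1, 1, 1, 1])  # hypothetical per-column lower limits

# Clip each column into [mins, maxs - 1]; both bounds broadcast across rows.
good_array = bad_array.clip(min=mins, max=maxs - 1)
print(good_array)
# [[1 1 2 3]
#  [4 5 4 3]
#  [6 5 4 3]]

# Caveat for floats: 6.5 is below the limit 7 but above 7 - 1,
# so clip changes it while a `>=` test would leave it alone.
floats = np.array([6.5, 7.0])
print(floats.clip(max=7 - 1))
# [6. 6.]
```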
If we aren't assuming anything about the structure of bad_array, your code is asymptotically optimal by an adversary argument: any element you skip could be the one that exceeds its limit. If we know that each column is sorted in ascending order, then as soon as we reach a value at or above the limit, we know every following element in that column is too. Without such an assumption we simply have to check every single one.
If you decide to sort each column first, this would take O(m * n log n) time for m columns of n elements, which is already more than the O(n * m) time it takes to check each element.
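For the case where the columns are already sorted, the idea can be sketched with a binary search per column. This is only an illustration under that assumption: check_limits_sorted is a made-up name, and np.searchsorted is used here to find the first row that reaches the limit.

```python
import numpy as np

def check_limits_sorted(sorted_array, maxs):
    """Assumes every column of sorted_array is in ascending order."""
    good_array = np.copy(sorted_array)
    for i_column in range(sorted_array.shape[1]):
        limit = maxs[i_column]
        # Binary search for the first row whose value is >= limit.
        first_bad = np.searchsorted(good_array[:, i_column], limit, side="left")
        # Everything from that row down is at or over the limit.
        good_array[first_bad:, i_column] = limit - 1
    return good_array

sorted_array = np.array([[0, 1, 2, 3],
                         [4, 5, 6, 7],
                         [8, 9, 10, 11]])
print(check_limits_sorted(sorted_array, np.array([7, 6, 5, 4])))
# [[0 1 2 3]
#  [4 5 4 3]
#  [6 5 4 3]]
```

Note that the search itself is logarithmic per column, but copying the array and overwriting the tail of each column are still linear, so the overall saving is limited.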
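You could also create good_array by checking and copying in one element at a time, instead of copying all of the elements from bad_array and checking them later. In principle this touches each element once rather than twice, so it could cut the time by up to roughly half, though the real saving is likely smaller because np.copy is much faster than a Python-level loop.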
If the number of columns isn't large, one optimization would be:
import numpy as np

def check_limits(bad_array, maxs):
    good_array = np.copy(bad_array)
    # Loop over columns only; each column is handled with one vectorized mask.
    for i_column in range(bad_array.shape[1]):
        to_replace = (good_array[:, i_column] >= maxs[i_column])
        good_array[to_replace, i_column] = maxs[i_column] - 1
    return good_array