Efficient Array replacement in Python
I'm wondering what the most efficient way is to replace elements in an array with other random elements from the same array, given some criterion. More specifically, I need to replace each element that doesn't meet a given criterion with another random value from the same row. For example, I want to replace every value in a row of data that falls outside the range -0.8 to 0.8 with a randomly chosen value from that row that lies within the range. My inefficient solution looks something like this:
import random as r  # assumed: "r" in the original refers to the standard random module
import numpy as np

data = np.random.normal(0, 1, (10, 100))
for index, row in enumerate(data):
    row_copy = np.copy(row)
    # Flag the values outside [-0.8, 0.8]
    outliers = np.logical_or(row>.8, row<-.8)
    for prob in np.where(outliers==1)[0]:
        fixed = 0
        # Keep drawing random positions until one lands on a good value
        while fixed == 0:
            random_other_value = r.randint(0,99)
            if random_other_value in np.where(outliers==1)[0]:
                fixed = 0
            else:
                row_copy[prob] = row[random_other_value]
                fixed = 1
Obviously, this is not efficient.
I think it would be faster to pull out all the good values, then use random.choice() to pick one whenever you need it. Something like this:
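import numpy as np
import random
from itertools import izip  # Python 2 only; on Python 3, drop this import and use the built-in zip()

data = np.random.normal(0, 1, (10, 100))
for row in data:
    # Boolean mask of the values inside [-0.8, 0.8]
    good_ones = np.logical_and(row >= -0.8, row <= 0.8)
    good = row[good_ones]
    # Keep good values; swap each bad one for a random good value from the same row
    row_copy = np.array([x if f else random.choice(good) for f, x in izip(good_ones, row)])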
High-level Python code that you write is slower than the C internals of Python. If you can push work down into the C internals it is usually faster. In other words, try to let Python do the heavy lifting for you rather than writing a lot of code. It's zen... write less code to get faster code.
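To illustrate pushing even more of the work down into compiled code, here is a minimal sketch that drops the per-element list comprehension and lets NumPy draw all the replacements for a row in one call. It is an alternative to the code above, not the version that was timed. The function name replace_outliers, its low/high/rng parameters, and the use of np.random.default_rng are my own choices for illustration, and it assumes every row has at least one good value.

```python
import numpy as np

def replace_outliers(data, low=-0.8, high=0.8, rng=None):
    """Hypothetical helper: return a copy of data in which every out-of-range
    cell is replaced by a randomly chosen in-range value from the same row."""
    rng = np.random.default_rng() if rng is None else rng
    result = data.copy()
    for i, row in enumerate(data):
        good_mask = (row >= low) & (row <= high)
        good = row[good_mask]
        n_bad = row.size - good.size
        if n_bad == 0:
            continue  # nothing to replace in this row
        if good.size == 0:
            raise ValueError("row %d has no in-range values to draw from" % i)
        # One vectorized draw (with replacement) covers every bad cell in the row.
        result[i, ~good_mask] = rng.choice(good, size=n_bad)
    return result

data = np.random.default_rng(0).normal(0, 1, (10, 100))
cleaned = replace_outliers(data, rng=np.random.default_rng(1))
```

The loop over rows stays in Python, but with only 10 rows that cost is small; the per-element work happens inside NumPy.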
I added a loop to run your code 1000 times, and to run my code 1000 times, and measured how long they took to execute. According to my test, my code is ten times faster.
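A comparison like this can be reproduced with the standard timeit module. The sketch below is an assumption about how such a test might look, not the exact script behind the figure above. The function names loop_version and choice_version are made up for illustration, the question's loop is lightly condensed, the built-in zip() stands in for izip() so it runs on Python 3, and the run count is kept small so it finishes quickly. The actual ratio will vary by machine and NumPy version.

```python
import random
import timeit

import numpy as np

data = np.random.normal(0, 1, (10, 100))

def loop_version(data):
    """Approach from the question: retry random positions until one is good."""
    out = []
    for row in data:
        row_copy = np.copy(row)
        outliers = np.logical_or(row > .8, row < -.8)
        for prob in np.where(outliers)[0]:
            while True:
                candidate = random.randint(0, row.size - 1)
                if candidate not in np.where(outliers)[0]:
                    row_copy[prob] = row[candidate]
                    break
        out.append(row_copy)
    return np.array(out)

def choice_version(data):
    """Approach from the answer: collect the good values, then random.choice()."""
    out = []
    for row in data:
        good_ones = np.logical_and(row >= -0.8, row <= 0.8)
        good = row[good_ones]
        out.append(np.array([x if f else random.choice(good)
                             for f, x in zip(good_ones, row)]))
    return np.array(out)

runs = 20  # the test described above used 1000 runs; kept small here
t_loop = timeit.timeit(lambda: loop_version(data), number=runs)
t_choice = timeit.timeit(lambda: choice_version(data), number=runs)
print("loop:   %.4f s for %d runs" % (t_loop, runs))
print("choice: %.4f s for %d runs" % (t_choice, runs))
```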
Additional explanation of what this code is doing:
row_copy is being set by building a new list, and then calling np.array() on the new list to convert it to a NumPy array object. The new list is being built by a list comprehension.
The new list is made according to the rule: if the number is good, keep it; else, take a random choice from among the good values.
A list comprehension walks over a sequence of values, but to apply this rule we need two values: the number, and the flag saying whether that number is good or not. The easiest and fastest way to make a list comprehension walk along two sequences at once is to use izip() to "zip" the two sequences together. izip() will yield up tuples, one at a time, where the tuple is (f, x); f in this case is the flag saying good or not, and x is the number. (Python has a built-in feature called zip() which does pretty much the same thing, but in Python 2 it actually builds a list of tuples; izip() just makes an iterator that yields up tuple values. In Python 3, zip() itself returns an iterator and itertools.izip no longer exists. Either way, you can play with zip() at a Python prompt to learn more about how it works.)
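As a quick illustration of walking two sequences in step, here is a small sketch with made-up flags and values. It uses the built-in zip(), which on Python 3 is already a lazy iterator; on Python 2, itertools.izip behaves the same way.

```python
flags = [True, False, True]
values = [0.1, 1.5, -0.3]

pairs = zip(flags, values)   # lazy iterator on Python 3
first = next(pairs)          # pulls one tuple: (True, 0.1)
rest = list(pairs)           # the remaining tuples: [(False, 1.5), (True, -0.3)]

# Unpack each tuple into f and x while looping
for f, x in zip(flags, values):
    print(f, x)
```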
In Python we can unpack a tuple into variable names like so:
a, b = (2, 3)
In this example, we set a to 2 and b to 3. In the list comprehension we unpack the tuples from izip() into variables f and x.
Then the heart of the list comprehension is a "ternary if" (strictly speaking a conditional expression, not a statement) like so:
a if flag else b
The above will return the value a if the flag value is true, and otherwise return b. The one in this list comprehension is:
x if f else random.choice(good)
This implements our rule.
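To see the rule in isolation, here is a small sketch on a hand-made row, using plain lists instead of NumPy arrays and the built-in zip(). The sample numbers are invented for illustration.

```python
import random

row = [0.2, 1.4, -0.5, -2.0, 0.7]

# Flag each number as good (inside [-0.8, 0.8]) or not
good_ones = [-0.8 <= x <= 0.8 for x in row]
# Pull out just the good values
good = [x for x, f in zip(row, good_ones) if f]

# If the number is good, keep it; else take a random choice from the good values
row_copy = [x if f else random.choice(good) for f, x in zip(good_ones, row)]
print(row_copy)  # positions 1 and 3 are replaced; which good value they get varies per run
```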