Efficient Array replacement in Python

2023-04-03 09:40 Q&A author:

I'm wondering what the most efficient way is to replace elements in an array with other random elements from the same array, given some criterion. More specifically, I need to replace each element that doesn't meet the criterion with another random value from the same row. For example, I want to replace every value in a row of data that falls outside the range -0.8 to 0.8 with a randomly chosen value from that row that is between -0.8 and 0.8. My inefficient solution looks something like this:

import random as r
import numpy as np

data = np.random.normal(0, 1, (10, 100))
for index, row in enumerate(data):
    row_copy = np.copy(row)
    # Flag the values that fall outside [-0.8, 0.8]
    outliers = np.logical_or(row > .8, row < -.8)
    for prob in np.where(outliers == 1)[0]:
        fixed = 0
        while fixed == 0:
            # Draw a random column index; retry if it points at another outlier
            random_other_value = r.randint(0, 99)
            if random_other_value in np.where(outliers == 1)[0]:
                fixed = 0
            else:
                row_copy[prob] = row[random_other_value]
                fixed = 1

Obviously, this is not efficient.

I think it would be faster to pull out all the good values, then use random.choice() to pick one whenever you need it. Something like this:

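import numpy as np
import random
try:
    from itertools import izip  # Python 2
except ImportError:
    izip = zip  # Python 3: the built-in zip() is already lazy

data = np.random.normal(0, 1, (10, 100))
for row in data:
    # Boolean flags: True where the value is inside [-0.8, 0.8]
    good_ones = np.logical_and(row >= -0.8, row <= 0.8)
    good = row[good_ones]
    # Keep good values; replace each bad one with a random good value from the row
    row_copy = np.array([x if f else random.choice(good) for f, x in izip(good_ones, row)])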

High-level Python code that you write is slower than the C internals of Python. If you can push work down into the C internals it is usually faster. In other words, try to let Python do the heavy lifting for you rather than writing a lot of code. It's zen... write less code to get faster code.
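Taking that idea one step further, the per-element list comprehension can be replaced with a single NumPy call per row. This is not the code from the answer above, just a minimal sketch that swaps in np.random.Generator.choice to draw all replacements at once (with replacement, like random.choice). The function name replace_outliers and its low, high, and rng parameters are made up for illustration.

```python
import numpy as np

def replace_outliers(row, low=-0.8, high=0.8, rng=None):
    """Return a copy of row where out-of-range values are replaced
    by randomly chosen in-range values from the same row."""
    rng = np.random.default_rng() if rng is None else rng
    row = np.asarray(row)
    good_mask = (row >= low) & (row <= high)
    good = row[good_mask]
    if good.size == 0:
        # Nothing to sample from; the original code does not cover this case.
        raise ValueError("row has no in-range values to sample from")
    out = row.copy()
    n_bad = row.size - good.size
    # One vectorized draw for every bad position in the row
    out[~good_mask] = rng.choice(good, size=n_bad)
    return out

data = np.random.normal(0, 1, (10, 100))
cleaned = np.array([replace_outliers(row) for row in data])
```

Whether this is actually faster than the list comprehension for rows of 100 values would need to be measured; the point is only that the masking and the sampling both happen inside NumPy.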

I added a loop to run your code 1000 times, and to run my code 1000 times, and measured how long they took to execute. According to my test, my code is ten times faster.
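A comparison along these lines could be set up with the standard timeit module. The sketch below assumes Python 3 (so it uses the built-in zip), wraps condensed restatements of the two approaches in hypothetical functions retry_loop and pick_from_good, and uses a small n_runs so it finishes quickly. It reports whatever timings your machine produces; the ratio will vary.

```python
import random
import timeit

import numpy as np

data = np.random.normal(0, 1, (10, 100))

def retry_loop(data):
    """The question's approach: redraw a random index until it hits a good value."""
    out = data.copy()
    for i, row in enumerate(data):
        bad = np.where((row > 0.8) | (row < -0.8))[0]
        for j in bad:
            k = random.randint(0, row.size - 1)
            while k in bad:
                k = random.randint(0, row.size - 1)
            out[i, j] = row[k]
    return out

def pick_from_good(data):
    """The answer's approach: collect the good values once, then choose among them."""
    out = []
    for row in data:
        flags = np.logical_and(row >= -0.8, row <= 0.8)
        good = row[flags].tolist()  # plain list, so random.choice works on any Python 3
        out.append([x if f else random.choice(good) for f, x in zip(flags, row)])
    return np.array(out)

n_runs = 20  # the answer used 1000 runs; kept small here
t_loop = timeit.timeit(lambda: retry_loop(data), number=n_runs)
t_good = timeit.timeit(lambda: pick_from_good(data), number=n_runs)
print(f"retry loop: {t_loop:.3f} s, pick from good: {t_good:.3f} s")
```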

Additional explanation of what this code is doing:

row_copy is being set by building a new list, and then calling np.array() on the new list to convert it to a NumPy array object. The new list is being built by a list comprehension.

The new list is made according to the rule: if the number is good, keep it; else, take a random choice from among the good values.

A list comprehension walks over a sequence of values, but to apply this rule we need two values: the number, and the flag saying whether that number is good or not. The easiest and fastest way to make a list comprehension walk along two sequences at once is to use izip() to "zip" the two sequences together. izip() will yield up tuples, one at a time, where the tuple is (f, x); f in this case is the flag saying good or not, and x is the number. (In Python 2 the built-in zip() does much the same thing but builds the whole list of tuples up front, whereas izip() just makes an iterator that yields the tuples. In Python 3, itertools.izip() is gone and the built-in zip() is itself lazy. Either way, you can play with zip() at a Python prompt to learn more about how it works.)
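To see the pairing on its own, here is a tiny sketch assuming Python 3, where the built-in zip() behaves like the lazy izip() described above. The flags and values lists are made-up sample data.

```python
flags = [True, False, True]
values = [0.1, 1.5, -0.3]

pairs = zip(flags, values)  # a lazy iterator; no tuples have been built yet
first = next(pairs)         # (True, 0.1)
rest = list(pairs)          # [(False, 1.5), (True, -0.3)]
```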

In Python we can unpack a tuple into variable names like so:

a, b = (2, 3)

In this example, we set a to 2 and b to 3. In the list comprehension we unpack the tuples from izip() into variables f and x.

Then the heart of the list comprehension is a "ternary if" (strictly speaking a conditional expression, not a statement), like so:

a if flag else b

The above evaluates to a if the flag value is true, and to b otherwise. The one in this list comprehension is:

x if f else random.choice(good)

This implements our rule.
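Putting the pieces together on a row small enough to check by hand, here is a sketch assuming Python 3 (built-in zip) and a made-up four-element row. The good values stay where they are, and each bad value becomes one of the good ones at random.

```python
import random

row = [0.1, 1.5, -0.3, -2.0]

# Flag each value: True if it lies inside [-0.8, 0.8]
flags = [-0.8 <= x <= 0.8 for x in row]      # [True, False, True, False]

# Pull out the good values once
good = [x for f, x in zip(flags, row) if f]  # [0.1, -0.3]

# Keep x when its flag is true, otherwise pick a random good value
fixed = [x if f else random.choice(good) for f, x in zip(flags, row)]
```

Here fixed[0] is always 0.1 and fixed[2] is always -0.3, while fixed[1] and fixed[3] are each either 0.1 or -0.3.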
