removing pairs of elements from numpy arrays that are NaN (or another value) in Python

2022-12-27 17:30 问答作者：

I have an array with two columns in numpy. For example:

a = array([[1, 5, nan, 6],
           [10, 6, 6, nan]])
a = transpose(a)

I want to efficiently iterate through the two columns, a[:, 0] and a[:, 1] and remove any pairs that meet a certain condition, in this case if they are NaN. The obvious way I can think of is:

new_a = []
for val1, val2 in a:
  if val2 == nan or val2 == nan:
    new_a.append([val1, val2])

But that seems clunky. What's the pythonic numpy way of doing 开发者_StackOverflow社区this?

thanks.

If you want to take only the rows that have no NANs, this is the expression you need:

>>> import numpy as np
>>> a[~np.isnan(a).any(1)]
array([[  1.,  10.],
       [  5.,   6.]])

If you want the rows that do not have a specific number among its elements, e.g. 5:

>>> a[~(a == 5).any(1)]
array([[  1.,  10.],
       [ NaN,   6.],
       [  6.,  NaN]])

The latter is clearly equivalent to

>>> a[(a != 5).all(1)]
array([[  1.,  10.],
       [ NaN,   6.],
       [  6.,  NaN]])

Explanation: Let's first create your example input

>>> import numpy as np
>>> a = np.array([[1, 5, np.nan, 6],
...               [10, 6, 6, np.nan]]).transpose()
>>> a
array([[  1.,  10.],
       [  5.,   6.],
       [ NaN,   6.],
       [  6.,  NaN]])

This determines which elements are NAN

>>> np.isnan(a)
array([[False, False],
       [False, False],
       [ True, False],
       [False,  True]], dtype=bool)

This identifies which rows have any element which are True

>>> np.isnan(a).any(1)
array([False, False,  True,  True], dtype=bool)

Since we don't want these, we negate the last expression:

>>> ~np.isnan(a).any(1)
array([ True,  True, False, False], dtype=bool)

And finally we use the boolean array to select the rows we want:

>>> a[~np.isnan(a).any(1)]
array([[  1.,  10.],
       [  5.,   6.]])

You could convert the array into a masked array, and use the compress_rows method:

import numpy as np
a = np.array([[1, 5, np.nan, 6],
           [10, 6, 6, np.nan]])
a = np.transpose(a)
print(a)
# [[  1.  10.]
#  [  5.   6.]
#  [ NaN   6.]
#  [  6.  NaN]]
b=np.ma.compress_rows(np.ma.fix_invalid(a))
print(b)
# [[  1.  10.]
#  [  5.   6.]]

Not to detract from ig0774's answer, which is perfectly valid and Pythonic and is in fact the normal way of doing these things in plain Python, but: numpy supports a boolean indexing system which could also do the job.

new_a = a[(a==a).all(1)]

I'm not sure offhand which way would be more efficient (or faster to execute).

If you wanted to use a different condition to select the rows, this would have to be changed, and precisely how depends on the condition. If it's something that can be evaluated for each array element independently, you could just replace the a==a with the appropriate test, for example to eliminate all rows with numbers larger than 100 you could do

new_a = a[(a<=100).all(1)]

But if you're trying to do something fancy that involves all the elements in a row (like eliminating all rows that sum to more than 100), it might be more complicated. If that's the case, I can try to edit in a more specific answer if you want to share your exact condition.

I think list comprehensions should do this. E.g.,

new_a = [(val1, val2) for (val1, val2) in a if math.isnan(val1) or math.isnan(val2)]

继续阅读：arrays numpy python scipy

removing pairs of elements from numpy arrays that are NaN (or another value) in Python

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？