NumPy Array Indexing

Simple question here about indexing an array to get a subset of its values. Say I have a recarray which holds ages in one field, and corresponding values in another. I also have an array which is my desired subset of ages. Here is what I mean:

ages = np.arange(100)
values = np.random.uniform(low=0, high= 1, size = ages.shape)
data = np.core.rec.fromarrays([ages, values], names='ages,values')
desired_ages = np.array([1,4, 16, 29, 80])

What I'm trying to do is something like this:

data.values[data.ages==desired_ages]

But, it's not working.


You want to create a subarray containing only the values whose indices are in desired_ages. (In your example each age equals its index, which is what the approach below relies on.)

Python doesn't have any syntax that directly corresponds to this, but list comprehensions can do a pretty nice job:

result = [value for index, value in enumerate(data.values) if index in desired_ages]

However, doing it this way makes Python scan through desired_ages for each element in data.values, which is slow. If you insert

desired_ages = set(desired_ages)

on the line before, this would improve performance. (You can determine whether a value is in a set in constant time on average, regardless of the set's size.)
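The set-based variant can be sketched as follows. This is only an illustration under a few assumptions: it matches on the age itself rather than on the position (so it does not depend on ages being equal to the index), and the names wanted and result are made up for the example.

```python
import numpy as np

ages = np.arange(100)
values = np.random.uniform(low=0, high=1, size=ages.shape)
desired_ages = np.array([1, 4, 16, 29, 80])

# Convert to plain Python ints once, so each membership test is a hash lookup.
wanted = set(desired_ages.tolist())

# Pair every age with its value and keep the pairs whose age is wanted.
result = [value for age, value in zip(ages.tolist(), values.tolist())
          if age in wanted]
```

This still loops in Python, so for large arrays a vectorized NumPy approach is likely to be faster.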


Complete Example

import numpy as np

ages = np.arange(100)
values = np.random.uniform(low=0, high= 1, size = ages.shape)
data = np.core.rec.fromarrays([ages, values], names='ages,values')
desired_ages = np.array([1,4, 16, 29, 80])

result = [value for index, value in enumerate(data.values) if index in desired_ages]
print(result)

Output

[0.45852624094611272, 0.0099713014816563694, 0.26695859251958864, 0.10143425810157047, 0.93647796171383935]


I changed your example a little by shuffling the order of ages:

import numpy as np
np.random.seed(0)
ages = np.arange(3,103)
np.random.shuffle(ages)
values = np.random.uniform(low=0, high= 1, size = ages.shape)
data = np.core.rec.fromarrays([ages, values], names='ages,values')
desired_ages = np.array([4, 16, 29, 80])

If all the elements of desired_ages are in data.ages, you can sort data by the ages field first, and then use searchsorted() to find all the indices quickly:

data.sort(order="ages")  # sort in place by the ages field
print(data.values[np.searchsorted(data.ages, desired_ages)])
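The condition that every desired age is present matters, because searchsorted returns an insertion position even for a missing value. A rough sketch of one way to guard against that is shown here; the extra age 500 and the names order, pos_clipped, found and selected are assumptions added for illustration, and plain arrays are used instead of the recarray to keep it short.

```python
import numpy as np

np.random.seed(0)
ages = np.arange(3, 103)
np.random.shuffle(ages)
values = np.random.uniform(low=0, high=1, size=ages.shape)
desired_ages = np.array([4, 16, 29, 80, 500])  # 500 is deliberately absent

# Sort indirectly so the original arrays stay untouched.
order = np.argsort(ages)
sorted_ages = ages[order]

# Insertion positions; a value past the end yields len(sorted_ages), so clip.
pos = np.searchsorted(sorted_ages, desired_ages)
pos_clipped = np.clip(pos, 0, len(sorted_ages) - 1)

# A desired age was really found only if the element at that position equals it.
found = sorted_ages[pos_clipped] == desired_ages
selected = values[order][pos_clipped[found]]
```

Here found marks which desired ages exist in the data, and selected holds their values in the order of desired_ages.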

Or you can use np.in1d to get a boolean array and use it as an index:

print(data.values[np.in1d(data.ages, desired_ages)])
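For newer NumPy versions, np.isin does the same membership test and is, as far as I know, the spelling recommended over np.in1d. The sketch below assumes a NumPy recent enough to provide np.isin and np.rec.fromarrays; mask and subset are names chosen for the example.

```python
import numpy as np

np.random.seed(0)
ages = np.arange(3, 103)
np.random.shuffle(ages)
values = np.random.uniform(low=0, high=1, size=ages.shape)
data = np.rec.fromarrays([ages, values], names='ages,values')
desired_ages = np.array([4, 16, 29, 80])

# True wherever the age of a record appears in desired_ages.
mask = np.isin(data.ages, desired_ages)

# Boolean indexing keeps whole records, so ages and values stay paired.
subset = data[mask]
```

Note that a boolean mask returns the matches in the order they occur in data, not in the order of desired_ages, and it silently skips desired ages that are missing.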


This is a reasonable first approach:

>>> from functools import reduce  # reduce is not a builtin on Python 3
>>> bool_indices = reduce(numpy.logical_or,
...                       (data.ages == x for x in desired_ages))
>>> data.values[bool_indices]
array([ 0.63143784,  0.93852927,  0.0026815 ,  0.66263594,  0.2603184 ])

But that loops at the Python level, so it's probably slower. We can translate it pretty easily into pure numpy, using ix_ to make the arrays broadcast against each other nicely (meshgrid with swapped arguments would work too, but would use more memory):

>>> bools_2d = numpy.equal(*numpy.ix_(desired_ages, data.ages))
>>> bool_indices = numpy.logical_or.reduce(bools_2d)
>>> data.ages[bool_indices]
array([ 1,  4, 16, 29, 80])
>>> data.values[bool_indices]
array([ 0.32324063,  0.65453647,  0.9300062 ,  0.34534668,  0.12151951])
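The same broadcast can also be written with explicit new axes instead of ix_, which some readers may find easier to follow. This is just an equivalent sketch on plain arrays, not a faster method; it builds the same 5 x 100 boolean table.

```python
import numpy as np

ages = np.arange(100)
values = np.random.uniform(low=0, high=1, size=ages.shape)
desired_ages = np.array([1, 4, 16, 29, 80])

# Shape (5, 1) compared with shape (1, 100) broadcasts to shape (5, 100):
# row i marks where ages equals desired_ages[i].
bools_2d = desired_ages[:, np.newaxis] == ages[np.newaxis, :]

# Collapse the rows: True wherever any desired age matched that position.
bool_indices = bools_2d.any(axis=0)

matched_values = values[bool_indices]
```

The intermediate table has len(desired_ages) * len(ages) entries, so memory use grows with both arrays.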

See also HYRY's answer for a potentially faster solution (using searchsorted) and a potentially more readable solution (using in1d).
