Numpy Lookup (Map, or Point)
I have a large numpy array:
array([[32, 32, 99, 9, 45], # A
[99, 45, 9, 45, 32],
[45, 45, 99, 99, 32],
[ 9, 9, 32, 45, 99]])
and a large-ish array of unique values in a particular order:
array([ 99, 32, 45, 9]) # B
How can I quickly (no Python dictionaries, no copies of A, no Python loops) replace the values in A so that they become the indices of the values in B?
array([[1, 1, 0, 3, 2],
[0, 2, 3, 2, 1],
[2, 2, 0, 0, 1],
[3, 3, 1, 2, 0]])
I feel really dumb for not being able to do this off the top of my head, or find it in the documentation. Easy points!
Here you go
import numpy as np
A = np.array([[32, 32, 99, 9, 45], # A
[99, 45, 9, 45, 32],
[45, 45, 99, 99, 32],
[ 9, 9, 32, 45, 99]])
B = np.array([ 99, 32, 45, 9])
ii = np.argsort(B)
C = np.digitize(A.reshape(-1,),np.sort(B)) - 1
Originally I suggested:
D = np.choose(C,ii).reshape(A.shape)
But I realized that np.choose has limitations once B gets larger, since it only accepts a small, fixed number of choices. Instead, borrowing from @unutbu's clever reply:
D = np.argsort(B)[C].reshape(A.shape)
Or, as a one-liner:
np.argsort(B)[np.digitize(A.reshape(-1,),np.sort(B)) - 1].reshape(A.shape)
I found this to be faster or slower than @unutbu's code, depending on the size of the arrays under consideration and the number of unique values.
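If you want to check the timing claim on your own data, something like the following rough sketch might do. The array sizes, the seed, and the function names via_digitize and via_searchsorted are all made up for illustration; the two functions are just the digitize one-liner above and the searchsorted approach from @unutbu's answer, each sorting B only once.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)        # fixed seed, chosen arbitrarily
B = rng.permutation(1000)             # 1000 unique values in scrambled order
A = rng.choice(B, size=(500, 500))    # every element of A occurs in B

def via_digitize(A, B):
    # digitize-based lookup (hypothetical wrapper name)
    order = np.argsort(B)
    return order[np.digitize(A.reshape(-1), B[order]) - 1].reshape(A.shape)

def via_searchsorted(A, B):
    # searchsorted-based lookup (hypothetical wrapper name)
    order = np.argsort(B)
    return order[B[order].searchsorted(A)]

result_digitize = via_digitize(A, B)
result_searchsorted = via_searchsorted(A, B)
same = np.array_equal(result_digitize, result_searchsorted)
print("results agree:", same)

# Best of three repeats, five calls each; absolute numbers depend on the machine.
for fn in (via_digitize, via_searchsorted):
    seconds = min(timeit.repeat(lambda: fn(A, B), number=5, repeat=3))
    print(fn.__name__, seconds)
```

Vary the shape of A and the length of B to see where one overtakes the other; I would not read much into a single run.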
import numpy as np
A=np.array([[32, 32, 99, 9, 45],
[99, 45, 9, 45, 32],
[45, 45, 99, 99, 32],
[ 9, 9, 32, 45, 99]])
B=np.array([ 99, 32, 45, 9])
cutoffs=np.sort(B)
print(cutoffs)
# [ 9 32 45 99]
index=cutoffs.searchsorted(A)
print(index)
# [[1 1 3 0 2]
# [3 2 0 2 1]
# [2 2 3 3 1]
# [0 0 1 2 3]]
index holds the indices into the array cutoffs associated with each element of A. Note that we had to sort B since np.searchsorted expects a sorted array.
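One thing worth keeping in mind: searchsorted returns insertion points, so a value that does not occur in B still gets an index. The sketch below illustrates this with a made-up array called probe, and shows one possible way to flag the values that really are present. The guard (safe_index, found) is my own addition, not part of the approach itself.

```python
import numpy as np

B = np.array([99, 32, 45, 9])
cutoffs = np.sort(B)                     # [ 9 32 45 99]

probe = np.array([32, 40, 99, 100])      # 40 and 100 do not occur in B
index = cutoffs.searchsorted(probe)      # insertion points, not exact matches

# An insertion point equal to len(cutoffs) would be out of range, so clip it
# before indexing, then compare against probe to see which values were found.
safe_index = np.minimum(index, len(cutoffs) - 1)
found = cutoffs[safe_index] == probe

print(index.tolist())   # [1, 2, 3, 4]
print(found.tolist())   # [True, False, True, False]
```

If every element of A is guaranteed to be in B, as in the question, the guard is unnecessary.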
index is almost the desired answer, except that we want to map
1-->1
3-->0
0-->3
2-->2
np.argsort provides us with this mapping:
print(np.argsort(B))
# [3 1 2 0]
print(np.argsort(B)[1])
# 1
print(np.argsort(B)[3])
# 0
print(np.argsort(B)[0])
# 3
print(np.argsort(B)[2])
# 2
print(np.argsort(B)[index])
# [[1 1 0 3 2]
# [0 2 3 2 1]
# [2 2 0 0 1]
# [3 3 1 2 0]]
So, as a one-liner, the answer is:
np.argsort(B)[np.sort(B).searchsorted(A)]
Calling both np.sort(B) and np.argsort(B) is inefficient since both operations amount to sorting B. For any 1D array B,
np.sort(B) == B[np.argsort(B)]
So we can compute the desired result a bit faster using
key=np.argsort(B)
result=key[B[key].searchsorted(A)]
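For convenience, the two lines above could be wrapped up roughly as follows. This is only a sketch: the function name lookup_indices and the ValueError check for values missing from B are assumptions of mine, not something taken from the answers.

```python
import numpy as np

def lookup_indices(A, B):
    """For each element of A, return its position in B (hypothetical helper)."""
    key = np.argsort(B)              # sort B only once
    sorted_B = B[key]                # same as np.sort(B)
    pos = sorted_B.searchsorted(A)
    # Added guard: searchsorted yields insertion points, so verify exact matches.
    clipped = np.minimum(pos, len(B) - 1)
    if not np.array_equal(sorted_B[clipped], A):
        raise ValueError("A contains values that are not in B")
    return key[pos]

A = np.array([[32, 32, 99,  9, 45],
              [99, 45,  9, 45, 32],
              [45, 45, 99, 99, 32],
              [ 9,  9, 32, 45, 99]])
B = np.array([99, 32, 45, 9])

result = lookup_indices(A, B)
print(result)
# [[1 1 0 3 2]
#  [0 2 3 2 1]
#  [2 2 0 0 1]
#  [3 3 1 2 0]]

# Round trip: indexing B with the result gives back A.
print(np.array_equal(B[result], A))   # True
```

To overwrite A with the indices, as the question asks, A[...] = lookup_indices(A, B) should work, though searchsorted and the fancy indexing still allocate temporary arrays of the same shape as A along the way.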