Equivalent of Matlab 'ismember' in numpy (Python)? [duplicate]

This question already has answers here: Python equivalent of MATLAB's "ismember" function (5 answers) Closed 2 years ago.

I am struggling to find a Numpy equivalent for a particular Matlab coding "pattern" using ismember.

Unfortunately this code tends to be where most of the time is spent in my Matlab scripts, so I want to find an efficient Numpy equivalent.

The basic pattern consists of mapping a subset onto a larger grid. I have a set of key-value pairs stored as parallel arrays, and I want to insert these values into a larger list of key-value pairs stored in the same way.

For concreteness, say I have quarterly GDP data that I map onto a monthly time grid as follows.

quarters = [200712 200803 200806 200809 200812 200903];
gdp_q = [10.1 10.5 11.1 11.8 10.9 10.3];
months = 200801 : 200812;
gdp_m = NaN(size(months));
[tf, loc] = ismember(quarters, months);
gdp_m(loc(tf)) = gdp_q(tf);

Note that not all the quarters appear in the list of months, so both the tf and the loc variables are required.

I have seen similar questions on StackOverflow, but they either give only a pure Python solution (here) or use numpy without returning the loc argument (here).

In my particular application area, this particular code pattern tends to arise over and over again and uses up most of the CPU time of my functions so an efficient solution here is really crucial for me.

Comments or redesign suggestions are also welcome.


If months is sorted, use np.searchsorted. Otherwise, sort and then use np.searchsorted:

import numpy as np
quarters = np.array([200712, 200803, 200806, 200809, 200812, 200903])
months = np.arange(200801, 200813)
loc = np.searchsorted(months, quarters)

np.searchsorted returns the insertion position, so it gives an index even for values that are not in months. If some of your data may fall outside the range of months, add a check afterwards:

valid = (quarters <= months.max()) & (quarters >= months.min())
loc = loc[valid]
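The range check above is enough when months is a gap-free run of integers. For a grid with gaps, a more general sketch is to compare the value found at each insertion position with the value being looked up. The helper name ismember_sorted is made up for this illustration, and it assumes b is sorted and non-empty:

```python
import numpy as np

def ismember_sorted(a, b):
    """Return (tf, loc) in the spirit of MATLAB's ismember.

    Assumes b is sorted and non-empty. loc holds 0-based positions in b
    and is only meaningful where tf is True.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    loc = np.searchsorted(b, a)          # insertion positions, in 0..len(b)
    loc = np.clip(loc, 0, len(b) - 1)    # keep indices valid for the lookup below
    tf = b[loc] == a                     # a real match only where the values agree
    return tf, loc

quarters = np.array([200712, 200803, 200806, 200809, 200812, 200903])
gdp_q = np.array([10.1, 10.5, 11.1, 11.8, 10.9, 10.3])
months = np.arange(200801, 200813)       # 200801 .. 200812 inclusive

tf, loc = ismember_sorted(quarters, months)
gdp_m = np.full(months.shape, np.nan)    # NaN wherever no quarterly value exists
gdp_m[loc[tf]] = gdp_q[tf]
```

The last two lines mirror gdp_m(loc(tf)) = gdp_q(tf) from the question, except that loc is 0-based here.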

This is an O(N log N) solution. If the run time is still a problem in your programme, you might write just this one subroutine in C(++) using a hashing scheme, which would be O(N) (as well as avoiding some constant factors, of course).
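To show what a hashing scheme looks like, here is a rough sketch in plain Python using a dict instead of C(++). The name ismember_hash is invented for this example, and returning -1 for "not found" is my own convention. Because the loops run in the interpreter, this shows the O(N) idea but may not beat np.searchsorted in practice:

```python
import numpy as np

def ismember_hash(a, b):
    """Hash-based (tf, loc); b does not need to be sorted.

    loc is the 0-based position of the first match in b, or -1 if none.
    """
    first_pos = {}
    for i, value in enumerate(np.asarray(b).tolist()):
        first_pos.setdefault(value, i)   # keep only the first occurrence
    loc = np.array([first_pos.get(v, -1) for v in np.asarray(a).tolist()],
                   dtype=np.intp)
    tf = loc >= 0
    return tf, loc

quarters = np.array([200712, 200803, 200806, 200809, 200812, 200903])
months_unsorted = np.array([200803, 200801, 200812, 200806])

tf, loc = ismember_hash(quarters, months_unsorted)
```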


I think you can redesign the original MATLAB code sample you give so that it doesn't use the ISMEMBER function. This may speed up the MATLAB code and make it easier to reimplement in Python if you still want to:

quarters = [200712 200803 200806 200809 200812 200903];
gdp_q = [10.1 10.5 11.1 11.8 10.9 10.3];

monthStart = 200801;              %# Starting month value
monthEnd = 200812;                %# Ending month value
nMonths = monthEnd-monthStart+1;  %# Number of months
gdp_m = NaN(1,nMonths);           %# Initialize gdp_m

quarters = quarters-monthStart+1;  %# Shift quarter values so they can be
                                   %#   used as indices into gdp_m
index = (quarters >= 1) & (quarters <= nMonths);  %# Logical index of quarters
                                                  %#   within month range
gdp_m(quarters(index)) = gdp_q(index);  %# Move values from gdp_q to gdp_m
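A possible NumPy port of the same idea is sketched below. The variable names are my own translation of the MATLAB ones, and the offsets are 0-based rather than 1-based. Like the MATLAB version, it only works when the month codes are consecutive integers, which holds within a single year but not across a year boundary (200812 is followed by 200901):

```python
import numpy as np

quarters = np.array([200712, 200803, 200806, 200809, 200812, 200903])
gdp_q = np.array([10.1, 10.5, 11.1, 11.8, 10.9, 10.3])

month_start = 200801                       # starting month value
month_end = 200812                         # ending month value
n_months = month_end - month_start + 1     # number of months
gdp_m = np.full(n_months, np.nan)          # initialise gdp_m with NaN

offsets = quarters - month_start           # 0-based indices into gdp_m
in_range = (offsets >= 0) & (offsets < n_months)   # quarters inside the month range
gdp_m[offsets[in_range]] = gdp_q[in_range]         # move values from gdp_q to gdp_m
```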


Try the ismember library from PyPI.

pip install ismember

Example:

# Import libraries
import numpy as np
from ismember import ismember

# Data (np.arange excludes the stop value, so 200812 is not in months here)
quarters = np.array([200712, 200803, 200806, 200809, 200812, 200903])
months = np.arange(200801, 200812)

# Lookup
Iloc, idx = ismember(quarters, months)

# Iloc is a boolean array marking which quarters exist in months
print(Iloc)
# [False,  True,  True,  True, False, False]

# idx holds the positions in months of the matched quarters
print(idx)
# [2, 5, 8]

print(months[idx])
# [200803, 200806, 200809]

print(quarters[Iloc])
# [200803, 200806, 200809]

# These vectors will match
quarters[Iloc] == months[idx]