Create a new array from numpy array based on the conditions from a list
Suppose that I have an array defined by:
data = np.array([('a1v1', 'a2v1', 'a3v1', 'a4v1', 'a5v1'),
('a1v1', 'a2v1', 'a3v1', 'a4v2', 'a5v1'),
('a1v3', 'a2v1', 'a3v1', 'a4v1', 'a5v2'),
('a1v2', 'a2v2', 'a3v1', 'a4v1', 'a5v2'),
('a1v2', 'a2v3', 'a3v2', 'a4v1', 'a5v2'),
('a1v2', 'a2v3', 'a3v2', 'a4v2', 'a5v1'),
('a1v3', 'a2v3', 'a3v2', 'a4v2', 'a5v2'),
('a1v1', 'a2v2', 'a3v1', 'a4v1', 'a5v1'),
('a1v1', 'a2v3', 'a3v2', 'a4v1', 'a5v2'),
('a1v2', 'a2v2', 'a3v2', 'a4v1', 'a5v2'),
('a1v1', 'a2v2', 'a3v2', 'a4v2', 'a5v2'),
('a1v3', 'a2v2', 'a3v1', 'a4v2', 'a5v2'),
('a1v3', 'a2v1', 'a3v2', 'a4v1', 'a5v2'),
('a1v2', 'a2v2', 'a3v1', 'a4v2', 'a5v1')],
dtype=[('a1', '|S4'), ('a2', '|S4'), ('a3', '|S4'),
('a4', '|S4'), ('a5', '|S4')])
How can I write a function that lists the rows of data matching the conditions given in a list of tuples, r?
r = [('a1', 'a1v1'), ('a4', 'a4v1')]
I know that it can be done manually like this:
data[(data['a1']=='a1v1') & (data['a4']=='a4v1')]
And what about removing the rows of data that match r?
data[(data['a1']!='a1v1') | (data['a4']!='a4v1')]
Thanks.
If I'm understanding you correctly, you want to list the entire row, where a given tuple of columns is equal to some value. In that case, this should be what you want, though it's a bit verbose and obscure:
test_cols = data[['a1', 'a4']]
test_vals = np.array(('a1v1', 'a4v1'), test_cols.dtype)
data[test_cols == test_vals]
Note the "nested list" style indexing... That's the easiest way to select multiple columns of a structured array. E.g.
data[['a1', 'a4']]
will yield
array([('a1v1', 'a4v1'), ('a1v1', 'a4v2'), ('a1v3', 'a4v1'),
('a1v2', 'a4v1'), ('a1v2', 'a4v1'), ('a1v2', 'a4v2'),
('a1v3', 'a4v2'), ('a1v1', 'a4v1'), ('a1v1', 'a4v1'),
('a1v2', 'a4v1'), ('a1v1', 'a4v2'), ('a1v3', 'a4v2'),
('a1v3', 'a4v1'), ('a1v2', 'a4v2')],
dtype=[('a1', '|S4'), ('a4', '|S4')])
You can then test this against a tuple of the values that you're checking for and get a one-dimensional boolean array that is True where those columns are equal to those values.
However, with structured arrays, the dtype has to be an exact match. E.g.
data[['a1', 'a4']] == ('a1v1', 'a4v1')
just yields False, so we have to make an array of the values we want to test using the same dtype as the columns we're testing against. Thus, we have to do something like:
test_cols = data[['a1', 'a4']]
test_vals = np.array(('a1v1', 'a4v1'), test_cols.dtype)
before we can do this:
data[test_cols == test_vals]
Which yields what we were originally after:
array([('a1v1', 'a2v1', 'a3v1', 'a4v1', 'a5v1'),
('a1v1', 'a2v2', 'a3v1', 'a4v1', 'a5v1'),
('a1v1', 'a2v3', 'a3v2', 'a4v1', 'a5v2')],
dtype=[('a1', '|S4'), ('a2', '|S4'), ('a3', '|S4'), ('a4', '|S4'), ('a5', '|S4')])
Hope that makes some sense, anyway...
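To turn this into a function driven by r, and to cover the "remove the matching rows" half of the question, here is a minimal sketch. It takes a different route from the multi-column comparison above: it tests one field at a time and ANDs the results together. The name match_mask is just something I made up for illustration, and the array is a cut-down version of the original data (three fields, four rows) to keep it short. Each value is cast to the dtype of its field before comparing, for the same "dtypes must match" reason as above.

```python
import numpy as np

def match_mask(data, conditions):
    """Return a boolean mask that is True for rows where every
    (field, value) pair in `conditions` holds. Hypothetical helper."""
    mask = np.ones(data.shape, dtype=bool)
    for field, value in conditions:
        # Cast the value to the field's own dtype so the comparison
        # is made between like types.
        target = np.asarray(value, dtype=data.dtype[field])
        mask &= (data[field] == target)
    return mask

# Cut-down stand-in for the array in the question (assumed for brevity).
data = np.array([('a1v1', 'a2v1', 'a4v1'),
                 ('a1v1', 'a2v1', 'a4v2'),
                 ('a1v3', 'a2v1', 'a4v1'),
                 ('a1v1', 'a2v2', 'a4v1')],
                dtype=[('a1', 'S4'), ('a2', 'S4'), ('a4', 'S4')])

r = [('a1', 'a1v1'), ('a4', 'a4v1')]

mask = match_mask(data, r)
selected = data[mask]    # rows that satisfy every condition in r
remaining = data[~mask]  # everything else, i.e. the matching rows removed
```

Inverting the mask with ~ gives the same result as the hand-written != / | version in the question, without having to rewrite the conditions.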