Grep multi-layered iterable for strings that match (Python)
Say that we have a multilayered iterable with some strings at the "final" level, yes strings are iterable, but I think that you get my meaning:
['something',
('Diff',
('diff', 'udiff'),
('*.diff', '*.patch'),
('text/x-diff', 'text/x-patch')),
('Delphi',
('delphi', 'pas', 'pascal', 'objectpascal'),
('*.pas',),
('text/x-pascal',['lets', 'put one here'], )),
('JavaScript+Mako',
('js+mako', 'javascript+mako'),
('application/x-javascript+mako',
'text/x-javascript+mako',
'text/javascript+mako')),
...
]
Is there any convenient way that I could implement a search that would give me the indices of the matching strings? I would like something that would act something like this (where the above list is data
):
>>> grep('javascript', data)
and it would return [ (2,1,1), (2,2,0), (2,2,1), (2,2,2) ] perhaps. Maybe I'm missing a comparable solution that returns nothing of the sort but can help me find some strings within a multi-layered list of iterables of iterables of .... strings.
I wrote a little bit but it was seeming juvenile and inelegant so I thought I would ask here. I guess that I could just keep nesting the exception the way I started 开发者_开发知识库here to the number of levels that the function would then support, but I was hoping to get something neat, abstract, pythonic.
import re
def rgrep(s, data):
''' given a iterable of strings or an iterable of iterables of strings,
returns the index/indices of strings that contain the search string.
Args::
s - the string that you are searching for
data - the iterable of strings or iterable of iterables of strings
'''
results = []
expr = re.compile(s)
for item in data:
try:
match = expr.search(item)
if match != None:
results.append( data.index(item) )
except TypeError:
for t in item:
try:
m = expr.search(t)
if m != None:
results.append( (list.index(item), item.index(t)) )
except TypeError:
''' you can only go 2 deep! '''
pass
return results
I'd split recursive enumeration from grepping:
def enumerate_recursive(iter, base=()):
for index, item in enumerate(iter):
if isinstance(item, basestring):
yield (base + (index,)), item
else:
for pair in enumerate_recursive(item, (base + (index,))):
yield pair
def grep_index(filt, iter):
return (index for index, text in iter if filt in text)
This way you can do both non-recursive and recursive grepping:
l = list(grep_index('opt1', enumerate(sys.argv))) # non-recursive
r = list(grep_index('diff', enumerate_recursive(your_data))) # recursive
Also note that we're using iterators here, saving RAM for longer sequences if necessary.
Even more generic solution would be to give a callable instead of string to grep_index. But that might not be necessary for you.
Here is a grep that uses recursion to search the data structure.
Note that good data structures lead the way to elegant solutions. Bad data structures make you bend over backwards to accomodate. This feels to me like one of those cases where a bad data structure is obstructing rather than helping you.
Having a simple data structure with a more uniform structure (instead of using this grep) might be worth investigating.
#!/usr/bin/env python
data=['something',
('Diff',
('diff', 'udiff'),
('*.diff', '*.patch'),
('text/x-diff', 'text/x-patch',['find','java deep','down'])),
('Delphi',
('delphi', 'pas', 'pascal', 'objectpascal'),
('*.pas',),
('text/x-pascal',['lets', 'put one here'], )),
('JavaScript+Mako',
('js+mako', 'javascript+mako'),
('application/x-javascript+mako',
'text/x-javascript+mako',
'text/javascript+mako')),
]
def grep(astr,data,prefix=[]):
result=[]
for idx,elt in enumerate(data):
if isinstance(elt,basestring):
if astr in elt:
result.append(tuple(prefix+[idx]))
else:
result.extend(grep(astr,elt,prefix+[idx]))
return result
def pick(data,idx):
if idx:
return pick(data[idx[0]],idx[1:])
else:
return data
idxs=grep('java',data)
print(idxs)
for idx in idxs:
print('data[%s] = %s'%(idx,pick(data,idx)))
To get the position use enumerate()
>>> data = [('foo', 'bar', 'frrr', 'baz'), ('foo/bar', 'baz/foo')]
>>>
>>> for l1, v1 in enumerate(data):
... for l2, v2 in enumerate(v1):
... if 'f' in v2:
... print l1, l2, v2
...
0 0 foo
1 0 foo/bar
1 1 baz/foo
In this example I am using a simple match 'foo' in bar
yet you probably use regex for the job.
Obviously, enumerate()
can provide support in more than 2 levels as in your edited post.
精彩评论