Grep multi-layered iterable for strings that match (Python)

2022-12-09 07:54 Q&A author:

Say we have a multi-layered iterable with strings at the "final" level (yes, strings are themselves iterable, but I think you get my meaning):

['something', 
('Diff',
('diff', 'udiff'),
('*.diff', '*.patch'),
('text/x-diff', 'text/x-patch')),

('Delphi',
('delphi', 'pas', 'pascal', 'objectpascal'),
('*.pas',),
('text/x-pascal',['lets', 'put one here'], )),

('JavaScript+Mako',
('js+mako', 'javascript+mako'),
('application/x-javascript+mako',
'text/x-javascript+mako',
'text/javascript+mako')),
...
]

Is there a convenient way to implement a search that gives me the indices of the matching strings? I would like something that behaves like this (where the above list is data):

>>> grep('javascript', data)

and it would return [ (3,1,1), (3,2,0), (3,2,1), (3,2,2) ] perhaps. Maybe I'm missing a comparable solution that returns nothing of the sort but can still help me find strings within a multi-layered list of iterables of iterables of ... strings.

I wrote a little bit, but it seemed juvenile and inelegant, so I thought I would ask here. I guess that I could just keep nesting the exception handling the way I started here, up to the number of levels the function should support, but I was hoping for something neat, abstract, and Pythonic.

import re

def rgrep(s, data):
    ''' given an iterable of strings or an iterable of iterables of strings,

    returns the index/indices of strings that contain the search string.

    Args::

        s - the string that you are searching for
        data - the iterable of strings or iterable of iterables of strings
    '''

    results = []
    expr = re.compile(s)
    # enumerate() gives the true position even when items repeat
    for i, item in enumerate(data):
        try:
            if expr.search(item) is not None:
                results.append(i)

        except TypeError:
            # item is not a string, so search one level down
            for j, t in enumerate(item):
                try:
                    if expr.search(t) is not None:
                        results.append((i, j))

                except TypeError:
                    # you can only go 2 deep!
                    pass

    return results

I'd split recursive enumeration from grepping:

def enumerate_recursive(iterable, base=()):
    # Yield (index_tuple, string) for every string at any depth.
    for index, item in enumerate(iterable):
        if isinstance(item, str):  # basestring in Python 2
            yield (base + (index,)), item
        else:
            yield from enumerate_recursive(item, base + (index,))

def grep_index(filt, pairs):
    # pairs is any iterable of (index, text) tuples
    return (index for index, text in pairs if filt in text)

This way you can do both non-recursive and recursive grepping:

import sys

l = list(grep_index('opt1', enumerate(sys.argv)))   # non-recursive
r = list(grep_index('diff', enumerate_recursive(your_data)))  # recursive

Also note that we're using iterators here, saving RAM for longer sequences if necessary.

An even more generic solution would be to pass a callable instead of a string to grep_index, but that might not be necessary for you.
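
A minimal sketch of that callable-based variant, assuming Python 3. The names walk_strings and grep_where are hypothetical, chosen so they do not collide with the functions above; the predicate can be any callable, for example the search method of a compiled regex. Because both functions are lazy, next() can stop at the first hit without walking the rest of the data.

```python
import re

def walk_strings(iterable, base=()):
    """Yield (index_tuple, string) for every string nested in iterable."""
    for index, item in enumerate(iterable):
        path = base + (index,)
        if isinstance(item, str):
            yield path, item
        else:
            yield from walk_strings(item, path)

def grep_where(predicate, pairs):
    """Lazily yield the index of every text for which predicate(text) is truthy."""
    return (index for index, text in pairs if predicate(text))

# Small sample in the shape of the question's data (illustrative only).
sample = [
    'something',
    ('Diff', ('diff', 'udiff')),
    ('JavaScript+Mako', ('js+mako', 'javascript+mako')),
]

# Case-insensitive regex match, passed as a bound method.
pattern = re.compile(r'java', re.IGNORECASE)
matches = list(grep_where(pattern.search, walk_strings(sample)))

# Any other callable works too; next() pulls only the first match.
first = next(grep_where(lambda s: s.endswith('diff'), walk_strings(sample)), None)

print(matches)
print(first)
```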

Here is a grep that uses recursion to search the data structure.

Note that good data structures lead the way to elegant solutions, while bad data structures make you bend over backwards to accommodate them. This feels to me like one of those cases where a bad data structure is obstructing rather than helping you.

A simpler, more uniform data structure (instead of this grep) might be worth investigating.

#!/usr/bin/env python

data=['something', 
('Diff',
('diff', 'udiff'),
('*.diff', '*.patch'),
('text/x-diff', 'text/x-patch',['find','java deep','down'])),

('Delphi',
('delphi', 'pas', 'pascal', 'objectpascal'),
('*.pas',),
('text/x-pascal',['lets', 'put one here'], )),

('JavaScript+Mako',
('js+mako', 'javascript+mako'),
('application/x-javascript+mako',
'text/x-javascript+mako',
'text/javascript+mako')),
]

def grep(astr, data, prefix=()):
    # Return the index tuple of every nested string that contains astr.
    result = []
    for idx, elt in enumerate(data):
        if isinstance(elt, str):  # basestring in Python 2
            if astr in elt:
                result.append(prefix + (idx,))
        else:
            result.extend(grep(astr, elt, prefix + (idx,)))
    return result

def pick(data, idx):
    # Follow an index tuple down into the nested structure.
    if idx:
        return pick(data[idx[0]], idx[1:])
    else:
        return data

idxs = grep('java', data)
print(idxs)
for idx in idxs:
    print('data[%s] = %s' % (idx, pick(data, idx)))
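
As a rough illustration of the more uniform structure suggested above, here is a sketch assuming Python 3. The record type Lexer, its field names, and the function grep_lexers are all hypothetical: the question never says what each position in the tuples means, so the names below are a guess, and the empty filenames tuple for the second record is filled in only to keep every record the same shape.

```python
from typing import NamedTuple, Tuple

class Lexer(NamedTuple):
    # Field names are assumed for illustration; the question does not name them.
    name: str
    aliases: Tuple[str, ...]
    filenames: Tuple[str, ...]
    mimetypes: Tuple[str, ...]

lexers = [
    Lexer('Diff', ('diff', 'udiff'), ('*.diff', '*.patch'),
          ('text/x-diff', 'text/x-patch')),
    Lexer('JavaScript+Mako', ('js+mako', 'javascript+mako'), (),
          ('application/x-javascript+mako',
           'text/x-javascript+mako',
           'text/javascript+mako')),
]

def grep_lexers(needle, records):
    """Return (record index, field name, position) for each string containing needle."""
    hits = []
    for i, rec in enumerate(records):
        if needle in rec.name:
            hits.append((i, 'name', 0))
        # Every record has the same fields, so no recursion is needed.
        for field in ('aliases', 'filenames', 'mimetypes'):
            for j, text in enumerate(getattr(rec, field)):
                if needle in text:
                    hits.append((i, field, j))
    return hits

hits = grep_lexers('javascript', lexers)
print(hits)
```

With every record the same shape, the search is two plain loops and each hit says which field matched, not just a bare position.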

To get the position, use enumerate():

>>> data = [('foo', 'bar', 'frrr', 'baz'), ('foo/bar', 'baz/foo')]
>>> 
>>> for l1, v1 in enumerate(data):
...     for l2, v2 in enumerate(v1):
...         if 'f' in v2:
...             print(l1, l2, v2)
... 
0 0 foo
0 2 frrr
1 0 foo/bar
1 1 baz/foo

In this example I am using a simple substring test ('f' in v2), though you would probably use a regex for the job.

enumerate() works the same way at more than two levels of nesting, as in your edited post.
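
One way to carry the enumerate() idea past two levels while matching with a regex, sketched under the assumption of Python 3. The name regex_positions is hypothetical. Instead of hand-nesting one loop per level, it keeps an explicit stack of enumerate() iterators, so the depth is not fixed in advance.

```python
import re

def regex_positions(pattern, data):
    """Yield (index_tuple, string) for every nested string matching pattern."""
    expr = re.compile(pattern)
    # Each stack entry pairs an index prefix with an enumerate() iterator.
    stack = [((), enumerate(data))]
    while stack:
        prefix, it = stack[-1]
        for idx, value in it:
            path = prefix + (idx,)
            if isinstance(value, str):
                if expr.search(value):
                    yield path, value
            else:
                # Descend into the nested iterable; this level resumes later.
                stack.append((path, enumerate(value)))
                break
        else:
            # The iterator at this level is exhausted.
            stack.pop()

# Sample data extended with deeper nesting (illustrative only).
data = [('foo', 'bar', 'frrr', 'baz'), ('foo/bar', ['deep', ('baz/foo',)])]
found = list(regex_positions(r'^f|/f', data))
for path, text in found:
    print(path, text)
```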
