开发者

How do I search through regex matches in Python?

I need to try a string against multiple (exclusive - meaning a string that matches one of them can't match any of the other) regexes, and execute a different piece of code depending on which one it 开发者_如何学Pythonmatches. What I have currently is:

m = firstre.match(str)
if m:
    # Do something

m = secondre.match(str)
if m:
    # Do something else

m = thirdre.match(str)
if m:
    # Do something different from both

Apart from the ugliness, this code matches against all regexes even after it has matched one of them (say firstre), which is inefficient. I tried to use:

elif m = secondre.match(str)

but learnt that assignment is not allowed in if statements.

Is there an elegant way to achieve what I want?


def doit( s ):

    # with some side-effect on a
    a = [] 

    def f1( s, m ):
        a.append( 1 )
        print 'f1', a, s, m

    def f2( s, m ):
        a.append( 2 )
        print 'f2', a, s, m

    def f3( s, m ):
        a.append( 3 )
        print 'f3', a, s, m

    re1 = re.compile( 'one' )
    re2 = re.compile( 'two' )
    re3 = re.compile( 'three' )


    func_re_list = (
        ( f1, re1 ), 
        ( f2, re2 ), 
        ( f3, re3 ),
    )
    for myfunc, myre in func_re_list:
        m = myre.match( s )
        if m:
            myfunc( s, m )
            break


doit( 'one' ) 
doit( 'two' ) 
doit( 'three' ) 


This might be a bit over engineering the solution, but you could combine them as a single regexp with named groups and see which group matched. This could be encapsulated as a helper class:

import re
class MultiRe(object):
    def __init__(self, **regexps):
        self.keys = regexps.keys()
        self.union_re = re.compile("|".join("(?P<%s>%s)" % kv for kv in regexps.items()))

    def match(self, string, *args):
        result = self.union_re.match(string, *args)
        if result:
            for key in self.keys:
                if result.group(key) is not None:
                    return key

Lookup would be like this:

multi_re = MultiRe(foo='fo+', bar='ba+r', baz='ba+z')
match = multi_re.match('baaz')
if match == 'foo':
     # one thing
elif match == 'bar':
     # some other thing
elif match == 'baz':
     # or this
else:
     # no match


This is a good application for the undocumented but quite useful re.Scanner class.


A few ideas, none of them good necessarily, but it might fit your code well:

How about putting the code in a separate function, i.e. MatchRegex(), which returns which regex it matched. That way, inside the function, you can use a return after you matched the first (or second) regex, meaning you lose the inefficiency.

Of course, you could always go with just nested if statements:

m = firstre.match(str)
if m:
   # Do something
else:
    m = secondre.match(str)
    ...

I really don't see any reason not to go with nested ifs. They're very easy to understand and as efficient as you want. I'd go for them just for their simplicity.


You could use

def do_first(str, res, actions):
  for re,action in zip(res, actions):
    m = re.match(str)
    if m:
      action(str)
      return

So, for example, say you've defined

def do_something_1(str):
  print "#1: %s" % str

def do_something_2(str):
  print "#2: %s" % str

def do_something_3(str):
  print "#3: %s" % str

firstre  = re.compile("foo")
secondre = re.compile("bar")
thirdre  = re.compile("baz")

Then call it with

do_first("baz",
         [firstre,        secondre,       thirdre],
         [do_something_1, do_something_2, do_something_3])


Early returns, perhaps?

def doit(s):
    m = re1.match(s)
    if m:
        # Do something
        return

    m = re2.match(s)
    if m:
        # Do something else
        return

    ...

Ants Aasma's answer is good too. If you prefer less scaffolding you can write that out yourself using the verbose regex syntax.

re = re.compile(r'''(?x)    # set the verbose flag
    (?P<foo> fo+ )
  | (?P<bar> ba+r )
  | #...other alternatives...
''')

def doit(s):
    m = re.match(s)
    if m.group('foo'):
        # Do something
    elif m.group('bar'):
        # Do something else
    ...

I've done this a lot. It's fast and it works with re.finditer.


Do it with an elif in case you just need a True/False out of regex matching:

if regex1.match(str):
    # do stuff
elif regex2.match(str):
    # and so on
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜