How do I search through regex matches in Python?
I need to try a string against multiple (exclusive - meaning a string that matches one of them can't match any of the other) regexes, and execute a different piece of code depending on which one it 开发者_如何学Pythonmatches. What I have currently is:
m = firstre.match(str)
if m:
# Do something
m = secondre.match(str)
if m:
# Do something else
m = thirdre.match(str)
if m:
# Do something different from both
Apart from the ugliness, this code matches against all regexes even after it has matched one of them (say firstre), which is inefficient. I tried to use:
elif m = secondre.match(str)
but learnt that assignment is not allowed in if statements.
Is there an elegant way to achieve what I want?
def doit( s ):
# with some side-effect on a
a = []
def f1( s, m ):
a.append( 1 )
print 'f1', a, s, m
def f2( s, m ):
a.append( 2 )
print 'f2', a, s, m
def f3( s, m ):
a.append( 3 )
print 'f3', a, s, m
re1 = re.compile( 'one' )
re2 = re.compile( 'two' )
re3 = re.compile( 'three' )
func_re_list = (
( f1, re1 ),
( f2, re2 ),
( f3, re3 ),
)
for myfunc, myre in func_re_list:
m = myre.match( s )
if m:
myfunc( s, m )
break
doit( 'one' )
doit( 'two' )
doit( 'three' )
This might be a bit over engineering the solution, but you could combine them as a single regexp with named groups and see which group matched. This could be encapsulated as a helper class:
import re
class MultiRe(object):
def __init__(self, **regexps):
self.keys = regexps.keys()
self.union_re = re.compile("|".join("(?P<%s>%s)" % kv for kv in regexps.items()))
def match(self, string, *args):
result = self.union_re.match(string, *args)
if result:
for key in self.keys:
if result.group(key) is not None:
return key
Lookup would be like this:
multi_re = MultiRe(foo='fo+', bar='ba+r', baz='ba+z')
match = multi_re.match('baaz')
if match == 'foo':
# one thing
elif match == 'bar':
# some other thing
elif match == 'baz':
# or this
else:
# no match
This is a good application for the undocumented but quite useful re.Scanner class.
A few ideas, none of them good necessarily, but it might fit your code well:
How about putting the code in a separate function, i.e. MatchRegex()
, which returns which regex it matched. That way, inside the function, you can use a return after you matched the first (or second) regex, meaning you lose the inefficiency.
Of course, you could always go with just nested if
statements:
m = firstre.match(str)
if m:
# Do something
else:
m = secondre.match(str)
...
I really don't see any reason not to go with nested if
s. They're very easy to understand and as efficient as you want. I'd go for them just for their simplicity.
You could use
def do_first(str, res, actions):
for re,action in zip(res, actions):
m = re.match(str)
if m:
action(str)
return
So, for example, say you've defined
def do_something_1(str):
print "#1: %s" % str
def do_something_2(str):
print "#2: %s" % str
def do_something_3(str):
print "#3: %s" % str
firstre = re.compile("foo")
secondre = re.compile("bar")
thirdre = re.compile("baz")
Then call it with
do_first("baz",
[firstre, secondre, thirdre],
[do_something_1, do_something_2, do_something_3])
Early returns, perhaps?
def doit(s):
m = re1.match(s)
if m:
# Do something
return
m = re2.match(s)
if m:
# Do something else
return
...
Ants Aasma's answer is good too. If you prefer less scaffolding you can write that out yourself using the verbose regex syntax.
re = re.compile(r'''(?x) # set the verbose flag
(?P<foo> fo+ )
| (?P<bar> ba+r )
| #...other alternatives...
''')
def doit(s):
m = re.match(s)
if m.group('foo'):
# Do something
elif m.group('bar'):
# Do something else
...
I've done this a lot. It's fast and it works with re.finditer
.
Do it with an elif in case you just need a True/False out of regex matching:
if regex1.match(str):
# do stuff
elif regex2.match(str):
# and so on
精彩评论