Python Conditional Regular Expression
This is a question involving a conditional regular expression in python:
I'd like to match the string "abc"
with
match(1)="a"
match(2)="b"
match(3)="c"
but also match the string "  a" (two spaces followed by a)
with
match(1)="a"
match(2)=""
match(3)=""
The following code ALMOST does this. The problem is that in the first case match(1)="a", but in the second case match(4)="a" (not match(1) as desired).
In fact, if you iterate through all the groups with for g in re.search(myre,teststring2).groups(): you get 6 groups (not 3 as was expected).
import re
import sys
teststring1 = "abc"
teststring2 = "  a"  # two leading spaces
myre = r'^(?=(\w)(\w)(\w))|(?=\s{2}(\w)()())'
m = re.search(myre, teststring1)
if m:
    print(m.group(1))
m = re.search(myre, teststring2)
if m:
    print(m.group(1))  # prints None: the "a" ends up in group 4
Any thoughts? (note this is for Python 2.5)
Maybe...:
import re
import sys
teststring1 = "abc"
teststring2 = "  a"  # two leading spaces
myre = r'^\s{0,2}(\w)(\w?)(\w?)$'
m = re.search(myre, teststring1)
if m:
    print(m.group(1))
m = re.search(myre, teststring2)
if m:
    print(m.group(1))
This does give group(1) == "a" in both cases, as you wish, but it may also match in cases you're not showing: for example with no spaces in front, or with spaces and more than one letter afterwards, so that the total length of the matched string is != 3. I'm just guessing that you don't want matches in such cases...?
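To make those edge cases concrete, here is a small check of the pattern above, offered as a sketch. The helper name groups_for and the extra sample strings are assumptions added for illustration; they do not come from the question.

```python
import re

# Pattern from the answer above, written as a raw string.
PATTERN = re.compile(r'^\s{0,2}(\w)(\w?)(\w?)$')

def groups_for(text):
    """Return the three captured groups, or None if the text does not match."""
    m = PATTERN.search(text)
    return m.groups() if m else None

# Hypothetical samples: the two from the question plus a few edge cases.
for sample in ["abc", "  a", "a", "  ab", "   a"]:
    print("%r -> %r" % (sample, groups_for(sample)))
```

The bare "a" and "  ab" both match, so if those inputs should be rejected the pattern needs to be tightened.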
Each capturing group in the expression gets its own index. Try this:
r = re.compile(r"^\s*(\w)(\w)?(\w)?$")
abc -> ('a', 'b', 'c')
a -> ('a', None, None)
To break it down:
^ // anchored at the beginning
\s* // Any number of spaces to start with
(\w) // capture the first letter, which is required
(\w)? // capture the second letter, which is optional
(\w)? // capture the third letter, which is optional
$ // anchored at the end
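The same breakdown can be kept inside the pattern itself with re.VERBOSE, which ignores whitespace in the pattern and allows # comments. This is only a sketch; the name LETTERS_RE is an assumption chosen for the example.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
LETTERS_RE = re.compile(r"""
    ^        # anchored at the beginning
    \s*      # any number of leading whitespace characters
    (\w)     # first letter, required
    (\w)?    # second letter, optional
    (\w)?    # third letter, optional
    $        # anchored at the end
""", re.VERBOSE)

print(LETTERS_RE.match("abc").groups())  # ('a', 'b', 'c')
print(LETTERS_RE.match("  a").groups())  # ('a', None, None)
```

Note that an optional group that does not participate comes back as None here, not as an empty string.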
myre = r'^(?=\s{0,2}(\w)(?:(\w)(\w))?)'
This will handle the two cases you describe in the fashion you want, but is not necessarily a general solution. It feels like you've come up with a toy problem that represents a real one.
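A quick way to check that pattern against the two inputs from the question is sketched below. The name LOOKAHEAD_RE is an assumption for the example. Because everything sits inside a lookahead, the match itself consumes no text, so group(0) is empty.

```python
import re

# Lookahead-only pattern from the answer above, as a raw string.
LOOKAHEAD_RE = re.compile(r'^(?=\s{0,2}(\w)(?:(\w)(\w))?)')

for sample in ["abc", "  a"]:
    m = LOOKAHEAD_RE.search(sample)
    # groups() holds the captures; group(0) is the (empty) overall match.
    print("%r -> groups=%r, consumed=%r" % (sample, m.groups(), m.group(0)))
```

For "  a" the second and third groups are None rather than "", so a caller that wants empty strings still has to convert them.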
A general solution is very hard to come by because the processing of later elements depends on the processing of previous ones and/or the reverse. For example, the initial spaces shouldn't be there if you have the full abc. And if the initial spaces are there, you should only find a.
In my opinion, the best way to handle this is with the | construct you had originally. You can have some code after the match that pulls the groups out into an array and arranges them more to your liking.
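One possible shape for that post-processing step is sketched here, using the pattern from the question. The function name first_three and the choice to turn None into "" are assumptions made for the example, not part of the original code.

```python
import re

# Original alternation from the question: groups 1-3 belong to the first
# branch, groups 4-6 to the second.
ALT_RE = re.compile(r'^(?=(\w)(\w)(\w))|(?=\s{2}(\w)()())')

def first_three(text):
    """Return a 3-tuple of strings from whichever branch matched, or None."""
    m = ALT_RE.search(text)
    if m is None:
        return None
    groups = m.groups()
    # Only the branch that matched has non-None groups.
    if groups[0] is not None:
        half = groups[:3]
    else:
        half = groups[3:]
    return tuple(g or "" for g in half)

print(first_three("abc"))  # ('a', 'b', 'c')
print(first_three("  a"))  # ('a', '', '')
```

This keeps the regular expression as written and moves the renumbering into ordinary code, where it is easier to adjust.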
The rule for groups is that every opening parenthesis that is not immediately followed by ?: (or another non-capturing prefix such as ?=) starts a new numbered group. A group that didn't actually take part in the match is still there; Python reports it as None.
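The numbering rule can be checked on a small pattern. The pattern below is a made-up example, not one from the question: it has three capturing groups, one (?:...) group, and one (?=...) lookahead, and only the capturing ones are counted.

```python
import re

# Hypothetical pattern: (x)? is group 1, (?:y) is not numbered,
# (z) inside the lookahead is group 2, and the empty () is group 3.
DEMO_RE = re.compile(r'(x)?(?:y)(?=(z))()')

print(DEMO_RE.groups)                 # number of capturing groups: 3
print(DEMO_RE.search("yz").groups())  # (None, 'z', '')
```

Group 1 did not participate and is None, while the empty () did participate and captured an empty string, which is the difference the question's ()() trick relies on.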