Python regex to catch two kinds of comment
Example:
a = "bzzzzzz <!-- blabla --> blibli * bloblo * blublu"
I want to catch the first comment. A comment may be
(<!-- .* -->) or (\* .* \*)
This works:
re.search("<!--(?P<comment> .* )-->",a).group(1)
So does this:
re.search("\*(?P<comment> .* )\*",a).group(1)
But I want to capture one or the other in comment, so I tried something like:
re.search("(<!--(?P<comment> .* )-->|\*(?P<comment> .* )\*)",a).group(1)
But it doesn't work.
Thanks
Try a conditional expression:
>>> for m in re.finditer(r"(?:(<!--)|(\*))(?P<comment> .*? )(?(1)-->)(?(2)\*)", a):
...     print(m.group('comment'))
...
blabla
bloblo
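For anyone on Python 3, here is a minimal sketch of the same conditional-expression idea as a script. The names `pattern` and `comments` are only illustrative, and the `.strip()` call is an addition to drop the spaces that the `comment` group captures around the text.

```python
import re

a = "bzzzzzz <!-- blabla --> blibli * bloblo * blublu"

# Group 1 records an opening "<!--", group 2 an opening "*". The conditionals
# (?(1)-->) and (?(2)\*) then demand the closing delimiter that belongs to
# whichever opener actually matched.
pattern = re.compile(r"(?:(<!--)|(\*))(?P<comment> .*? )(?(1)-->)(?(2)\*)")

# The named group includes the surrounding spaces, so strip them here.
comments = [m.group("comment").strip() for m in pattern.finditer(a)]
print(comments)
```

Because both delimiters feed the same named group, the calling code never has to know which kind of comment was found.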
The exception you get in the "doesn't work" part is quite explicit about what is wrong:
sre_constants.error: redefinition of group name 'comment' as group 3; was group 2
Both groups have the same name; just rename the second one:
>>> re.search("(<!--(?P<comment> .* )-->|\*(?P<comment2> .* )\*)",a).group(1)
'<!-- blabla -->'
>>> re.search("(<!--(?P<comment> .* )-->|\*(?P<comment2> .* )\*)",a).groups()
('<!-- blabla -->', ' blabla ', None)
>>> re.findall("(<!--(?P<comment> .* )-->|\*(?P<comment2> .* )\*)",a)
[('<!-- blabla -->', ' blabla ', ''), ('* bloblo *', '', ' bloblo ')]
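A small sketch of how the renamed groups might be used in practice, assuming Python 3 (where the compile error is raised as `re.error`). The variable names and the switch to non-greedy `.*?` are choices made for this example, not part of the answer above.

```python
import re

a = "bzzzzzz <!-- blabla --> blibli * bloblo * blublu"

# Reusing one group name in both alternatives is rejected at compile time.
try:
    re.compile(r"(<!--(?P<comment> .* )-->|\*(?P<comment> .* )\*)")
    duplicate_name_rejected = False
except re.error:
    duplicate_name_rejected = True

# With distinct names the pattern compiles. Only the group belonging to the
# alternative that matched is set; the other one is None.
pattern = re.compile(r"<!--(?P<comment> .*? )-->|\*(?P<comment2> .*? )\*")
groups = pattern.search(a).groupdict()
first = groups["comment"] if groups["comment"] is not None else groups["comment2"]

print(duplicate_name_rejected, repr(first))
```

The explicit `is not None` test matters because a comment could legitimately capture an empty-looking string, which a plain `or` would skip.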
As Gurney pointed out, you have two captures with the same name. Since you're not actually using the name, just leave that out.
Also, the r"" raw string notation is a good habit.
Oh, and a third thing: you're grabbing the wrong index. 0 is the whole match, 1 is the whole "either-or" block, and the inner capture is 2 when the <!-- --> alternative matched (as it does first in your string) or 3 when the * * alternative matched.
re.search(r"(<!--( .* )-->|\*( .* )\*)",a).group(2)
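Since the index of the inner capture depends on which alternative matched, a tiny helper can hide that detail. This is only a sketch; `inner_comment` is a hypothetical name introduced here, and the pattern is the one from the line above.

```python
import re

pattern = re.compile(r"(<!--( .* )-->|\*( .* )\*)")

def inner_comment(text):
    """Return the inner capture of the first comment in text, or None."""
    m = pattern.search(text)
    if m is None:
        return None
    # Group 2 is set for <!-- ... -->, group 3 for * ... *; the other is None.
    return m.group(2) if m.group(2) is not None else m.group(3)

print(repr(inner_comment("bzzzzzz <!-- blabla --> blibli * bloblo * blublu")))
print(repr(inner_comment("x * y * z")))
```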
re.findall might be a better fit for this:
import re
# Keep your regex simple. You'll thank yourself a year from now. Note that
# this doesn't include the surrounding spaces. It also uses non-greedy matching
# so that you can embed multiple comments on the same line, and it doesn't
# break on strings like '<!-- first comment --> fragment -->'.
pattern = re.compile(r"(?:<!-- (.*?) -->|\* (.*?) \*)")
inputstring = 'bzzzzzz <!-- blabla --> blibli * bloblo * blublu foo ' \
'<!-- another comment --> goes here'
# Now use re.findall to search the string. Each match will return a tuple
# with two elements: one for each of the groups in the regex above. Pick the
# non-blank one. This works even when both groups are empty; you just get an
# empty string.
results = [first or second for first, second in pattern.findall(inputstring)]
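You could go one of two ways, depending on what the regex engine supports:
1: Branch reset (?|pattern|pattern|...)
(?|<!--( .*? )-->|\*( .*? )\*)
Capture group 1 always contains the comment text. Python's built-in re module does not accept this syntax.
2: Conditional expression (?(condition)yes-pattern|no-pattern)
(?:(<!--)|\*)(?P<comment> .*? )(?(1)-->|\*)
Here the condition is whether group 1 captured anything. Python's re module supports this form.
Modifiers: s (single line) and g (global). In Python these correspond to the re.DOTALL flag and to using re.findall or re.finditer.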
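A quick way to check both forms against the standard-library `re` module in Python 3 is sketched below. The variable names are illustrative. As far as I know, the third-party `regex` package documents branch-reset support, but that is not exercised here.

```python
import re

a = "bzzzzzz <!-- blabla --> blibli * bloblo * blublu"

# Form 1: branch reset. The built-in re module raises re.error for (?|...).
try:
    re.compile(r"(?|<!--( .*? )-->|\*( .*? )\*)")
    branch_reset_supported = True
except re.error:
    branch_reset_supported = False

# Form 2: conditional. If group 1 ("<!--") took part in the match, require
# "-->" as the closer; otherwise require "*".
conditional = re.compile(r"(?:(<!--)|\*)(?P<comment> .*? )(?(1)-->|\*)")

# finditer plays the role of the "global" modifier.
comments = [m.group("comment") for m in conditional.finditer(a)]
print(branch_reset_supported, comments)
```

If comments may span several lines, passing `re.DOTALL` to `re.compile` would be the counterpart of the `s` modifier.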