Python regular expression gives unexpected result
I'm trying to create an svn pre-commit hook, but can't get my regular expression to work as expected. It 开发者_StackOverflow社区should print False
for messages that do not look like "DEV-5 | some message". Why do I get True here?
Python 2.7.1+ (r271:86832, Apr 11 2011, 18:05:24)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> p = re.compile("^\[[A-Z]+-[0-9]+\] | .+$", re.DOTALL)
>>> message = "test message"
>>> match = p.search(message)
>>> bool(match)
True
>>> p = re.compile("^[A-Z]+-[0-9]+ \| .+$", re.DOTALL)
>>> print p.search("test message")
None
>>> print p.search("DEV-5 | some message")
<_sre.SRE_Match object at 0x800eb78b8>
- you don't need
\[
and\]
- you need to escape
|
The culprit is the trailing " | .+$" which is matching ' message' as an alternative to the first regex. As Roman pointed out you meant to match literal '|' so you have to escape it as '\|'.
To see what was being matched, you can do:
print match.group()
' message'
(By the way, a faster non-regex way to only handle lines containing vertical bar would use line.split('|'):
for line in ...:
parts = line.split('|',1)
if len(parts)==1: continue
(code,mesg) = parts
I haven't run the code, but I suspect that the part after the alternative (|
) in your regexp matches any nonempty string starting with a space, in this case it's " message"
.
精彩评论