开发者

Python regular expression gives unexpected result

I'm trying to create an svn pre-commit hook, but can't get my regular expression to work as expected. It 开发者_StackOverflow社区should print False for messages that do not look like "DEV-5 | some message". Why do I get True here?

Python 2.7.1+ (r271:86832, Apr 11 2011, 18:05:24) 
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> p = re.compile("^\[[A-Z]+-[0-9]+\] | .+$", re.DOTALL)
>>> message = "test message"
>>> match = p.search(message)
>>> bool(match)
True


>>> p = re.compile("^[A-Z]+-[0-9]+ \| .+$", re.DOTALL)
>>> print p.search("test message")
None
>>> print p.search("DEV-5 | some message")
<_sre.SRE_Match object at 0x800eb78b8>
  • you don't need \[ and \]
  • you need to escape |


The culprit is the trailing " | .+$" which is matching ' message' as an alternative to the first regex. As Roman pointed out you meant to match literal '|' so you have to escape it as '\|'.

To see what was being matched, you can do:

print match.group()
' message'

(By the way, a faster non-regex way to only handle lines containing vertical bar would use line.split('|'):

for line in ...:
   parts = line.split('|',1)
   if len(parts)==1: continue
   (code,mesg) = parts


I haven't run the code, but I suspect that the part after the alternative (|) in your regexp matches any nonempty string starting with a space, in this case it's " message".

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜