Python regex - last occurrence before EOL
I have a set of strings from a log file that I need to parse:
timestamp - user not found : user1
timestamp - exception in xyz.security.plugin: global error : low memory
I want to capture the text between "-" and the last ":".
Currently I am using r' -(.*?)\n', which captures everything up to the EOL. Please bear in mind that the string may contain more than two colons; I need to capture up to the very last colon before the EOL. Also, if there are no colons in the string, the EOL should be taken as the end of the capture.
thanks.
EDIT: better examples:
2011-07-29 07:29:44,112 [TP-Processor10] ERROR springsecurity.GrailsDaoImpl - User not found: sspm
2011-07-29 09:01:05,850 [TP-Processor3] ERROR transaction.JDBCTransaction - JDBC commit failed
2011-07-29 08:32:00,353 [TP-Processor1] ERROR errors.GrailsExceptionResolver - Exception occurred when processing request: [POST] /webapp/user/index - parameters: runtime exception
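import re

with open('logfile.log') as log:
    for line in log:
        # Greedy (.*) runs to the last colon; ' - ' skips hyphens inside the date.
        match = re.search(r' - (.*):', line)
        if not match:
            # No colon after the separator: capture to the end of the line.
            match = re.search(r' - (.*)', line)
        if match:
            print(match.group(1))
        else:
            print('No match in line', line.strip())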
Try this:
"(?<=-).*(?=:[^:]*$)"
It matches everything after the first - up to the last : on the current line. If there is no colon, it won't match at all, so you can do:
r = re.compile("(?<=-).*(?=:[^:]*$)")
result = r.search(mystring)
if result:
    match = result.group(0)
else:
    match = "\n"
This does what you said ("if there is no colon, match EOL"). If you meant "if there is no colon, match until EOL", then a single regex will do:
r = re.compile("(?<=-)(?:[^:]*$|.*(?=:[^:]*$))")
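As a rough check of how the single regex behaves, the sketch below runs it on lines from the question. The helper name extract and the second pattern SPACED_RE are assumptions added for illustration, not part of the answer: SPACED_RE anchors on " - " (with spaces) because the real log lines contain hyphens inside the date, which the plain (?<=-) lookbehind would otherwise pick up first.

```python
import re

# The answer's single-regex form, unchanged.
ANSWER_RE = re.compile(r"(?<=-)(?:[^:]*$|.*(?=:[^:]*$))")

# Assumed variant: require " - " so hyphens inside the date are skipped.
SPACED_RE = re.compile(r"(?<= - )(?:[^:]*$|.*(?=:[^:]*$))")


def extract(pattern, line):
    """Return the matched text without surrounding whitespace, or None."""
    m = pattern.search(line.rstrip("\n"))
    return m.group(0).strip() if m else None


simple = "timestamp - user not found : user1"
real = ("2011-07-29 07:29:44,112 [TP-Processor10] ERROR "
        "springsecurity.GrailsDaoImpl - User not found: sspm")
real_no_colon = ("2011-07-29 09:01:05,850 [TP-Processor3] ERROR "
                 "transaction.JDBCTransaction - JDBC commit failed")

print(extract(ANSWER_RE, simple))         # text between "-" and the last ":"
print(extract(ANSWER_RE, real))           # starts inside the date: "07-29 ..."
print(extract(SPACED_RE, real))           # message only
print(extract(SPACED_RE, real_no_colon))  # no colon after " - ": runs to EOL
```

If the separator in your logs is not always surrounded by single spaces, the lookbehind would need adjusting; Python's re only supports fixed-width lookbehinds.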
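r'^.+ -(.+):.*$' does the trick for me.
This works because (.+) is greedy, so it extends to the last colon on the line. Note that the leading ^.+ is greedy too, so if a line contains more than one " -", the capture starts after the last one that is still followed by a colon. Check the Python documentation for the re module, in particular the entries for *, +, and ?.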
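To see the effect of greediness in isolation, here is a small sketch comparing a greedy group with its lazy counterpart on one of the sample lines from the question. The shortened patterns are simplifications for illustration, not the exact regex above.

```python
import re

line = "timestamp - exception in xyz.security.plugin: global error : low memory"

# Greedy: (.+) grabs as much as possible, then backs off to the LAST colon.
greedy = re.search(r" -(.+):", line).group(1).strip()

# Lazy: (.+?) grabs as little as possible, so it stops at the FIRST colon.
lazy = re.search(r" -(.+?):", line).group(1).strip()

print(greedy)
print(lazy)
```

The greedy form keeps the inner colon and stops before " low memory"; the lazy form stops right after "xyz.security.plugin".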