Capturing an optional part in a regular expression
I have an input text which can be either:
"URL: http://www.cnn.com Cookie: xxx; yyy"
or just:
"URL: http://www.cnn.com"
How do I capture both the URL and the cookie into two separate variables in Python? The part I don't know how to specify is the optional cookie.
Thanks.
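import re

text = 'URL: http://www.cnn.com Cookie: xxx; yyy'
# The trailing "?" makes the whole " Cookie: ..." group optional.
match = re.search(r'URL: (\S+)( Cookie: (.*))?', text)
print(match.group(1))  # http://www.cnn.com
print(match.group(3))  # xxx; yyy (None when there is no cookie)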
import re

inputstring = "URL: http://www.cnn.com Cookie: xxx; yyy"
if 'Cookie' in inputstring:
    # Non-greedy (.*?) stops the URL at the first " Cookie: ".
    m = re.match('URL: (.*?) Cookie: (.*)', inputstring)
    if m:
        url = m.group(1)
        cookie = m.group(2)
        print(url)
        print(cookie)
else:
    m = re.match('URL: (.*)', inputstring)
    if m:
        # group(1) is the URL; group(0) would be the whole match, including "URL: ".
        url = m.group(1)
        print(url)
Just use separate capture groups, and ? for the optional part of your regex. If a capture group doesn't capture anything, the group's value will be None.
>>> regex = re.compile(r'URL: (\S+)(?:\s+Cookie: (\S+))?')
>>> regex.match("URL: http://www.cnn.com Cookie: xxx;yyy").groups()
('http://www.cnn.com', 'xxx;yyy')
>>> regex.match("URL: http://www.cnn.com").groups()
('http://www.cnn.com', None)
I've just used \S+ for the URL and cookie patterns above as an example. Replace them with your real URL and cookie patterns.
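As a rough sketch of that substitution: the sample input in the question has a space inside the cookie value ("xxx; yyy"), so \S+ would capture only "xxx;". Assuming the cookie value runs to the end of the string (an assumption about the input, not something the question states), .+ can replace the second \S+. The names pattern, with_cookie and without_cookie are made up for illustration.

```python
import re

# Assumption: the cookie value extends to the end of the string,
# so .+ is used instead of \S+ to allow spaces inside it.
pattern = re.compile(r'URL: (\S+)(?:\s+Cookie: (.+))?')

with_cookie = pattern.match("URL: http://www.cnn.com Cookie: xxx; yyy").groups()
without_cookie = pattern.match("URL: http://www.cnn.com").groups()

print(with_cookie)     # second item holds the full cookie string
print(without_cookie)  # second item is None when no cookie is present
```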
Instead of groups() you can use group(1) and group(2). The behavior is the same, but groups() is nice with unpacking, e.g.:
url, cookie = match.groups()
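One caveat with unpacking: if the regex does not match at all, match is None and calling groups() on it raises AttributeError. A minimal sketch of a guard follows; the helper name parse_line, the constant LINE_RE and the .+ cookie pattern are hypothetical choices for illustration, not part of the answers above.

```python
import re

# Hypothetical pattern: URL is a run of non-space characters,
# the optional cookie is assumed to run to the end of the line.
LINE_RE = re.compile(r'URL: (\S+)(?:\s+Cookie: (.+))?')

def parse_line(line):
    """Return (url, cookie); cookie is None when the line has no cookie."""
    match = LINE_RE.match(line)
    if match is None:
        # Guard before unpacking: None has no groups() method.
        raise ValueError("no URL found in %r" % line)
    url, cookie = match.groups()
    return url, cookie

url, cookie = parse_line("URL: http://www.cnn.com")
print(url, cookie)
```

Raising an exception is only one option; returning (None, None) for non-matching lines would work just as well if the caller prefers to test for None.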
Enclose the optional part in a group and follow it with ?, as in (Cookie: xxx; yyy)?