Capturing an optional part in a regular expression

I have an input text which can be either:

"URL: http://www.cnn.com Cookie: xxx; yyy"

or just:

"URL: http://www.cnn.com"

How do I capture both the URL and the cookie into two separate variables in Python? The part I don't know how to specify is the optional cookie.

Thanks.


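import re

text = 'URL: http://www.cnn.com Cookie: xxx; yyy'

# Group 2 (the whole " Cookie: ..." part) is optional; group 3 is the cookie value.
match = re.search(r'URL: (\S+)( Cookie: (.*))?', text)
print(match.group(1))
print(match.group(3))

# prints:
# http://www.cnn.com
# xxx; yyy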


import re

inputstring = "URL: http://www.cnn.com Cookie: xxx; yyy"

if 'Cookie' in inputstring:
    m = re.match(r'URL: (.*?) Cookie: (.*)', inputstring)
    if m:
        url = m.group(1)
        cookie = m.group(2)
        print(url)
        print(cookie)
else:
    m = re.match(r'URL: (.*)', inputstring)
    if m:
        # group(1) is the captured URL; group(0) would be the whole match, including "URL: "
        url = m.group(1)
        print(url)


Use separate capture groups, and put ? after a group that wraps the optional part of your regex. If a capture group doesn't take part in the match, its value will be None.

>>> import re
>>> regex = re.compile(r'URL: (\S+)(?:\s+Cookie: (\S+))?')
>>> regex.match("URL: http://www.cnn.com Cookie: xxx;yyy").groups()
('http://www.cnn.com', 'xxx;yyy')
>>> regex.match("URL: http://www.cnn.com").groups()
('http://www.cnn.com', None)

I've just used \S+ for the URL and cookie patterns in the above for example purposes. Replace them with your real URL and cookie patterns.
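As one possible replacement: the cookie in the question ("xxx; yyy") contains a space, so \S+ would stop after "xxx;". The sketch below assumes the cookie value runs to the end of the line and uses .+ for it. The names LINE_RE and parse_line are made up for illustration.

```python
import re

# Assumption: everything after "Cookie: " up to the end of the line is the cookie,
# so ".+" is used instead of "\S+", which would stop at the first space.
LINE_RE = re.compile(r'URL: (\S+)(?:\s+Cookie: (.+))?$')

def parse_line(line):
    """Return (url, cookie); cookie is None when the Cookie part is absent."""
    m = LINE_RE.match(line)
    if m is None:
        # The line does not start with "URL: <something>" at all.
        return None, None
    return m.group(1), m.group(2)

print(parse_line("URL: http://www.cnn.com Cookie: xxx; yyy"))
print(parse_line("URL: http://www.cnn.com"))
```

The trailing $ forces the whole line to be consumed, so a line with unexpected trailing text is rejected rather than partially matched.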

Instead of groups() you can use group(1) and group(2); the behavior is the same, but groups() is convenient for unpacking, e.g.:

url, cookie = match.groups()
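Two things to watch with the unpacking above: match is None when the line doesn't match at all, and the cookie comes back as None when it is absent. A minimal sketch that handles both, using the default argument of groups() to substitute an empty string. The helper name url_and_cookie and the missing parameter are hypothetical.

```python
import re

regex = re.compile(r'URL: (\S+)(?:\s+Cookie: (\S+))?')

def url_and_cookie(text, missing=''):
    """Unpack (url, cookie) from text, using `missing` when there is no cookie."""
    match = regex.match(text)
    if match is None:
        # Unpacking None.groups() would raise AttributeError; fail explicitly instead.
        raise ValueError('not a URL line: %r' % text)
    # groups(default=...) replaces groups that did not participate in the match.
    url, cookie = match.groups(default=missing)
    return url, cookie

print(url_and_cookie('URL: http://www.cnn.com Cookie: xxx;yyy'))
print(url_and_cookie('URL: http://www.cnn.com'))
```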


Enclose the optional part in parentheses and follow the group with ?, e.g. ( Cookie: xxx; yyy)?
