Catching an optional part in a regular expression

2023-02-04 00:23

I have an input text which can be either:

"URL: http://www.cnn.com Cookie: xxx; yyy"

or just:

"URL: http://www.cnn.com"

How do I capture both the URL and the cookie into two separate variables in Python? The part I don't know how to specify is the optional cookie.

Thanks.

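import re

text = 'URL: http://www.cnn.com Cookie: xxx; yyy'

# Group 1 is the URL. Group 2 is the whole optional " Cookie: ..." part,
# and group 3 is the cookie value inside it.
match = re.search(r'URL: (\S+)( Cookie: (.*))?', text)
print(match.group(1))  # http://www.cnn.com
print(match.group(3))  # xxx; yyy (None when there is no cookie)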

import re

inputstring = "URL: http://www.cnn.com Cookie: xxx; yyy"

if 'Cookie' in inputstring:
    # The lazy (.*?) stops the URL at the first " Cookie: "
    m = re.match(r'URL: (.*?) Cookie: (.*)', inputstring)
    if m:
        url = m.group(1)
        cookie = m.group(2)
        print(url)
        print(cookie)
else:
    m = re.match(r'URL: (.*)', inputstring)
    if m:
        # group(1) is the URL; group(0) would be the whole match, including "URL: "
        url = m.group(1)
        print(url)

Just use separate capture groups, and ? for the optional part of your regex. If a capture group doesn't capture anything the group's value will be None.

>>> regex = re.compile(r'URL: (\S+)(?:\s+Cookie: (\S+))?')
>>> regex.match("URL: http://www.cnn.com Cookie: xxx;yyy").groups()
('http://www.cnn.com', 'xxx;yyy')
>>> regex.match("URL: http://www.cnn.com").groups()
('http://www.cnn.com', None)

I've just used \S+ for the URL and cookie patterns in the above for example purposes. Replace them with your real URL and cookie patterns.

Instead of groups() you can use group(1) and group(2) -- the behavior is the same, but groups() is nice with unpacking, e.g.:

url, cookie = match.groups()
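
Putting these pieces together, here is a minimal sketch of a helper that handles both input forms. The name parse_line is hypothetical, and the cookie pattern (.+) is an assumption chosen so that a cookie value containing spaces, such as "xxx; yyy", is captured whole. Swap in your own patterns as needed.

```python
import re

# Assumed patterns: \S+ for the URL, .+ for the cookie (which may contain spaces).
LINE_RE = re.compile(r'URL: (\S+)(?:\s+Cookie: (.+))?')

def parse_line(line):
    """Return (url, cookie); cookie is None when the optional part is absent."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError('line does not match the expected format')
    url, cookie = match.groups()
    return url, cookie

print(parse_line('URL: http://www.cnn.com Cookie: xxx; yyy'))
print(parse_line('URL: http://www.cnn.com'))
```

Checking for None before calling groups() matters here, because match() returns None for a line that does not start with "URL: ".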

Enclose the optional part in a group and follow it with ?, as in ( Cookie: xxx; yyy)?
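
As a small illustration of that idea, the sketch below wraps the cookie part in an optional non-capturing group and uses named groups so the results can be read by name. The group names url and cookie are my own choice for the example, not something from the question.

```python
import re

# (?: ... )? makes the whole " Cookie: ..." part optional without adding a numbered group.
pattern = re.compile(r'URL: (?P<url>\S+)(?: Cookie: (?P<cookie>.*))?')

for line in ('URL: http://www.cnn.com Cookie: xxx; yyy', 'URL: http://www.cnn.com'):
    fields = pattern.match(line).groupdict()
    # fields['cookie'] is None when the optional part did not match
    print(fields['url'], fields['cookie'])
```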
