Python regex for dates in some text, enclosed by two keywords

This is Part 2 of an earlier question; thanks very much for David's answer there. What if I need to extract only the dates that are bounded by two keywords?

Example:

text = "One 09 Jun 2011 Two 10 Dec 2012 Three 15 Jan 2015 End"

Case 1 bounding keywords: "One" and "Three"
Result expected: ['09 Jun 2011', '10 Dec 2012']

Case 2 bounding keywords: "Two" and "End"
Result expected: ['10 Dec 2012', '15 Jan 2015']

Thanks!


You can do this with two regular expressions. One regex gets the text between the two keywords. The other regex extracts the dates.

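import re
# First regex: capture everything between the two keywords (non-greedy).
match = re.search(r"\bOne\b(.*?)\bThree\b", text, re.DOTALL)
dates = []  # stays empty if the keywords are not found
if match:
    betweenwords = match.group(1)
    # Second regex: pull the dates out of the captured text.
    dates = re.findall(r'\d\d (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{4}', betweenwords)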
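
To cover both cases from the question, the same two-regex idea can be wrapped in a function that takes the keywords as arguments. This is only a sketch: the names `DATE_RE` and `dates_between` are made up for illustration, and it assumes the keywords begin and end with word characters (so that `\b` works) and that you want the first occurrence of the start keyword.

```python
import re

# Same date pattern as above: two digits, month abbreviation, four-digit year.
DATE_RE = re.compile(r"\d\d (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{4}")

def dates_between(text, start_kw, end_kw):
    """Return the dates found between the first start_kw and the next end_kw."""
    # re.escape guards against keywords that contain regex metacharacters.
    pattern = r"\b{}\b(.*?)\b{}\b".format(re.escape(start_kw), re.escape(end_kw))
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        return []  # one or both keywords missing
    return DATE_RE.findall(match.group(1))

text = "One 09 Jun 2011 Two 10 Dec 2012 Three 15 Jan 2015 End"
print(dates_between(text, "One", "Three"))  # ['09 Jun 2011', '10 Dec 2012']
print(dates_between(text, "Two", "End"))    # ['10 Dec 2012', '15 Jan 2015']
```

If either keyword is absent, the function returns an empty list rather than raising.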


Do you really need to worry about the keywords? Can you be sure that the keywords, and the number of dates between them, will not change?

If you don't need the keywords, the exact same solution from the previous question solves this:

>>> import re
>>> text = "One 09 Jun 2011 Two 10 Dec 2012 Three 15 Jan 2015 End"
>>> match = re.findall(r'\d\d\s(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s\d{4}', text)
>>> match
['09 Jun 2011', '10 Dec 2012', '15 Jan 2015']

If you really only need two of the dates, you could just use list slicing:

>>> match[:2]
['09 Jun 2011', '10 Dec 2012']
>>> match[1:]
['10 Dec 2012', '15 Jan 2015']
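
The hard-coded slices above only work when you already know where each date sits in the list. If the layout might change, one possible alternative is to find every date once and keep those whose position falls between the two keywords. This is a rough sketch; `dates_in_span` is a hypothetical helper name, and it assumes each keyword is a plain word and that the first occurrence of the start keyword is the one you want.

```python
import re

DATE_RE = re.compile(r"\d\d\s(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s\d{4}")

def dates_in_span(text, start_kw, end_kw):
    """Keep only the dates located between start_kw and the following end_kw."""
    start = re.search(r"\b" + re.escape(start_kw) + r"\b", text)
    if start is None:
        return []
    # Look for the end keyword only after the start keyword.
    end = re.compile(r"\b" + re.escape(end_kw) + r"\b").search(text, start.end())
    if end is None:
        return []
    return [m.group(0) for m in DATE_RE.finditer(text)
            if start.end() <= m.start() and m.end() <= end.start()]

text = "One 09 Jun 2011 Two 10 Dec 2012 Three 15 Jan 2015 End"
print(dates_in_span(text, "One", "Three"))  # ['09 Jun 2011', '10 Dec 2012']
print(dates_in_span(text, "Two", "End"))    # ['10 Dec 2012', '15 Jan 2015']
```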