
Find comma space year but ignore comma year without space

I am trying to read in a file and print every occurrence of a comma, a space, and a year. For example, if it finds , 2003 it should print that, but if it finds ,2003 it should ignore it. I originally used split and was able to get the year to match, but when I added the comma I realized that split treats , 2003 as two separate words, so I don't think that approach will work.

Here is my code:

import string
import re

while True:
    filename=raw_input('Enter a file name: ')
    if filename == 'exit':
        break
    try:
        file = open(filename, 'r') 
        text=file.read() 
        file.close() 
    except:
        print('file does not exist')
    else:
        p=re.compile('^\,\s(19|20)\d\d$')  # this is my regular expression
        print(text)
        m=p.search(text)
        if m:
            print(m.groups())


  1. If you want to search the file for the regex rather than match the entire file contents, remove ^ and $ from the regex.

  2. If you want more than one match per file, use finditer or findall instead of search.

  3. Use a raw string when specifying the regex: p=re.compile(r',\s(19|20)\d\d')

Example:

for m in re.finditer(r',\s((19|20)\d\d)', text):
    print(m.group(1))
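
Putting the three points together, a minimal sketch might look like the following. The helper name find_years and the sample string are made up for illustration, and print() is used as a function so the snippet also runs on Python 3.

```python
import re

# Comma, exactly one whitespace character, then a year from 1900 to 2099.
# There are no ^ or $ anchors, so the pattern can match anywhere in the text.
YEAR_AFTER_COMMA_SPACE = re.compile(r',\s((19|20)\d\d)')

def find_years(text):
    """Return every year that follows a comma and one whitespace character."""
    return [m.group(1) for m in YEAR_AFTER_COMMA_SPACE.finditer(text)]

sample = "born, 1987, moved,2003, retired, 2019"  # hypothetical input

# The anchored pattern from the question only matches if the whole text is ", YYYY".
anchored = re.search(r'^,\s(19|20)\d\d$', sample)
print(anchored)            # None

# The unanchored pattern finds both spaced years and skips ",2003".
print(find_years(sample))  # ['1987', '2019']
```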


>>> import re
>>> text = "foo bar, 2003, 2006,1923, derp"
>>> p = re.compile(r',\s((?:19|20)\d\d)')
>>> p.findall(text)
['2003', '2006']

Simplified example. First, remove the anchors (^ and $) and use findall instead of search to find all matches. I also used ?: to make the century a non-capturing group (it won't show up in the results) and wrapped the whole year in a capturing group instead.
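
To see why the grouping matters, here is a small comparison of what findall returns for three variants of the pattern on the same text. The variable names are only for illustration.

```python
import re

text = "foo bar, 2003, 2006,1923, derp"

# One capturing group around the century only:
# findall returns just that group, not the whole year.
century_only = re.findall(r',\s(19|20)\d\d', text)
print(century_only)      # ['20', '20']

# Two capturing groups (year and century):
# findall returns one tuple per match.
nested_groups = re.findall(r',\s((19|20)\d\d)', text)
print(nested_groups)     # [('2003', '20'), ('2006', '20')]

# Non-capturing (?:...) for the century inside one capturing group:
# findall returns the years as plain strings.
year_only = re.findall(r',\s((?:19|20)\d\d)', text)
print(year_only)         # ['2003', '2006']
```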


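If you add a * to the \s in your regex, it will match zero or more whitespace characters instead of exactly one. A ? matches zero or one, and a + matches one or more. Note that both * and ? also match ,2003 with no space, which the question wants to ignore, so only the original \s (exactly one) or \s+ (one or more) fits the stated requirement.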
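
A quick way to check what each quantifier on \s does is to run the variants against a short sample. The sample string and the years helper below are assumptions made up for this sketch.

```python
import re

# Hypothetical sample: one space, no space, and two spaces before the year.
text = "a, 2003 b,2004 c,  2005"

def years(pattern):
    """Return the captured years for the given pattern in the sample text."""
    return re.findall(pattern, text)

exactly_one = years(r',\s((?:19|20)\d\d)')     # \s  : exactly one whitespace
zero_or_more = years(r',\s*((?:19|20)\d\d)')   # \s* : zero or more
zero_or_one = years(r',\s?((?:19|20)\d\d)')    # \s? : zero or one
one_or_more = years(r',\s+((?:19|20)\d\d)')    # \s+ : one or more

print(exactly_one)    # ['2003']
print(zero_or_more)   # ['2003', '2004', '2005']
print(zero_or_one)    # ['2003', '2004']
print(one_or_more)    # ['2003', '2005']
```

Only the first and last variants skip ,2004, which is the behavior the question asks for.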
