Find comma space year but ignore comma year without space
I am trying to read in a file and print every occurrence of a comma, a space, and then a year. For example, if it finds ", 2003" it should print that out, but if it finds ",2003" (no space) it should ignore it. I originally used split and was able to get the year to match up, but when I added the comma I realized that split treats the comma and the year as two different words, so I don't think that approach will work.
Here is my code:
import string
import re

while True:
    filename = raw_input('Enter a file name: ')
    if filename == 'exit':
        break
    try:
        file = open(filename, 'r')
        text = file.read()
        file.close()
    except:
        print('file does not exist')
    else:
        p = re.compile('^\,\s(19|20)\d\d$')  # this is my regular expression
        print(text)
        m = p.search(text)
        if m:
            print(m.groups())
If you want to search the file for the regex rather than match the entire file contents, remove ^ and $ from the regex.
If you want more than one match per file, use finditer or findall instead of search.
Use a raw string when specifying the regex:
p=re.compile(r',\s(19|20)\d\d')
Example:
for m in re.finditer(r',\s((19|20)\d\d)', text):
    print(m.group(1))
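To see why the anchors matter, here is a small sketch (written for Python 3; the sample string is made up for illustration and is not from the asker's file) that compares the anchored pattern from the question with the unanchored one:

```python
import re

# Illustrative sample text; the real input would come from the file.
text = "foo bar, 2003, 2006,1923, derp"

# The question's pattern: with ^ and $ (and no re.MULTILINE) it can only
# match if the whole string is exactly ", YYYY".
anchored = re.compile(r'^,\s(19|20)\d\d$')

# Unanchored pattern; the outer group captures the full year.
unanchored = re.compile(r',\s((?:19|20)\d\d)')

anchored_match = anchored.search(text)
years = [m.group(1) for m in unanchored.finditer(text)]

print(anchored_match)  # None
print(years)           # ['2003', '2006']
```

The ",1923" entry is skipped because \s requires exactly one whitespace character after the comma.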
>>> import re
>>> text = "foo bar, 2003, 2006,1923, derp"
>>> p = re.compile(r',\s((?:19|20)\d\d)')
>>> p.findall(text)
['2003', '2006']
Simplified example. First of all, remove the anchors (^ and $) and use findall instead of search to find all matches. I also used ?: to make the century group non-capturing (it won't show up in the results) and made the whole year a capturing group instead.
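As a rough illustration of why the grouping matters (a sketch for Python 3, using an illustrative sample string), findall returns different things depending on which groups capture:

```python
import re

# Illustrative sample text.
text = "foo bar, 2003, 2006,1923, derp"

# Only the century is captured, so findall returns just "19" or "20".
century_only = re.findall(r',\s(19|20)\d\d', text)

# Two capturing groups: findall returns one tuple per match.
both_groups = re.findall(r',\s((19|20)\d\d)', text)

# Inner group made non-capturing with ?: so only the full year comes back.
year_only = re.findall(r',\s((?:19|20)\d\d)', text)

print(century_only)  # ['20', '20']
print(both_groups)   # [('2003', '20'), ('2006', '20')]
print(year_only)     # ['2003', '2006']
```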
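If you add a * to the \s in your regex, it will match zero or more whitespace characters instead of exactly one. Note that this also matches ",2003" with no space, which the question wants to ignore. If you only want it to match zero or one, add a ? instead. To require at least one whitespace character while allowing several, use a +.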
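For reference, a quick sketch (Python 3, with a made-up sample string) of how each quantifier after \s changes what gets matched:

```python
import re

# Illustrative text: one space, no space, and two spaces after the comma.
text = "a, 2003 b,2004 c,  2005"

year = r'((?:19|20)\d\d)'

# Build one pattern per quantifier: '' means exactly one whitespace character.
results = {q: re.findall(r',\s' + q + year, text) for q in ['', '*', '?', '+']}

print(results[''])   # ['2003']                  exactly one
print(results['*'])  # ['2003', '2004', '2005']  zero or more
print(results['?'])  # ['2003', '2004']          zero or one
print(results['+'])  # ['2003', '2005']          one or more
```

Only the plain \s and \s+ forms skip ",2004", so those are the ones that fit the "ignore comma year without space" requirement.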