
Parsing inside a directory problem Python 2.7 vs. 3.2

I am trying to do some basic file parsing within a directory in Python 3. This code works perfectly in Python 2.7, but I cannot figure out what the problem is in Python 3.2.

import os, re

filelist = os.listdir('/Users/sbrown/Desktop/Test')
os.chdir('/Users/sbrown/Desktop/Test')
for file in filelist:
    infile = open(file, mode='r')
    filestring = infile.read()
    infile.close()
    pattern = re.compile('exit')
    filestring = pattern.sub('so long', filestring)
    outfile = open(file, mode='w')
    outfile.write(filestring)
    outfile.close()  # the parentheses are needed to actually call close()

This is the error that is thrown back:

Traceback (most recent call last):
  File "/Users/bunsen/Desktop/parser.py", line 9, in <module>
    filestring = infile.read()
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 3131: ordinal not in range(128)

The files I am parsing are all text files. I tried passing the encoding as utf-8 when opening the files, but that didn't work. Any ideas? Thanks in advance!

If I specify the encoding as utf-8, here is the error that is thrown:

Traceback (most recent call last):
  File "/Users/sbrown/Desktop/parser.py", line 9, in <module>
    filestring = infile.read()
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 3131: ordinal not in range(128)


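You are not specifying an encoding when you open your files. In Python 3 you need to, because a file opened in text mode returns decoded Unicode strings, and when no encoding is given it falls back to the locale's preferred encoding (see locale.getpreferredencoding()). Judging by the encodings/ascii.py frame in your traceback, that is ASCII on your system.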
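
To see the failure in isolation, here is a minimal sketch that writes a single non-ASCII byte (0x80) to a scratch file and reads it back in text mode with different encodings. The scratch file, its contents, and the `reproduce` helper are made up for illustration; they stand in for one of the files in the Test directory.

```python
import os
import tempfile

def reproduce(encoding):
    """Hypothetical helper: write bytes containing 0x80 to a scratch file,
    read them back in text mode with *encoding*, and return either the
    decoded text or the error message."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(b'price: \x80 10\n')
        with open(path, encoding=encoding) as f:
            return f.read()
    except UnicodeDecodeError as exc:
        return str(exc)
    finally:
        os.remove(path)

# ASCII only covers bytes 0-127, so 0x80 fails just like in the question.
print(reproduce('ascii'))
# 'ascii' codec can't decode byte 0x80 in position 7: ordinal not in range(128)

# UTF-8 rejects it too: 0x80 may only appear inside a multi-byte sequence.
print(reproduce('utf-8'))
# 'utf-8' codec can't decode byte 0x80 in position 7: invalid start byte

# cp1252 maps 0x80 to the euro sign, so the read succeeds.
print(ascii(reproduce('cp1252')))
# 'price: \u20ac 10\n'
```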

You tried UTF-8 and that didn't work, so that evidently isn't the encoding used. Only you know what the encoding is, but my guess is cp1252, because 0x80 is that code page's character for €, so failing on 0x80 is common when you have European Windows users. :-)
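
A quick way to check the cp1252 guess is to decode the byte directly. The `guess_encoding` helper below is hypothetical and purely heuristic: it returns the first candidate that decodes the bytes without an error, which does not prove the candidate is correct (cp1252 accepts all but a handful of byte values).

```python
# 0x80 in cp1252 is U+20AC EURO SIGN. Printing the code point rather than
# the character avoids a UnicodeEncodeError on an ASCII-only terminal.
print(hex(ord(b'\x80'.decode('cp1252'))))  # 0x20ac

def guess_encoding(data, candidates=('utf-8', 'cp1252')):
    """Hypothetical helper: return the first encoding in *candidates* that
    decodes *data* without error, or None if none of them do."""
    for enc in candidates:
        try:
            data.decode(enc)
        except UnicodeDecodeError:
            continue  # this candidate cannot be right, try the next one
        return enc
    return None

print(guess_encoding(b'price: \x80 10'))  # cp1252
```

Trying UTF-8 first is deliberate: text that decodes cleanly as UTF-8 is very unlikely to be anything else, whereas cp1252 will "succeed" on most byte sequences.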

To stay compatible with both Python 2.7 and Python 3, I recommend you use the io library to open files. Its open() is the one Python 3 uses by default, and it's available in Python 2.6 and later as well:

import io
infile = io.open(filelist[0], mode='rt', encoding='cp1252')
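
Applied to the loop from the question, that might look like the sketch below. It assumes the files really are cp1252 (swap in the right codec if not), writes them back with the same encoding so non-ASCII characters survive, and passes `newline=''` so Windows line endings are left untouched. `replace_in_directory` is a hypothetical name; the directory path is the one from the question.

```python
import io
import os
import re

# Directory from the question; adjust for your machine.
DIRECTORY = '/Users/sbrown/Desktop/Test'
# Assumption: the files are cp1252. Replace with the real encoding if not.
ENCODING = 'cp1252'

# Compile the pattern once instead of on every loop iteration.
PATTERN = re.compile('exit')

def replace_in_directory(directory, encoding=ENCODING):
    """Hypothetical helper: replace 'exit' with 'so long' in every regular
    file in *directory*, reading and writing with the same encoding."""
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-files
        # newline='' disables newline translation, so \r\n stays \r\n.
        with io.open(path, mode='rt', encoding=encoding, newline='') as infile:
            text = infile.read()
        text = PATTERN.sub('so long', text)
        with io.open(path, mode='wt', encoding=encoding, newline='') as outfile:
            outfile.write(text)

if __name__ == '__main__' and os.path.isdir(DIRECTORY):
    replace_in_directory(DIRECTORY)
```

Joining each name onto the directory with os.path.join also removes the need for os.chdir.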


Does this work?

import codecs
infile = codecs.open(filelist[0], encoding='UTF-8')
infile.read()


Test

filelist = os.listdir('/Users/sbrown/Desktop/Test') 
infile = open(filelist[0], mode='r') 
print(infile.encoding)

to make sure your files are actually being read as UTF-8. If they are not, check that you haven't done something wicked with codecs. Also, could you post the traceback from your test with UTF-8 forced?
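
One thing worth checking: the traceback for the forced-UTF-8 attempt still goes through encodings/ascii.py, which suggests the encoding argument never reached open(). The sketch below prints the two values worth comparing: the locale default that open() falls back to, and the `.encoding` a file object reports. `reported_encoding` is a hypothetical helper built on a temporary file.

```python
import locale
import tempfile

# The encoding open() falls back to when no encoding is passed.
# It depends on your locale settings (LANG, LC_ALL, ...).
print('default:', locale.getpreferredencoding(False))

def reported_encoding(encoding=None):
    """Hypothetical helper: open a temporary text file with *encoding*
    (None means "use the default") and return its .encoding attribute."""
    with tempfile.TemporaryFile(mode='w+', encoding=encoding) as f:
        return f.encoding

print('no argument:', reported_encoding())
print('explicit:', reported_encoding('utf-8'))  # explicit: utf-8
```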
