Using Python to replace "\r\r\n" with "\r\n" in a binary file
I'm very new to Python and just crawling my way through it to accomplish a task and would appreciate some help (Python 3.1).
I have a CSV file written with DictWriter with a dialect of "excel". After the file is created, I notice extra lines in the file, and on closer inspection it's because I have "\r\r\n" at the end of each line instead of "\r\n".
I could solve this one of 2 ways:
Open the file in binary mode instead of text. Problem with this is that I cannot for the life of me figure out how to get writerow() to work against a binary file -- I get a ton of exceptions.
Second (easier) solution is just replacing all the "\r\r\n" with "\r\n".
However, on my attempts, I ran into these errors:
a. Not closing the file first, in which case the search and replace just adds even more "\r\r\n" lines.
b. Closing the file first, then re-opening it in binary mode and doing the same search and replace, but then I get an error:
WindowsError: [Error 32] The process cannot access the file because it is being used by another process
Here is the code:
# code before this writes to the file in text mode
myfile.close()
myfile = open(outputFile, "wb")
for line in fileinput.FileInput(outputFile, inplace=1):
    line = line.replace("\r\r\n", "\r\n")
    print (line)
myfile.close()
Would appreciate any help anyone can provide!
The safe way to alter a file (with the exception of appending, which can be safely done in-place) is to copy it with modification to a new file, remove the old one, rename the new like the old. This is the one solid way to avoid catastrophic errors and data loss. Depending on the platform, the step to "remove old, rename new" can be atomic, but that's hard in Windows and not all that crucial.
So I'd simply do that -- in one big gulp, unless the file is horribly huge (gigabyte-plus):
import os

with open(filename, 'rb') as f:
    data = f.read()
with open(newfilename, 'wb') as f:
    f.write(data.replace('\r\r\n', '\r\n'))
os.unlink(filename)
os.rename(newfilename, filename)
The problems with your code are of confusion between binary and text mode -- you can't properly "read a line" from a binary-mode opened file, for example.
Edit: in Python 3.1 we need to deal with bytes instances here, not strings, since the files are binary ones. So, per the docs, the write call must become:
f.write(data.replace(b'\r\r\n', b'\r\n'))
Those b prefixes tell Python we're dealing with bytes, not strings.
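Putting the pieces together for Python 3, a minimal sketch might look like the following. The function name fix_line_endings and the ".tmp" suffix are illustrative assumptions, not part of the original answer, and os.replace needs Python 3.3 or later (on 3.1, the os.unlink plus os.rename pair shown above does the same job).

```python
import os


def fix_line_endings(path):
    """Rewrite `path` so that every b'\\r\\r\\n' becomes b'\\r\\n'."""
    tmp_path = path + ".tmp"  # hypothetical temporary file name

    # Read everything as bytes so no newline translation happens.
    with open(path, "rb") as f:
        data = f.read()

    # Write the corrected bytes to a new file first.
    with open(tmp_path, "wb") as f:
        f.write(data.replace(b"\r\r\n", b"\r\n"))

    # Swap the new file into place (Python 3.3+; also overwrites on Windows).
    os.replace(tmp_path, path)
```

Like the snippet above, this reads the whole file into memory, so it suits files that comfortably fit in RAM.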
Also, the "\r\r\n" problem is most likely caused by being on Windows and opening the file in text mode rather than binary mode: the csv writer already ends each row with "\r\n", and text mode then translates that "\n" into "\r\n" a second time.
I was having this problem and found the answer here: Python 2 CSV writer produces wrong line terminator on Windows
Try this:
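# Read in binary mode so the newlines are not translated on the way in
fileR = open(outputFile, "rb")
data = fileR.read().replace(b"\r\r\n", b"\r\n")
fileR.close()
# Write the corrected bytes back in binary mode
fileW = open(outputFile, "wb")
fileW.write(data)
fileW.close()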
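To see where the extra "\r" comes from without needing a Windows machine, here is a small sketch that simulates Windows text mode by wrapping an in-memory buffer with newline="\r\n". The function name demo_double_cr is made up for illustration.

```python
import csv
import io


def demo_double_cr():
    raw = io.BytesIO()
    # newline="\r\n" makes the text layer turn every "\n" into "\r\n",
    # which is what default text mode does on Windows.
    text = io.TextIOWrapper(raw, encoding="utf-8", newline="\r\n")
    writer = csv.writer(text)  # the default dialect ends rows with "\r\n"
    writer.writerow(["a", "b"])
    text.flush()
    return raw.getvalue()


print(demo_double_cr())  # b'a,b\r\r\n'
```

The csv module writes "a,b\r\n", and the text layer then expands the "\n", which gives the "\r\r\n" seen in the question.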
I'm not too well versed in the special cases of file handling. However, since you mentioned that you are dealing with a CSV file (which can be opened with Excel), I would recommend taking a peek at pyExcelerator.
Hope this helps
To correctly write the CSV files instead of correcting them after the fact, see this question: Python3: writing csv files
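For reference, the usual way to do that in Python 3 is to open the file with newline="" so that text mode leaves the csv module's line endings alone. A minimal sketch, assuming a hypothetical helper called write_rows and made-up column names (writeheader needs Python 3.2 or later):

```python
import csv


def write_rows(path, fieldnames, rows):
    # newline="" turns off newline translation, so the "\r\n" that the
    # csv module emits reaches the file unchanged on every platform.
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, dialect="excel")
        writer.writeheader()
        writer.writerows(rows)
```

For example, write_rows("out.csv", ["name", "qty"], [{"name": "apple", "qty": 3}]) should produce a file whose lines end in a single "\r\n".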