Efficiently reading a CSV file with Windows newlines on Linux in Python
The following works under Windows for reading CSV files line by line:
f = open(filename, 'r')
for line in f:
    ...  # process each line
However, when I copy the CSV file to a Linux server, it fails.
Performance matters because the CSV files are huge, so I am concerned about the string copying involved in calls like strip().
Python has built-in support for Windows, Linux and Mac line endings ("universal newlines"):
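# Python 3 text mode uses universal newlines by default (newline=None):
# '\r\n', '\r' and '\n' all come back as '\n'. The old 'U' mode flag
# ('rU', 'rtU') is Python 2 style and was removed in Python 3.11.
f = open(filename, 'r', newline=None)
for line in f:
    ...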
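A minimal sketch of that behaviour, assuming Python 3: it writes a small file with Windows (CRLF) endings in binary mode, so nothing is translated on write, and then reads it back in text mode. The helper name read_lines_universal and the sample data are made up for illustration.

```python
import os
import tempfile


def read_lines_universal(path):
    # Text mode with newline=None translates CRLF and lone CR to "\n"
    # while reading, so callers never see a trailing "\r".
    with open(path, "r", newline=None) as f:
        return [line for line in f]


# Create a sample file with CRLF line endings (binary mode: no translation).
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as out:
    out.write(b"a,b\r\n1,2\r\n")

lines = read_lines_universal(path)
print(lines)  # ['a,b\n', '1,2\n']
os.remove(path)
```

The translation happens inside the I/O layer as the file is decoded, so no per-line strip() is needed just to remove the carriage returns.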
If you really don't want slow string operations, you should convert the line endings before processing the files. You can either use dos2unix (found in the Debian package "tofrodos") or, more easily, transfer the files in FTP text (ASCII) mode, which should convert them automatically.
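If neither dos2unix nor FTP text mode is available, a rough pure-Python stand-in (not dos2unix itself) is to copy the file once in fixed-size binary chunks, replacing b"\r\n" with b"\n". The function name crlf_to_lf and its chunk_size parameter are hypothetical; the only subtle part is a CRLF pair split across two chunks, which the sketch handles by holding back a trailing b"\r".

```python
import os
import tempfile


def crlf_to_lf(src_path, dst_path, chunk_size=1 << 20):
    # Copy src_path to dst_path, turning CRLF into LF, reading fixed-size
    # binary chunks so memory use stays bounded even for huge files.
    pending = b""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            data = pending + chunk
            # A trailing b"\r" may be the first half of a CRLF pair split
            # across two chunks: hold it back until the next read.
            if data.endswith(b"\r"):
                data, pending = data[:-1], b"\r"
            else:
                pending = b""
            dst.write(data.replace(b"\r\n", b"\n"))
        dst.write(pending)  # a lone "\r" at end of file is kept unchanged


# Usage: tiny chunks deliberately split the CRLF pairs across reads.
fd, src_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"a,b\r\n1,2\r\n")
dst_path = src_path + ".unix"
crlf_to_lf(src_path, dst_path, chunk_size=3)
with open(dst_path, "rb") as f:
    converted = f.read()
print(converted)  # b'a,b\n1,2\n'
os.remove(src_path)
os.remove(dst_path)
```

Like converting during the copy, this pays the conversion cost once, after which the file can be read on Linux without any per-line cleanup.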
If performance is important, why are you not using csv.reader?
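A minimal sketch, assuming Python 3: the csv module documentation recommends opening the file with newline="", and the reader itself recognises both "\r" and "\n" as end-of-line, so Windows-style files parse correctly on Linux without strip(). The generator name iter_rows and the sample data are hypothetical.

```python
import csv
import os
import tempfile


def iter_rows(path):
    # newline="" passes line endings through untouched; csv.reader
    # then strips the "\r\n" row terminators itself.
    with open(path, "r", newline="") as f:
        for row in csv.reader(f):
            yield row


fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as out:
    out.write(b"name,qty\r\napple,3\r\npear,5\r\n")

rows = list(iter_rows(path))
print(rows)  # [['name', 'qty'], ['apple', '3'], ['pear', '5']]
os.remove(path)
```

Using a generator keeps memory flat for huge files: rows are produced one at a time as the file is read.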
Ummm .... You have CSV files and you are using Python, so why not read the files with the Python csv module?
The dos2unix utility will convert the line endings very efficiently. If the files are that large, I would run it as part of the copy.
Actually, the most efficient way to read any file is with one big I/O. There isn't always enough RAM to do that, but the fewer I/Os, the better.
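A sketch of the "one big read" approach, assuming the whole file fits in RAM: pull the bytes in with a single read() call and split them with bytes.splitlines(), which treats \n, \r\n and \r as line boundaries and drops the terminators. The name read_all_lines is hypothetical, and a single read() call is not guaranteed to map to exactly one system call underneath.

```python
import os
import tempfile


def read_all_lines(path):
    # One read() loads the entire file; splitlines() handles CRLF, LF
    # and CR endings and does not keep the line terminators.
    with open(path, "rb") as f:
        data = f.read()
    return data.splitlines()


fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as out:
    out.write(b"a,b\r\n1,2\r\n3,4\r\n")

lines = read_all_lines(path)
print(lines)  # [b'a,b', b'1,2', b'3,4']
os.remove(path)
```

The result is a list of bytes objects; decode them if text is needed. For files too large to load at once, a middle ground is to keep iterating line by line but pass a larger buffering value to open(), which should reduce the number of underlying reads.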