How to parse a single-line CSV string without the csv.reader iterator in Python?
I have a CSV file that I need to rearrange and re-encode. I'd like to run
line = line.decode('windows-1250').encode('utf-8')
on each line before it's parsed and split by the CSV reader. Alternatively, I'd like to iterate over the lines myself, run the re-encoding, and use just the single-line parsing from the csv library, but with the same reader instance.
Is there a way to do that nicely?
Looping over the lines of a file can be done this way:
with open('path/to/my/file.csv', 'r') as f:
    for line in f:
        print(line)  # here you can convert the encoding and save the lines
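But if you want to convert the encoding of a whole file, you can also call iconv. Write to a different output file, because redirecting the output onto the input file would truncate it before iconv reads it:
$ iconv -f Windows-1250 -t UTF-8 < file.csv > file_utf8.csv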
Edit: So where is the problem?
with open('path/to/my/file.csv', 'r') as f:
    for line in f:
        line = line.decode('windows-1250').encode('utf-8')
        elements = line.split(",")
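Note that a plain split on "," breaks on quoted fields that contain the delimiter. A minimal sketch of parsing one line with the csv module itself, assuming Python 3 and an already-decoded string; parse_line is a hypothetical helper name, not part of the csv library:

```python
import csv

def parse_line(line, delimiter=","):
    """Parse one CSV-formatted string and return its fields as a list."""
    # csv.reader accepts any iterable of strings, so a one-element list works.
    return next(csv.reader([line], delimiter=delimiter, quotechar='"'))

naive = 'a,"b,c",d'.split(",")      # the quoted comma is split: 4 pieces
fields = parse_line('a,"b,c",d')    # the quoted comma is kept: 3 fields
```

This builds a new reader on every call, so it does not reuse a single reader instance, and a quoted field spanning several physical lines would not survive line-by-line handling.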
Thanks for the answers. The wrapping one gave me an idea:
import csv

def reencode(file):
    # Python 2: re-encode each line just before csv.reader parses it
    for line in file:
        yield line.decode('windows-1250').encode('utf-8')

csv_writer = csv.writer(open(outfilepath, 'w'), delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
csv_reader = csv.reader(reencode(open(filepath)), delimiter=";", quotechar='"')
for c in csv_reader:
    l = c  # rearrange columns here
    csv_writer.writerow(l)
That's exactly what I was going for: re-encoding each line just before it gets parsed by the csv_reader.
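In Python 3 the same wrapping idea would need only a decode step, because csv.reader there expects text (str) lines rather than UTF-8 bytes. A rough sketch, assuming the input is opened in binary mode; decode_lines and the in-memory sample data are made up for illustration:

```python
import csv
import io

def decode_lines(binary_file, encoding="windows-1250"):
    """Yield each raw line decoded to str, just before the reader parses it."""
    for raw in binary_file:
        yield raw.decode(encoding)

# Stand-in for open(filepath, 'rb'): two semicolon-separated rows in windows-1250.
sample = io.BytesIO('jméno;město\nŽaneta;"Ústí; Labe"\n'.encode("windows-1250"))

# The generator is consumed lazily, one line per row the reader asks for.
reader = csv.reader(decode_lines(sample), delimiter=";", quotechar='"')
rows = [row for row in reader]
```

The reader instance stays the same for the whole file; only the line source it pulls from is wrapped.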
At the very bottom of the Python 2 csv documentation is a set of recipe classes (UnicodeReader and UnicodeWriter) that implement Unicode support for csv:
# UnicodeReader and UnicodeWriter are the recipe classes copied from the csv docs
rfile = open('input.csv')
wfile = open('output.csv', 'w')
csv_reader = UnicodeReader(rfile, encoding='windows-1250')
csv_writer = UnicodeWriter(wfile, encoding='utf-8')
for c in csv_reader:
    # process Unicode lines
    csv_writer.writerow(c)
rfile.close()
wfile.close()
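On Python 3 these recipe classes should not be needed, since open() takes an encoding argument and the csv module works on text. A hedged sketch of the same read, rearrange, write loop; the function name convert_csv and the column-reversal step are placeholders, and the ";" and "," delimiters are carried over from the question's own snippet as an assumption:

```python
import csv

def convert_csv(src_path, dst_path, src_encoding="windows-1250"):
    """Read a semicolon-delimited file in src_encoding, write comma-delimited UTF-8."""
    # newline="" lets the csv module handle line endings itself, as its docs advise.
    with open(src_path, newline="", encoding=src_encoding) as rfile, \
         open(dst_path, "w", newline="", encoding="utf-8") as wfile:
        reader = csv.reader(rfile, delimiter=";", quotechar='"')
        writer = csv.writer(wfile, delimiter=",", quotechar='"',
                            quoting=csv.QUOTE_MINIMAL)
        for row in reader:
            writer.writerow(row[::-1])  # placeholder: rearrange columns here
```

The with statement closes both files even if a row fails to parse, which the explicit close() calls would not do.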