Reading CSV File - invalid byte sequence in UTF-8
I have been using a rake file for a number of months to read in data from a CSV file. I have recently tried to read in a new CSV file but keep getting the error "invalid byte sequence in UTF-8". I have tried to manually work out where the problem is, but with little success. The csv file is just text and URLs, there were a few unusual characters initially (where the original text had fancy bulletpoints) but I have removed those and cannot find any additional anomalies.
Is there a way to get round开发者_高级运维 this problem automatically and identify and remove the problem characters?
I've found a solution to discard all invalid utf8 bytes from a string :
ic = Iconv.new('UTF-8//IGNORE', 'UTF-8')
valid_string = ic.iconv(untrusted_string + ' ')[0..-2]
(taken from this blog post)
Hope this helps.
Where abouts do you put these. I have something like this:
CSV.foreach("/Users/CarlBourne/Customers/Lloyds/small-test2.csv", options) do |row |
name, workgroup, address, actual, output = row
next if nbname == "NBName"
@ssl_info[name] = workgroup, address, actual, output
ic = Iconv.new('UTF-8//IGNORE', 'UTF-8')
clean = ic.iconv(output + ' ')[0..-2]
puts clean
end
However it doesn't seam to work.
精彩评论