
Reading CSV File - invalid byte sequence in UTF-8

I have been using a rake file for a number of months to read data in from a CSV file. I recently tried to read in a new CSV file but keep getting the error "invalid byte sequence in UTF-8". I have tried to work out manually where the problem is, but with little success. The CSV file is just text and URLs. There were a few unusual characters initially (where the original text had fancy bullet points), but I have removed those and cannot find any other anomalies.

Is there a way to get round this problem automatically and identify and remove the problem characters?
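To locate the problem characters rather than hunting for them by eye, one option is to read the file as raw bytes and test each line with `String#valid_encoding?`. This is a minimal sketch; `invalid_utf8_lines` is a hypothetical helper name, and the path passed to it would be your own CSV file.

```ruby
# Minimal sketch: report which lines of a file contain bytes that are not
# valid UTF-8, so the offending characters can be found by hand.
def invalid_utf8_lines(path)
  bad = []
  File.open(path, "rb") do |f|               # binary mode: no decoding, no errors
    f.each_line.with_index(1) do |line, lineno|
      line.force_encoding(Encoding::UTF_8)   # reinterpret the bytes as UTF-8
      bad << [lineno, line.inspect] unless line.valid_encoding?
    end
  end
  bad
end
```

Each entry pairs a line number with an `inspect`ed copy of that line, in which invalid bytes show up as escapes such as `\xE9`.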


I've found a solution that discards all invalid UTF-8 bytes from a string:

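require 'iconv' # part of the stdlib up to Ruby 1.9; a separate gem from Ruby 2.0 on
ic = Iconv.new('UTF-8//IGNORE', 'UTF-8')
# Appending a space and stripping it again works around Iconv failing on an invalid sequence at the very end of the string
valid_string = ic.iconv(untrusted_string + ' ')[0..-2]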

(taken from this blog post)

Hope this helps.
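Iconv was deprecated in Ruby 1.9.3 and removed from the standard library in Ruby 2.0, so on newer Rubies the same idea can be sketched with `String#scrub` (available since Ruby 2.1). This is a swapped-in technique, not the one from the blog post, and `strip_invalid_utf8` is a hypothetical helper name.

```ruby
# Minimal sketch: drop invalid UTF-8 bytes without Iconv (assumes Ruby 2.1+).
def strip_invalid_utf8(untrusted_string)
  # dup so the caller's string is left untouched, treat its bytes as UTF-8,
  # then replace every invalid byte sequence with an empty string.
  untrusted_string.dup.force_encoding(Encoding::UTF_8).scrub("")
end

dirty = "caf\xE9 \xFFlatte".b  # hypothetical sample with two invalid bytes
puts strip_invalid_utf8(dirty) # prints "caf latte"
```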


Whereabouts do you put these? I have something like this:

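require 'csv'
require 'iconv'

# Create the converter once rather than once per row.
ic = Iconv.new('UTF-8//IGNORE', 'UTF-8')

CSV.foreach("/Users/CarlBourne/Customers/Lloyds/small-test2.csv", options) do |row|
  name, workgroup, address, actual, output = row
  next if name == "NBName" # skip the header row
  @ssl_info[name] = workgroup, address, actual, output

  # to_s guards against an empty (nil) output field
  clean = ic.iconv(output.to_s + ' ')[0..-2]
  puts clean
end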

However, it doesn't seem to work.
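A likely reason, offered as an assumption rather than a confirmed diagnosis: the "invalid byte sequence in UTF-8" error is raised while CSV is parsing the file, before a row ever reaches the block, so cleaning `output` inside the loop comes too late. A minimal sketch, assuming Ruby 2.1 or later (for `String#scrub`), cleans the raw bytes first and only then parses them; `clean_csv_rows` is a hypothetical helper name.

```ruby
require 'csv'

# Minimal sketch: read the raw bytes, drop invalid UTF-8 sequences,
# and only then hand the text to the CSV parser.
def clean_csv_rows(path, options = {})
  raw  = File.binread(path)                             # bytes, no decoding yet
  text = raw.force_encoding(Encoding::UTF_8).scrub("")  # remove invalid bytes
  CSV.parse(text, **options)                            # array of row arrays
end
```

In the rake task, the existing loop could then iterate over `clean_csv_rows(path, options)` instead of calling `CSV.foreach(path, options)`. Note that `scrub` discards the bad bytes; if the file is really in a legacy encoding rather than broken UTF-8, transcoding it instead would preserve those characters.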

