
Uploaded file char-set conversion with Ruby

I have an application where our clients upload a CSV file to our server. We then process it and put the data from the CSV into our database. We're running into some issues with character sets, especially when we're dealing with JSON: in particular, some characters that were never converted to UTF-8 are breaking IE on JSON responses.

Is there a way to convert the uploaded csv file to UTF-8 before we start processing it? Is there a way to determine the character encoding of an uploaded file? I've played with iconv a bit but we're not always sure what encoding the uploaded file will have. Thanks.


This solution might not be ideal, but it should do the job.

First, the ingredients:

  • chardet (sudo gem install chardet)
  • fastercsv (sudo gem install fastercsv)

Now the actual code (not tested):

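require 'rubygems'
require 'UniversalDetector'  # provided by the chardet gem
require 'fastercsv'
require 'iconv'

File.open("path/to/your.csv", "rb") do |file_to_import|
  # Determine the encoding based on the first 100 bytes.
  # A short sample is cheap but can mislead the detector.
  chardet = UniversalDetector::chardet(file_to_import.read(100) || '')
  if chardet['confidence'] > 0.7
    charset = chardet['encoding']
  else
    raise 'You better check this file manually.'
  end

  # Reading the sample moved the file pointer, so go back to the start;
  # otherwise the rest of the code would see an empty file.
  file_to_import.rewind

  # Convert the whole file before parsing, so quoted fields that span
  # several lines are not split apart.
  converted = Iconv.conv('utf-8', charset, file_to_import.read)
  FasterCSV.parse(converted) do |row|
    # do the business here
  end
end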
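On Ruby 1.9 and later, FasterCSV became the standard `csv` library and String#encode replaces Iconv (which was removed from the standard library in Ruby 2.0). The sketch below skips chardet entirely and assumes uploads are either UTF-8 (optionally with a byte-order mark) or one known legacy encoding. The fallback `Windows-1252` and the helper names `to_utf8` and `parse_csv_upload` are hypothetical choices for illustration. Note this is a heuristic, not detection: a file in some other encoding (say, Shift_JIS) would be mis-decoded rather than rejected.

```ruby
require 'csv'

# Hypothetical fallback: the encoding assumed for uploads that are not
# valid UTF-8. Windows-1252 is a common guess for Excel exports; adjust it.
FALLBACK_ENCODING = Encoding::Windows_1252
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: return the raw upload bytes as a UTF-8 string.
def to_utf8(raw)
  bytes = raw.b # binary copy, so nothing is interpreted yet
  bytes = bytes.byteslice(UTF8_BOM.bytesize..-1) if bytes.start_with?(UTF8_BOM)

  # If the bytes already form valid UTF-8, use them as they are.
  utf8 = bytes.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  # Otherwise reinterpret them as the fallback encoding and transcode.
  # Bytes the fallback cannot map are replaced with U+FFFD.
  bytes.force_encoding(FALLBACK_ENCODING)
       .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

# Hypothetical helper: decode an upload and parse it with a header row.
def parse_csv_upload(raw)
  CSV.parse(to_utf8(raw), headers: true)
end

# Usage:
#   rows = parse_csv_upload(File.binread("path/to/your.csv"))
#   rows.each { |row| } # do the business here
```

Because every string handed to the rest of the application is valid UTF-8, JSON generated from it should no longer contain stray unconverted bytes.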
