Interpreting non-ASCII characters in Sinatra coming from Mac Excel 2011
I have a Mac VBA script making a request to a Ruby Sinatra web app.
The text passed from Excel contains characters such as é. Ruby (version 1.9.2) chokes on these characters because Excel is not sending them as UTF-8.
# encoding: utf-8
require 'rubygems'
require 'sinatra'
require "sinatra/reloader" if development?
configure do
  class << Sinatra::Base
    def options(path, opts={}, &block)
      route 'OPTIONS', path, opts, &block
    end
  end
  Sinatra::Delegator.delegate :options
end
options '/' do
  response.headers["Access-Control-Allow-Origin"] = "*"
  response.headers["Access-Control-Allow-Methods"] = "POST"
  halt 200
end
post '/fetch' do
  chars = []
  params['excel_input'].valid_encoding? # returns false
  params['excel_input']
end
My Excel VBA:
Sub FetchAddress()
  For Each oDest In Selection
    With ActiveSheet.QueryTables.Add(Connection:="URL;http://localhost:4567/fetch", Destination:=oDest)
      .PostText = "excel_input=" & oDest.Offset(0, -1).Value
      .RefreshStyle = xlOverwriteCells
      .SaveData = True
      .Refresh
    End With
  Next
End Sub
The character é comes out the other end as Ž.
It looks like the text in Excel is encoded as Windows-1252 (http://en.wikipedia.org/wiki/Windows-1252).
The byte value of the character is 142 (0x8E), which is Ž in Windows-1252.
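One way to check which encoding the incoming bytes really match is to decode byte 142 under a few candidate encodings and compare the results. This is only a minimal sketch using Ruby's built-in transcoding; `decode_byte` is a hypothetical helper name, and MacRoman is included as a candidate purely on the assumption that a Mac application may use it:

```ruby
# Hypothetical helper: interpret a single byte under the given encoding
# and return the resulting character as UTF-8.
def decode_byte(byte, encoding_name)
  raw = [byte].pack("C")                        # one-byte binary string
  raw.force_encoding(encoding_name).encode("UTF-8")
end

%w[Windows-1252 macRoman].each do |name|
  puts "#{name}: #{decode_byte(142, name)}"
end
# Windows-1252: Ž
# macRoman: é
```

Whichever candidate yields the character that was typed into the spreadsheet is the likelier source encoding.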
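Iconv converts text from one character encoding to another, so it can convert the input to UTF-8. Note that byte 142 is Ž in Windows-1252 but é in Mac OS Roman, so Mac Excel is most likely sending MacRoman. Something like this should work:
require "iconv"
...
post '/fetch' do
  # "MACINTOSH" is iconv's name for Mac OS Roman; use "WINDOWS-1252" if the input really is Windows-1252
  excel_input = Iconv.conv("UTF-8", "MACINTOSH", params['excel_input'])
  ...
end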
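Iconv was deprecated in Ruby 1.9.3 and removed from the standard library in 2.0, so the same conversion can also be sketched with String#force_encoding and String#encode. The helper name `to_utf8` is hypothetical, and the default of "macRoman" is an assumption based on byte 142 being é in Mac OS Roman; pass "Windows-1252" if that matches your data:

```ruby
# Hypothetical helper: relabel the raw bytes with their real encoding,
# then transcode to UTF-8. Bytes with no mapping become "?".
def to_utf8(raw, source_encoding = "macRoman")
  raw.dup
     .force_encoding(source_encoding)
     .encode("UTF-8", invalid: :replace, undef: :replace, replace: "?")
end

puts to_utf8("caf\x8E")   # the byte 0x8E is decoded as MacRoman
```

In the Sinatra route this would be called as to_utf8(params['excel_input']). Note that force_encoding only changes the label on the bytes, while encode actually rewrites them.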
You can also look at rchardet (https://github.com/jmhodges/rchardet) to autodetect the charset and then convert the text to UTF-8.
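If a full detection library is more than you need, a cruder check with only the standard library is to keep the input when it is already valid UTF-8 and transcode it otherwise. This is not real charset detection and does not use rchardet; `normalize_to_utf8` and its fallback of "macRoman" are assumptions made for illustration:

```ruby
# Hypothetical helper: accept valid UTF-8 as-is, otherwise assume the
# bytes are in fallback_encoding and transcode them to UTF-8.
def normalize_to_utf8(raw, fallback_encoding = "macRoman")
  candidate = raw.dup.force_encoding("UTF-8")
  return candidate if candidate.valid_encoding?

  raw.dup.force_encoding(fallback_encoding).encode("UTF-8")
end

puts normalize_to_utf8("caf\xC3\xA9")  # already UTF-8, returned unchanged
puts normalize_to_utf8("caf\x8E")      # not valid UTF-8, transcoded
```

This only distinguishes UTF-8 from "something else", so the fallback encoding still has to be chosen by hand.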
"Ruby 1.9 Encodings: A Primer and the Solution for Rails" by Yehuda Katz is a good read if you have some time. It goes into depth about encodings and how to convert between them.