Is there a solution to the character encoding problem ("�") for Rails 2 / Ruby 1.8.7?
From the Rails 3 announcement listing the major new features:
Say goodbye to encoding issues
If you browse the Internet with any frequency, you will likely encounter the � character. This problem is extremely pervasive, and is caused by mixing and matching content with different encodings.
In a system like Rails, content comes from the database, your templates, your source files, and from the user. Ruby 1.9 gives us the raw tools to eliminate these problems, and in combination with Rails 3, � should be a thing of the past in Rails applications. Never struggle with corrupted data pasted by a user from Microsoft Word again!
I have an app where users often paste in text from MS Word and we encounter exactly this issue.
However we're running Rails 2 and Ruby 1.8.7. There is no immediate prospect of changing this.
I think the encoding problem usually manifests with typographer's quotes ("curly quotes"), and probably also with things like em dashes and the ellipsis character.
I'm wondering if there's a routine I can run on the incoming data to overcome this problem.
It's OK if the quotes get turned into straight quotes, ellipses get turned into three periods, etc.
It could even be a utility that runs on the system level that I could call from my app with
processed_data = `system_command #{params[:incoming_data]}`
You can use the rchardet gem to detect the encoding of incoming strings, and the built-in Iconv libs to convert strings that need conversion:
require 'rchardet'
require 'iconv'
[...]
# Read the upload once; CharDet.detect and Iconv.conv both expect a String
raw = params[:my_upload_form][:uploaded_file].read
cd = CharDet.detect(raw)
encoding = cd['encoding']
converted_string = Iconv.conv('UTF-8', encoding, raw)
The example is working on an uploaded file, but of course you can apply it to data coming in from textareas or wherever else you think users may be pasting data in encodings other than the one you want.
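If detection comes back empty, the conversion needs some source encoding to fall back on. Below is a minimal sketch of one way to wrap that step. The helper name `to_utf8` and the `FALLBACK_ENCODING` constant are hypothetical, and the Windows-1252 default is only an assumption based on the text coming from MS Word. Iconv ships with Ruby 1.8.7 but was removed from the standard library in Ruby 2.0, so the sketch falls back to `String#encode` when Iconv cannot be loaded.

```ruby
begin
  require 'iconv' # standard library on Ruby 1.8.7; removed in Ruby 2.0
rescue LoadError
  # Newer Rubies fall through to String#encode below.
end

# Assumption: text pasted from Word on Windows is most likely Windows-1252.
FALLBACK_ENCODING = 'WINDOWS-1252'

# Convert raw bytes to UTF-8.
# `detected` is meant to be cd['encoding'] from CharDet.detect; it may be nil.
def to_utf8(raw, detected = nil)
  source = detected.to_s.empty? ? FALLBACK_ENCODING : detected.to_s
  if defined?(Iconv)
    Iconv.conv('UTF-8', source, raw)
  else
    # Relabel the bytes with their real encoding, then transcode.
    raw.dup.force_encoding(source).encode('UTF-8')
  end
end
```

With the variables from the example above, this would be called as `to_utf8(raw, cd['encoding'])`. Invalid input still raises, so a `rescue` around the call may be worth adding in a controller.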
Borrowed shamelessly from the kind gentleman at http://www.meeho.net/blog/2010/03/ruby-how-to-detect-the-encoding-of-a-string/.
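Since the question says straight quotes and three periods are acceptable, a second pass can flatten the typographic characters once the string is UTF-8. This is only a sketch and not part of the borrowed example: the method name `asciify_typography` and the replacement table are hypothetical, the table covers just the characters mentioned in the question plus the en dash, and it assumes the input is already valid UTF-8. Byte escapes are used rather than `\u` escapes so the literals also parse on Ruby 1.8.7.

```ruby
# UTF-8 byte sequences for common Word punctuation, paired with ASCII stand-ins.
TYPOGRAPHY_TO_ASCII = [
  ["\xE2\x80\x98", "'"],   # left single quote
  ["\xE2\x80\x99", "'"],   # right single quote / apostrophe
  ["\xE2\x80\x9C", '"'],   # left double quote
  ["\xE2\x80\x9D", '"'],   # right double quote
  ["\xE2\x80\x93", "-"],   # en dash
  ["\xE2\x80\x94", "--"],  # em dash
  ["\xE2\x80\xA6", "..."]  # horizontal ellipsis
]

# Return a copy of `text` with each typographic character replaced
# by its ASCII stand-in; all other characters are left untouched.
def asciify_typography(text)
  TYPOGRAPHY_TO_ASCII.inject(text) do |result, (from, to)|
    result.gsub(from, to)
  end
end
```

Applied to the result of the conversion above, this would look like `asciify_typography(converted_string)`.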