Working with encoding in Ruby
I'm making a simple Sinatra-based web app to display Chinese text, and I know enough about encoding to know that I can potentially lose information if I don't do it properly, but I feel a bit lost in the space of encoding. It's also the first time I'm working with non-English text in Ruby.
Are there any areas in particular that I have to be careful about within my programming stack? Also are there extra libraries I should know about to ensure I encode/decode properly?
My programming stack currently consists of:
- ruby 1.9.2
- sinatra 1.2.6
- possibly postgresql
- textmate editor (currently set to UTF8 encoding) - do I need to change my encoding here?
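Ruby 1.9 works well with UTF-8, so you shouldn't have any problems with it.
But if a source file contains non-ASCII string literals, you should put the magic comment # encoding: UTF-8
at the start of that file. Without it, Ruby 1.9 assumes the source is US-ASCII and rejects the multibyte characters.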
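To make the magic comment concrete, here is a minimal sketch of how a UTF-8 literal behaves once the source encoding is declared. The variable names are illustrative only, and the GBK round trip assumes your Ruby build ships the GBK transcoder (standard builds do, as far as I know).

```ruby
# encoding: UTF-8
# The magic comment has to be the first line of the file
# (or the second, if the first is a shebang).

text = "中文"

# String literals take the source encoding declared above.
source_encoding = text.encoding

# encode transcodes: the bytes change, the characters stay the same.
gbk        = text.encode("GBK")
round_trip = gbk.encode("UTF-8")

# force_encoding only relabels the existing bytes, it never converts them.
# Here the same bytes are viewed as raw binary, then labelled UTF-8 again.
raw        = text.dup.force_encoding("ASCII-8BIT")
relabelled = raw.dup.force_encoding("UTF-8")

puts source_encoding
puts round_trip == text
puts relabelled.valid_encoding?
```

The distinction worth remembering: use encode when data really is in another encoding and has to be converted, and force_encoding when the bytes are already right but carry the wrong label (for example, data read from a socket as binary).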
You can read this http://blog.grayproductions.net/articles/understanding_m17n to understand encoding in Ruby.
The best post I've read on the ruby charset implementation was written by one of the guys behind most of the code involved:
http://yokolet.blogspot.com/2009/07/design-and-implementation-of-ruby-m17n.html
I ran into it while looking into ICU support in ruby:
http://redmine.ruby-lang.org/issues/2034
I've been screen scraping Chinese characters for a few months at http://sinograms.com. I'm using Rails 3, Ruby 1.9.2, and Heroku.
I found no encoding issues, however I'm only accepting Unicode characters. UTF-8 is an encoding of Unicode (a way of writing its code points as bytes) that is backwards compatible with ASCII, so if you stick with that you should be fine.
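A small sketch of the difference between a Unicode code point and its encoded bytes may help here. The variable names are made up for the example; the byte values are those of the character 中 (U+4E2D).

```ruby
# encoding: UTF-8

ascii = "A"
han   = "中"

# A code point is the character's number in Unicode,
# independent of how it is stored.
ascii_codepoint = ascii.unpack("U*").first
han_codepoint   = han.unpack("U*").first

# UTF-8 stores ASCII characters as the same single byte ASCII uses,
# and everything else as a multi-byte sequence.
ascii_utf8_bytes = ascii.bytes.to_a
han_utf8_bytes   = han.bytes.to_a

# UTF-16 is a different encoding of the very same code point.
han_utf16_bytes = han.encode("UTF-16BE").bytes.to_a

puts han_codepoint.to_s(16)   # 4e2d
puts han_utf8_bytes.inspect   # [228, 184, 173]
puts han_utf16_bytes.inspect  # [78, 45]
```

So "Unicode" names the character set, while UTF-8 and UTF-16 are two ways of serialising it. Only UTF-8 leaves plain ASCII text byte-for-byte unchanged.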
This is the best resource I found for ruby and encoding:
http://blog.grayproductions.net/articles/ruby_19s_string
You can check whether a character is a Chinese character (a CJK unified ideograph), based on its Unicode code point, with the following script:
# Returns true if the first character of the string is a CJK unified ideograph.
def check(char)
  code = char.unpack('U*').first
  return false if code.nil?               # empty string
  (0x4E00..0x9FFF).cover?(code) ||        # CJK Unified Ideographs
    (0x3400..0x4DBF).cover?(code) ||      # Extension A
    (0x20000..0x2A6DF).cover?(code) ||    # Extension B
    (0x2A700..0x2B73F).cover?(code)       # Extension C
end
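As an alternative to comparing code points by hand, Ruby 1.9's regex engine understands the \p{Han} script property. The sketch below assumes that property is good enough for your purposes. It matches whatever your Ruby build's Unicode tables classify as Han, which may be a wider set than the four ranges listed above. han_only? is a hypothetical helper name, not something from a library.

```ruby
# encoding: UTF-8

# True when the string is non-empty and consists solely of Han characters.
def han_only?(str)
  !!(str =~ /\A\p{Han}+\z/)
end

# Pull the Han characters out of mixed text.
mixed     = "Ruby 中文 text"
han_chars = mixed.scan(/\p{Han}/)

puts han_only?("中文")
puts han_only?("中a")
puts han_chars.inspect
```

The regex version works on whole strings rather than a single character, which tends to be handier when validating form input or scraped text.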