Detecting non-ASCII characters in Rails

2023-03-31 02:28 问答作者：

I am wondering if there's a way to detect non-ASCII characters in Rails.

I have read that Rails does not use Unicode by default, and characters like Chinese and Japanese have assigned ranges in Unicode. Is there an easy way to detect these characters in Rails? or just specify the range of characters I am expectin开发者_如何学Cg?

Is there a plugin for this? Thanks in advance!

All ideographic language encodings use multiple bytes to represent a character, and Ruby 1.9+ is aware of the difference between bytes and characters (Ruby 1.8 isn't)

You may compare the character length to the byte length of the string as a quick and dirty detector. It is probably not foolproof though.

class String
  def multibyte?
    chars.count < bytes.count
  end
end

"可口可樂".multibyte? #=> true
"qwerty".multibyte? #=> false

This is pretty easy with 1.9.2 as regular expressions are character-based in 1.9.2 and 1.9.2 knows the difference between bytes and characters top to bottom. You're in Rails so you should get everything in UTF-8. Happily, UTF-8 and ASCII overlap for the entire ASCII range so you can just remove everything that isn't between ' ' and '~' when you have UTF-8 encoded text:

>> "Wheré is µ~pancakes ho元use?".gsub(/[^ -~]/, '')
=> "Wher is ~pancakes house?"

There's really no reason to go to all this trouble though. Ruby 1.9 works great with Unicode as does Rails and pretty much everything else. Dealing with non-ASCII text was a nightmare 15 years ago, now it is common and fairly straight forward.

If you do manage to get text data that isn't UTF-8 then you have some options. If the encoding is ASCII-8BIT or BINARY then you can probably get away with s.force_encoding('utf-8'). If you end up with something other than UTF-8 and ASCII-8BIT then you can use Iconv to re-encode it.

References:

Encoding
Iconv
String#force_encoding

继续阅读：character-encoding ruby-on-rails unicode

Detecting non-ASCII characters in Rails

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？