
Is there a way to use a regexp match for characters with a tilde?

Look at this:

"nAo".match(/(nao)/i) # => #<MatchData "nAo" 1:"nAo">

"nÃo".match(/(não)/i) # => nil

Is there a way to fix that?

Edit: It seems that Ruby lacks support for Unicode characters in regexp comparisons with the i flag (ignore case)... Using MRI 1.8.7p249


I don't know about Ruby, but most regex engines don't understand uppercase/lowercase for non-ASCII characters. The best you can do is:

/(n[ãÃ]o)/
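As a rough sketch of that workaround, the snippet below spells out both forms of the accented letter in a character class. The constant name `NAO_PATTERN` is made up for illustration. The `u` flag is an assumption on my part: on MRI 1.8 it should be needed so that the multi-byte `ã` and `Ã` inside the class are treated as whole characters rather than single bytes, and it is harmless on later versions. The `i` flag is kept so the plain ASCII letters still match in either case.

```ruby
# encoding: utf-8

# Hypothetical constant name. Both cases of the accented letter are listed
# explicitly, so the match does not depend on the engine folding "Ã" to "ã".
# "i" still covers the ASCII letters; "u" marks the pattern as UTF-8.
NAO_PATTERN = /(n[ãÃ]o)/iu

["não", "nÃo", "NÃO", "nao"].each do |word|
  result = word.match(NAO_PATTERN) ? "match" : "no match"
  puts "#{word}: #{result}"
end
```

The obvious drawback is that every accented letter in the pattern has to be expanded by hand.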

The problem with understanding the uppercase/lowercase relationship is that it is language-dependent. Unicode encodes only the form of the character, not the meaning. Therefore an uppercase character in Unicode can have different lowercase characters depending on the language.

Take SS, for example. In English the lowercase would be ss, but in German it can be ß. Another example is the letter I, which in English has the lowercase i but in Turkish has the lowercase ı (without a dot). That's because i in Turkish has the uppercase İ (with a dot).

Due to this, most regex implementations simply give up and refuse to understand uppercase/lowercase relationships for characters outside standard ASCII.
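To make the language dependence concrete, here is a small sketch using the case-mapping methods of much newer Rubies. It assumes Ruby 2.4 or later, where `String#upcase` and `String#downcase` handle Unicode and accept a `:turkic` option; none of this is available on 1.8.7. The constant names are only for illustration.

```ruby
# encoding: utf-8
# Assumes Ruby 2.4+; String#upcase/#downcase are ASCII-only before that.

# German sharp s: the default uppercase mapping expands to two letters.
SHARP_S_UPPER = "ß".upcase

# The same letter "I" lowercases differently depending on the language rules.
DEFAULT_LOWER_I = "I".downcase
TURKIC_LOWER_I  = "I".downcase(:turkic)

# And "i" uppercases to a dotted capital under the Turkic rules.
TURKIC_UPPER_I = "i".upcase(:turkic)

puts SHARP_S_UPPER
puts DEFAULT_LOWER_I
puts TURKIC_LOWER_I
puts TURKIC_UPPER_I
```

A regex engine has no idea which language the text is in, which is why a single "ignore case" flag cannot get all of these right at once.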


Try to find a Unicode normalization module for Ruby.
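For what it's worth, newer Rubies ship normalization in the standard library, so a sketch is possible without an extra module. This assumes Ruby 2.2 or later (`String#unicode_normalize`), which is not available on 1.8.7, and the helper name `normalize_for_match` is made up. Note that normalization by itself only unifies composed and decomposed forms of the same character (for example "A" plus a combining tilde versus a single "Ã"); it does not fold case, so it still has to be combined with an explicit character class or a Unicode-aware i flag.

```ruby
# encoding: utf-8
# Assumes Ruby 2.2+, where String#unicode_normalize is available.

# Hypothetical helper: bring text into NFC (composed form) before matching,
# so "A" + COMBINING TILDE (U+0303) becomes the single character U+00C3.
def normalize_for_match(text)
  text.unicode_normalize(:nfc)
end

decomposed = "nA\u0303o"                      # four code points
composed   = normalize_for_match(decomposed)  # three code points

puts decomposed.length
puts composed.length
puts composed.match(/n[ãÃ]o/) ? "match" : "no match"
```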


Note that Ruby has had better character support since 1.9 (it seems you are running Ruby 1.8.7). The old regex engine was replaced with Oniguruma in Ruby 1.9.

http://www.geocities.jp/kosako3/oniguruma/
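As a quick check of what that buys you, the sketch below reruns the failing example from the question. It assumes Ruby 1.9 or later with UTF-8 source and strings (hence the magic comment, which 1.9 needs on the first line of the file), and the method name `match_nao` is just for illustration. Under those assumptions the i flag should fold `Ã` and `ã` together.

```ruby
# encoding: utf-8
# Assumes Ruby 1.9+ with UTF-8 encoded source and strings.

# Hypothetical helper wrapping the pattern from the question.
def match_nao(text)
  text.match(/(não)/i)
end

["nAo", "nÃo", "NÃO"].each do |word|
  m = match_nao(word)
  puts "#{word}: #{m ? m[1] : 'nil'}"
end
```

Here "nAo" is expected not to match, since a plain "A" is a different letter from "ã", while the accented variants should.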
