How to parse a string of fullwidth integer characters to an integer in Ruby
How can I parse a string of fullwidth Unicode digit characters into an integer in Ruby?
Attempting the obvious results in:
irb(main):011:0> a = "\uff11"
=> "１"
irb(main):012:0> Integer(a)
ArgumentError: invalid value for Integer: "\xEF\xBC\x91"
from (irb):12:in `Integer'
from (irb):12
from /export/home/henry/apps/bin/irb:12:in `<main>'
irb(main):013:0> a.to_i
=> 0
The equivalent in Python gives:
>>> a = u"\uff11"
>>> print a
１
>>> int(a)
1
Ruby 1.9's numeric parsing only understands ASCII digits. I don't think there is any convenient, elegant parsing method that properly handles fullwidth Unicode numeric code points.
A quick filthy hack function:
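def parse_utf(utf_integer_string)
  # Translation table: fullwidth digits U+FF10..U+FF19 map onto ASCII 0-9.
  ascii_numeric_chars = "0123456789"
  utf_numeric_chars = "\uff10\uff11\uff12\uff13\uff14\uff15\uff16\uff17\uff18\uff19"
  # Swap each fullwidth digit for its ASCII counterpart, then parse.
  utf_integer_string.tr(utf_numeric_chars, ascii_numeric_chars).to_i
end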
Pass in a string of fullwidth numeric characters and get out an integer.
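One caveat with the hack above: to_i quietly returns 0 (or a partial number) when the input contains anything it cannot parse, whereas the question used Integer(), which raises. A minimal sketch of a stricter variant follows; the name parse_fullwidth_strict is made up for illustration and is not part of the original answer.

```ruby
# Hypothetical helper, not from the original answer: strict variant of parse_utf.
def parse_fullwidth_strict(str)
  # String#tr accepts character ranges, so U+FF10..U+FF19 can be mapped
  # onto "0".."9" without spelling out every digit.
  ascii = str.tr("\uff10-\uff19", "0-9")
  # Integer(value, 10) raises ArgumentError on input it cannot parse
  # (stray letters, empty string) instead of silently returning 0.
  Integer(ascii, 10)
end

parse_fullwidth_strict("\uff11\uff12\uff13")  # fullwidth "123"
```

Whether raising is preferable to returning 0 depends on where the strings come from; for user input the strict form usually makes bad data easier to spot.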
Convert 'compatibility' characters like the fullwidth forms to their normalized versions (plain ASCII digits in this case) before parsing as an integer, for example using Unicode::normalize_KC or UnicodeUtils::nfkc.
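The normalization idea can also be sketched without either of the libraries named above. This swaps in String#unicode_normalize, which is built into Ruby 2.2 and later, so it will not work on the Ruby 1.9 shown in the question. The helper name parse_fullwidth_nfkc is hypothetical.

```ruby
# Assumes Ruby 2.2 or later, where String#unicode_normalize is built in.
# parse_fullwidth_nfkc is a hypothetical helper name used for illustration.
def parse_fullwidth_nfkc(str)
  # NFKC folds compatibility characters such as U+FF11 (fullwidth digit one)
  # into their plain ASCII equivalents.
  normalized = str.unicode_normalize(:nfkc)
  # Parse strictly as base 10; raises ArgumentError on unparseable input.
  Integer(normalized, 10)
end

parse_fullwidth_nfkc("\uff14\uff12")  # fullwidth "42"
```

Unlike a hand-written tr table, normalization also covers other fullwidth forms, such as the fullwidth hyphen-minus (U+FF0D), so signed numbers go through the same path.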