Convert UTF-8 to Unicode in Ruby

2023-02-10 08:03

The UTF-8 encoding of "龅" is E9BE85 and the Unicode code point is U+9F85. The following code did not work as expected:

irb(main):004:0> "龅"
=> "\351\276\205"
irb(main):005:0> Iconv.iconv("unicode","utf-8","龅").to_s
=> "\377\376\205\237"

P.S. I am using Ruby 1.8.7.

Ruby 1.9+ is much better equipped to deal with Unicode than 1.8.7, so I strongly suggest running under 1.9.2 if at all possible.

Part of the problem is that 1.8 treats a string as a plain sequence of bytes and doesn't understand that a UTF-8 character can be more than one byte long. 1.9 does understand that, so methods like String#each_char work on whole characters.
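The difference between bytes and characters is easy to see directly. This is only a minimal sketch, assuming Ruby 1.9 or later; the variable names are illustrative and not from the original answer.

```ruby
# encoding: UTF-8
# "\u9F85" is the same string as the literal "龅" from the question.
s = "\u9F85"

# The string is stored as three UTF-8 bytes...
hex_bytes = s.unpack("C*").map { |b| "%02X" % b }.join

# ...but it is a single character.
char_count = s.each_char.to_a.length

puts hex_bytes   # E9BE85
puts char_count  # 1
```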

# encoding: UTF-8

require 'iconv'

RUBY_VERSION # => "1.9.2"
"龅".encoding # => #<Encoding:UTF-8>
"龅".each_char.entries # => ["龅"]
Iconv.iconv("unicode","utf-8","龅").to_s # => 

# ~> -:8:in `iconv': invalid encoding ("unicode", "utf-8") (Iconv::InvalidEncoding)
# ~>    from -:8:in `<main>'

To get the list of available encodings with Iconv, do:

require 'iconv'
puts Iconv.list
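On Ruby 1.9 and later, the built-in Encoding class gives a comparable list for the names that String#encode accepts. Note that these are the names known to Ruby's own transcoder, not Iconv's, so the two lists will differ. The helper name encoding_known? below is hypothetical, a sketch for checking a single name without printing the whole list.

```ruby
# Hypothetical helper: true if this Ruby recognizes the encoding name.
def encoding_known?(name)
  Encoding.find(name)   # raises ArgumentError for an unknown name
  true
rescue ArgumentError
  false
end

puts Encoding.name_list.length          # how many names are available
puts encoding_known?("UTF-16BE")
puts encoding_known?("no-such-encoding")
```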

It's a long list so I won't add it here.

You can try this:

"%04x" % "龅".unpack("U*")[0]

=> "9f85"
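The same unpack("U*") idea extends to whole strings. As a small sketch (the helper name codepoint_labels is made up for illustration), this decodes the UTF-8 bytes from the question and formats every code point in the usual U+XXXX notation:

```ruby
# Hypothetical helper: format each code point of a UTF-8 string as "U+XXXX".
def codepoint_labels(str)
  # "U*" decodes the string's bytes as UTF-8 into integer code points.
  str.unpack("U*").map { |cp| "U+%04X" % cp }
end

# E9 BE 85 are the UTF-8 bytes of the character from the question.
puts codepoint_labels("\xE9\xBE\x85").inspect
puts codepoint_labels("A").inspect
```

Because unpack works on the raw bytes, this approach does not depend on Iconv being available.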

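In the question's output, "\377\376\205\237" is FF FE 85 9F: a byte order mark (FF FE) followed by the code point in little-endian order. To get the bytes in big-endian order without the mark, you should use UNICODEBIG// as the target encoding: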

irb(main):014:0> Iconv.iconv("UNICODEBIG//","utf-8","龅")[0].each_byte {|b| puts b.to_s(16)}
9f
85
=> "\237\205"
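Iconv was removed from the standard library in Ruby 2.0, so on current versions the same conversion is usually done with String#encode, which has been available since 1.9. The following is a sketch under that assumption; hex_dump is a hypothetical helper, not part of any library.

```ruby
# The UTF-8 bytes E9 BE 85 from the question, tagged as UTF-8.
s = "\xE9\xBE\x85".dup.force_encoding("UTF-8")

# Hypothetical helper: space-separated hex dump of a string's bytes.
def hex_dump(str)
  str.unpack("C*").map { |b| "%02x" % b }.join(" ")
end

# UTF-16BE gives the code point's bytes in big-endian order, no byte order mark.
big_endian = hex_dump(s.encode("UTF-16BE"))

# UTF-16LE gives the same two bytes swapped.
little_endian = hex_dump(s.encode("UTF-16LE"))

puts big_endian     # 9f 85
puts little_endian  # 85 9f
```

The big-endian result matches the UNICODEBIG// output above, and the little-endian result matches the last two bytes of the output in the question.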
