Length of a unicode string

2023-01-13 23:52, Q&A author:

In my Rails (2.3, Ruby 1.8.7) application, I need to truncate a string to a certain length. The string is Unicode, and when running tests in the console, such as 'א'.length, I realized that double the expected length is returned (2 instead of 1). I would like an encoding-agnostic length, so that the same truncation would be done for a Unicode string or a Latin-1 encoded string.

I've gone over most of the Unicode material for Ruby, but am still a little in the dark. How should this problem be tackled?
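For comparison, here is a minimal sketch assuming Ruby 1.9 or later, not the Ruby 1.8.7 from the question (1.8 strings are plain byte sequences with no attached encoding). In 1.9+ every string carries its own encoding, so String#length counts characters while String#bytesize counts raw bytes, and the same text gives the same length whether it is UTF-8 or Latin-1:

```ruby
# encoding: utf-8
# Assumes Ruby 1.9+; in Ruby 1.8.7 String#length counts bytes instead.
utf8   = "café"                     # UTF-8 literal (see magic comment above)
latin1 = utf8.encode("ISO-8859-1")  # same text, re-encoded as Latin-1

# length counts characters in each string's own encoding...
puts utf8.length                    # => 4
puts latin1.length                  # => 4
# ...while bytesize counts raw bytes, which differs per encoding.
puts utf8.bytesize                  # => 5 ("é" takes two bytes in UTF-8)
puts latin1.bytesize                # => 4 ("é" takes one byte in Latin-1)

# Character indexing therefore sees the same text in both encodings.
puts latin1[3].encode("UTF-8") == utf8[3]  # => true
```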

Rails has an mb_chars method, which wraps the string in a multibyte-safe proxy whose methods operate on characters rather than bytes. Try unicode_string.mb_chars.slice(0,50).

"ア".size # 3 in 1.8, 1 in 1.9
puts "ア".scan(/./mu).size # 1 in both 1.8 and 1.9
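The regex trick works because of its two flags: /u makes the pattern match UTF-8 characters, and /m makes "." also match newlines, so line breaks are counted as characters too. The sketch below (run under Ruby 1.9+; regex_char_count is a hypothetical helper name) shows what goes wrong without /m:

```ruby
# encoding: utf-8
# Hypothetical helper: counts characters using the /./mu trick.
# /u -> match UTF-8 characters, /m -> let "." match "\n" as well.
def regex_char_count(str)
  str.scan(/./mu).size
end

s = "a\nア"
puts s.bytesize            # => 5 (1 + 1 + 3 UTF-8 bytes)
puts regex_char_count(s)   # => 3
puts s.scan(/./u).size     # => 2 (without /m the newline is skipped)
puts s.length              # => 3 (Ruby 1.9+ counts characters natively)
```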

chars and mb_chars don't give you text elements (a base character together with any combining marks that follow it), which is what you seem to be looking for.

For text elements you'll want the unicode gem.

mb_chars:

>> 'กุ'.mb_chars.size
=> 2

>> 'กุ'.mb_chars.first.to_s
=> "ก"

text_elements:

>> Unicode.text_elements('กุ').size
=> 1

>> Unicode.text_elements('กุ').first
=> "กุ"
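On newer Rubies there is a built-in alternative to the unicode gem: String#grapheme_clusters (Ruby 2.5+) splits a string into extended grapheme clusters, which correspond to the text elements above. A minimal sketch, assuming Ruby 2.5 or later; truncate_graphemes is a hypothetical helper name, not part of Rails or the unicode gem:

```ruby
# encoding: utf-8
# Assumes Ruby 2.5+, where String#grapheme_clusters is available.
# Hypothetical helper: keep the first +limit+ grapheme clusters.
def truncate_graphemes(str, limit)
  str.grapheme_clusters.first(limit).join
end

s = "กุ"                        # U+0E01 KO KAI + U+0E38 SARA U (combining mark)
puts s.length                    # => 2 (code points)
puts s.grapheme_clusters.size    # => 1 (text elements)
puts truncate_graphemes("กุก", 1) == "กุ"  # => true, the mark stays attached
```

Truncating by code points instead ("กุก"[0, 1]) would cut off the combining vowel and leave a bare "ก".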

You can use something like str.chars.slice(0, 50).join to get the first 50 characters of a string, no matter how many bytes it uses per character.
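A short sketch of that approach under Ruby 1.9+, where String#chars returns an array of one-character strings. It also shows that plain String#[] with a start and length is character-based there, so both helpers (hypothetical names) give the same result:

```ruby
# encoding: utf-8
# Assumes Ruby 1.9+. Both hypothetical helpers keep the first +limit+ characters.
def truncate_chars(str, limit)
  str.chars.slice(0, limit).join   # the approach from the answer
end

def truncate_index(str, limit)
  str[0, limit]                    # String#[] indexes characters in 1.9+
end

hebrew = "שלום עולם"
puts truncate_chars(hebrew, 4)                               # => שלום
puts truncate_index(hebrew, 4) == truncate_chars(hebrew, 4)  # => true
puts truncate_chars(hebrew, 4).bytesize                      # => 8 (2 bytes per letter)
puts truncate_chars("hi", 50)                                # => hi (shorter strings pass through)
```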
