
How to get a Ruby substring of a Unicode string?

I have a field in my Rails model that has max length 255.

I'm importing data into it, and sometimes the imported data has a length > 255. I'm willing to simply chop it off so that I end up with the largest possible valid string that fits.

I originally tried to do field[0,255] in order to get this, but this will actually chop trailing Unicode right through a character. When I then go to save this into the database, it throws an error telling me I have an invalid character due to the character that's been halved or quartered.

What's the recommended way to chop off Unicode characters to get them to fit in my space, without chopping up individual characters?
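A minimal reproduction of the problem, assuming a modern Ruby (2.0+) with UTF-8 strings: `String#byteslice` cuts by bytes, the way `String#[]` did on Ruby 1.8, so it can leave half of a multibyte character at the end. The variable names are only for illustration.

```ruby
s = "caf\u00e9"            # "café": 4 characters, but 5 bytes in UTF-8
cut = s.byteslice(0, 4)    # keeps "caf" plus only the first byte of "é"

puts s.length              # 4
puts s.bytesize            # 5
puts cut.valid_encoding?   # false: the trailing byte is half a character
```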


Uh. It seems `truncate` and friends like to play with chars, but not with their little cousins, bytes. Here's a quick answer for your problem, though there may well be a more straightforward and elegant one:

def truncate_bytes(string, size)
  count = 0
  # Keep whole characters while the running byte total still fits in size.
  string.chars.take_while { |c| (count += c.bytesize) <= size }.join
end
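On Ruby 2.1 or later, a shorter sketch of the same byte-based truncation is to take the raw bytes and then drop the broken trailing character with `String#scrub`. The helper name `truncate_to_bytes` is hypothetical, and this assumes the input is otherwise valid UTF-8.

```ruby
# Hypothetical helper; assumes Ruby 2.1+ (String#scrub) and valid UTF-8 input.
def truncate_to_bytes(str, max_bytes)
  # Take at most max_bytes bytes; this may end in the middle of a character.
  head = str.byteslice(0, max_bytes)
  # Remove invalid byte sequences, i.e. the partial trailing character.
  head.scrub("")
end

title = "na\u00efve caf\u00e9"            # "naïve café": 10 characters, 12 bytes
short = truncate_to_bytes(title, 11)     # 11 bytes would split the "é"

puts short             # naïve caf
puts short.bytesize    # 10
```

One design caveat: `scrub("")` removes every invalid sequence in the slice, not just the last one, so a string that was already malformed earlier on would also lose those bytes.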

Have a look at the Chars class in ActiveSupport (ActiveSupport::Multibyte::Chars).


Use the multibyte proxy method (mb_chars) before manipulating the string:

str.mb_chars[0,255]

See http://api.rubyonrails.org/classes/String.html#method-i-mb_chars.

Note that in Rails 2.1 and earlier the method was called `chars`.
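For comparison, a small sketch assuming Ruby 1.9 or later, where `String#[]` already indexes by characters of the string's encoding (the behavior `mb_chars` provided on Ruby 1.8). It truncates to N characters, not N bytes, so it fits a column whose limit counts characters (MySQL `VARCHAR(255)`, for example) but not one with a byte limit; which applies depends on your database.

```ruby
text = "日本語のテキスト"    # 8 characters, 3 bytes each in UTF-8
first3 = text[0, 3]       # slices by character, never through one

puts first3                   # 日本語
puts first3.bytesize          # 9
puts first3.valid_encoding?   # true
```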
