Ruby: incorrect method behavior (possibly depending on charset)
I got weird behavior from Ruby (in irb):
irb(main):002:0> pp " LS 600"
"\302\240\302\240\302\240\302\240LS 600"
irb(main):003:0> pp " LS 600".strip
"\302\240\302\240\302\240\302\240LS 600"
That means (for those who don't understand) that the strip method does not affect this string at all; the same goes for gsub('/\s+/', '').
How can I strip that string? (I got it while parsing a web page.)
The string "\302\240" is the UTF-8 encoding (bytes C2 A0) of the Unicode code point U+00A0, which represents a non-breaking space character. There are many other Unicode space characters. Unfortunately, the String#strip method removes none of these.
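As a small illustration of the point above, here is a sketch that inspects the bytes of a non-breaking space and checks that String#strip leaves it in place. It assumes Ruby 1.9 or later (for the "\u00A0" escape and force_encoding), and the variable names are only for illustration:

```ruby
# A non-breaking space (U+00A0), written with a Unicode escape.
nbsp = "\u00A0"

# Its UTF-8 encoding is the two bytes C2 A0 (octal \302\240).
nbsp_bytes = nbsp.bytes.map { |b| b.to_s(16) }

# The octal-escaped form from the question is the same character
# once the string is tagged as UTF-8.
same_char = "\302\240".dup.force_encoding("UTF-8") == nbsp

# String#strip only removes ASCII whitespace, so the string is unchanged.
padded = "#{nbsp}#{nbsp}LS 600"
unchanged = padded.strip == padded

p nbsp_bytes, same_char, unchanged
```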
If you use Ruby 1.9.2 or later, you can solve this in the following way:
# Ruby 1.9.2 or later.
# Remove any whitespace-like characters from beginning/end.
# \A and \z anchor to the whole string; ^ and $ would match at every line.
"\302\240\302\240LS 600".gsub(/\A\p{Space}+|\p{Space}+\z/, "")
In Ruby 1.8.7, support for Unicode is not as good. You might be successful if you can depend on Rails's ActiveSupport::Multibyte. This has the advantage of getting a working strip method for free. Install ActiveSupport with gem install activesupport and then try this:
# Ruby 1.8.7/1.9.2.
$KCODE = "u"  # Needed on Ruby 1.8 only; Ruby 1.9 ignores it with a warning.
require "rubygems"
require "active_support/core_ext/string/multibyte"
# Remove any whitespace-like characters from beginning/end.
"\302\240\302\240LS 600".mb_chars.strip.to_s