How to parse these request params in Rails?
I get params like s = "%u041D%u0430%u0434%u043E%u0435%u043B" with incoming requests to my web server.
How do I decode this to a normal UTF-8 string in Rails? Thank you!
It looks like the non-standard format produced by escape in JavaScript. If you can influence the code that is sending this data, you should probably try to arrange for it to use encodeURI instead (which yields “normal” percent-encoding of UTF-8 encoded characters).
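To illustrate the difference between the two formats, here is a minimal sketch using Ruby's standard URI library. It assumes the sender were switched to encodeURI, so the same word arrives as percent-encoded UTF-8 bytes; the variable names are only illustrative.

```ruby
require 'uri'

# "Normal" percent-encoding: each UTF-8 byte of the text becomes one %hh escape.
standard = '%D0%9D%D0%B0%D0%B4%D0%BE%D0%B5%D0%BB'
decoded = URI.decode_www_form_component(standard) # target encoding defaults to UTF-8
puts decoded          # Надоел
puts decoded.encoding # UTF-8

# The non-standard %u form from escape() is not valid percent-encoding,
# so the standard decoder refuses it instead of decoding it.
legacy = '%u041D%u0430%u0434%u043E%u0435%u043B'
rejected =
  begin
    URI.decode_www_form_component(legacy)
    false
  rescue ArgumentError
    true
  end
puts rejected # true
```

This is why input in the %u format needs a custom decoder when the sender cannot be changed.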
# Unescape percent encoding.
#
# The normal byte-oriented format ("%41") and the non-standard <em>%u</em>
# format ("%u0410") are both supported. The single-byte variant is decoded
# as if it represents bytes encoded with the same encoding as +str+. The
# two-byte <em>%u</em> variant is decoded as UTF-16BE and then re-encoded
# with the same encoding as +str+; surrogate pairs are supported.
#
# Since the resulting string will have the same encoding as +str+, all byte
# sequences resulting from the byte-oriented decoding must be valid sequences
# in the encoding of +str+. Correspondingly, the encoding of +str+ must
# be compatible with any extended characters that are decoded from the
# UTF-16BE <em>%u</em> encodings.
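def unescape(str)
  hh = /[0-9a-f]{2}/i
  hhhh = /[0-9a-f]{4}/i
  str.gsub(/((?:%#{hh})+)|((?:%u#{hhhh})+)/) do
    if $1
      # Run of %hh escapes: raw bytes, already in the encoding of str.
      $1.scan(hh).map(&:hex).pack('C*').force_encoding(str.encoding)
    elsif $2
      # Run of %uhhhh escapes: pack as big-endian 16-bit units ('n') so the
      # bytes really are UTF-16BE regardless of the host's byte order.
      $2.scan(hhhh).map(&:hex).pack('n*').force_encoding(Encoding::UTF_16BE).
        encode!(str.encoding)
    else
      raise 'unhandled match'
    end
  end
end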
def all_same?(e)
  first = e.first
  e.drop(1).all? { |o| o.eql?(first) }
end
ss = [
  # %-encoded-UTF-16BE -> SJIS (just for something fun... UTF-8 works fine)
  '%u041D%u0430%u0434%u043E%u0435%u043B'.encode!(Encoding::SJIS),
  # %-encoded-ISO-8859-5 -> ISO-8859-5
  '%bd%d0%d4%de%d5%db'.encode!(Encoding::ISO8859_5),
  # %-encoded-UTF-8 -> UTF-8
  '%d0%9d%d0%b0%d0%b4%d0%be%d0%b5%d0%bb'.encode!(Encoding::UTF_8),
]
ss2 = [ # demonstrate non-decoded content and UTF-16BE surrogate pair decoding
  # %-encoded-UTF-16BE -> UTF-8
  'A%uD801%uDC10%u0410'.encode!(Encoding::UTF_8),
  # %-encoded-UTF-8 -> UTF-8
  '%41%f0%90%90%90%D0%90'.encode!(Encoding::UTF_8),
]
ss = ss.map { |s| s = unescape(s) }.tap { |ss| p ss.map { |s| s.encoding } }
all_same? ss.map { |s| s.encode(Encoding::UTF_8) }
ss2 = ss2.map { |s| s = unescape(s) }.tap { |ss| p ss.map { |s| s.encoding } }
all_same? ss2.map { |s| s.encode(Encoding::UTF_8) }
When run through irb:
ruby-1.9.2-head > ss = ss.map { |s| s = unescape(s) }.tap { |ss| p ss.map { |s| s.encoding } }
[#<Encoding:Shift_JIS>, #<Encoding:ISO-8859-5>, #<Encoding:UTF-8>]
=> ["\x{844E}\x{8470}\x{8474}\x{8480}\x{8475}\x{847C}", "\xBD\xD0\xD4\xDE\xD5\xDB", "Надоел"]
ruby-1.9.2-head > all_same? ss.map { |s| s.encode(Encoding::UTF_8) }
=> true
ruby-1.9.2-head >
ruby-1.9.2-head > ss2 = ss2.map { |s| s = unescape(s) }.tap { |ss| p ss.map { |s| s.encoding } }
[#<Encoding:UTF-8>, #<Encoding:UTF-8>]
=> ["A
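The surrogate pair in the second test (%uD801%uDC10) can also be checked by hand. The following is a small sketch, not part of the code above: combine_surrogates is a hypothetical helper that applies the standard UTF-16 formula, and its result is compared with letting Ruby transcode the same two 16-bit units from UTF-16BE.

```ruby
# Hypothetical helper: combine a UTF-16 high/low surrogate pair into one
# Unicode code point using the standard UTF-16 formula.
def combine_surrogates(high, low)
  raise ArgumentError, 'not a high surrogate' unless (0xD800..0xDBFF).cover?(high)
  raise ArgumentError, 'not a low surrogate' unless (0xDC00..0xDFFF).cover?(low)
  0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
end

code_point = combine_surrogates(0xD801, 0xDC10)
by_hand = [code_point].pack('U') # UTF-8 string built directly from the code point

# The same two units packed as big-endian 16-bit values, then transcoded.
via_utf16 = [0xD801, 0xDC10].pack('n*')
                            .force_encoding(Encoding::UTF_16BE)
                            .encode(Encoding::UTF_8)

puts format('U+%X', code_point)                            # U+10410
puts by_hand == via_utf16                                  # true
puts by_hand.bytes.map { |b| format('%02X', b) }.join(' ') # F0 90 90 90
```

The resulting bytes F0 90 90 90 are the same ones that appear percent-encoded in the second entry of ss2, which is why both entries decode to the same string.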