
How do I parse these request params in Rails?

I get params like s = "%u041D%u0430%u0434%u043E%u0435%u043B" with incoming requests to my web server.

How do I decode this into a normal UTF-8 string in Rails? Thank you!


It looks like the non-standard format produced by JavaScript's escape function. If you can influence the code that sends this data, you should probably arrange for it to use encodeURIComponent or encodeURI instead (both yield “normal” percent-encoding of UTF-8 encoded characters).
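For comparison, here is a minimal sketch of how Ruby's standard library treats the two formats. It assumes Ruby 2.0 or later and uses only `URI.decode_www_form_component` from the bundled `uri` library. The UTF-8 percent-encoded form decodes directly. The `%uXXXX` form produced by `escape` is rejected as invalid percent-encoding.

```ruby
require 'uri'

# "Надоел" as encodeURIComponent would send it: percent-encoded UTF-8 bytes
standard = '%D0%9D%D0%B0%D0%B4%D0%BE%D0%B5%D0%BB'
# The same word as JavaScript's escape() sends it: one %uXXXX per UTF-16 unit
legacy = '%u041D%u0430%u0434%u043E%u0435%u043B'

decoded = URI.decode_www_form_component(standard)
puts decoded           # the decoded Cyrillic word
puts decoded.encoding  # UTF-8

# The stdlib decoder only understands %XX, so the %u form raises ArgumentError
legacy_error =
  begin
    URI.decode_www_form_component(legacy)
    nil
  rescue ArgumentError => e
    e
  end
puts legacy_error ? "rejected: #{legacy_error.message}" : 'accepted'
```

This is why switching the client to standard percent-encoding removes the need for a custom decoder.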

# Unescape percent encoding.
#
# The normal byte-oriented format ("%41") and the non-standard <em>%u</em>
# format ("%u0410") are both supported. The single-byte variant is decoded
# as if it represents bytes encoded with the same encoding as +str+. The
# two-byte <em>%u</em> variant is decoded as UTF-16BE and then re-encoded
# with the same encoding as +str+; surrogate pairs are supported.
#
# Since the resulting string will have the same encoding as +str+, all byte
# sequences resulting from the byte-oriented decoding must be valid sequences
# in the encoding of +str+. Correspondingly, the encoding of +str+ must
# be compatible with any extended characters that are decoded from the
# UTF-16BE <em>%u</em> encodings.

def unescape(str)
  hh = /[0-9a-f]{2}/i
  hhhh = /[0-9a-f]{4}/i
  str.gsub(/((?:%#{hh})+)|((?:%u#{hhhh})+)/) do
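    if $1
      # %XX run: raw bytes, interpreted in the encoding of str
      $1.scan(hh).map(&:hex).pack('C*').force_encoding(str.encoding)
    elsif $2
      # %uXXXX run: UTF-16 code units packed big-endian ('n', not native-endian
      # 'S', which swaps bytes on little-endian CPUs), then transcoded
      $2.scan(hhhh).map(&:hex).pack('n*').force_encoding(Encoding::UTF_16BE).
        encode!(str.encoding)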
    else
      raise 'unhandled match'
    end
  end
end
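The `%u` branch has to turn each four-hex-digit escape into one UTF-16 code unit in big-endian byte order. Here is a minimal sketch of that step in isolation, using the escapes from the surrogate-pair test case below. `pack('n*')` writes each unit as a big-endian 16-bit integer, so the bytes are valid UTF-16BE on any CPU. Transcoding then joins the surrogate pair `D801 DC10` into the single code point U+10410.

```ruby
# The %u escapes from the surrogate-pair test case: U+10410 followed by U+0410
escapes = '%uD801%uDC10%u0410'

# Each %uXXXX is one UTF-16 code unit
units = escapes.scan(/%u(\h{4})/).flatten.map(&:hex)

# 'n*' packs 16-bit unsigned integers in big-endian (network) order
bytes = units.pack('n*')

# Reinterpret the bytes as UTF-16BE, then transcode; the surrogate pair
# D801 DC10 becomes a single code point
text = bytes.force_encoding(Encoding::UTF_16BE).encode(Encoding::UTF_8)

p units.map { |u| u.to_s(16) }
p bytes.bytes.map { |b| format('%02X', b) }
p text.codepoints.map { |c| format('U+%04X', c) }
```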


# Returns true when every element of +e+ is eql? to the first one.
def all_same?(e)
  first = e.first
  e.drop(1).all? { |o| o.eql?(first) }
end

ss = [
  # %-encoded-UTF-16BE -> SJIS (Shift_JIS just for fun; UTF-8 works fine)
  '%u041D%u0430%u0434%u043E%u0435%u043B'.encode(Encoding::SJIS),
  # %-encoded-ISO-8859-5 -> ISO-8859-5
  '%bd%d0%d4%de%d5%db'.encode(Encoding::ISO8859_5),
  # %-encoded-UTF-8 -> UTF-8
  '%d0%9d%d0%b0%d0%b4%d0%be%d0%b5%d0%bb'.encode(Encoding::UTF_8),
]

ss2 = [ # demonstrate non-decoded content and UTF-16BE surrogate pair decoding
  # %-encoded-UTF-16BE -> UTF-8
  'A%uD801%uDC10%u0410'.encode(Encoding::UTF_8),
  # %-encoded-UTF-8 -> UTF-8
  '%41%f0%90%90%90%D0%90'.encode(Encoding::UTF_8),
]

ss = ss.map { |s| s = unescape(s) }.tap { |ss| p ss.map { |s| s.encoding } }
all_same? ss.map { |s| s.encode(Encoding::UTF_8) }

ss2 = ss2.map { |s| s = unescape(s) }.tap { |ss| p ss.map { |s| s.encoding } }
all_same? ss2.map { |s| s.encode(Encoding::UTF_8) }

When run through irb:

ruby-1.9.2-head >   ss = ss.map { |s| s = unescape(s) }.tap { |ss| p ss.map { |s| s.encoding } }
[#<Encoding:Shift_JIS>, #<Encoding:ISO-8859-5>, #<Encoding:UTF-8>]
 => ["\x{844E}\x{8470}\x{8474}\x{8480}\x{8475}\x{847C}", "\xBD\xD0\xD4\xDE\xD5\xDB", "Надоел"] 
ruby-1.9.2-head > all_same? ss.map { |s| s.encode(Encoding::UTF_8) }
 => true 
ruby-1.9.2-head > 
ruby-1.9.2-head >   ss2 = ss2.map { |s| s = unescape(s) }.tap { |ss| p ss.map { |s| s.encoding } }
[#<Encoding:UTF-8>, #<Encoding:UTF-8>]
 => ["A
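The doc comment on `unescape` states that the encoding of `str` must be able to represent every character decoded from `%u` escapes. Here is a minimal sketch of what happens otherwise. It uses plain `String#encode` (the same transcoding the `%u` branch performs) rather than calling `unescape` itself. U+0410 exists in ISO-8859-5 but U+10410 does not, so the second conversion raises `Encoding::UndefinedConversionError`. Giving `str` a Unicode encoding such as UTF-8 avoids the problem.

```ruby
# U+0410 CYRILLIC CAPITAL LETTER A, and U+10410 (outside the BMP), as UTF-16BE
cyrillic = [0x0410].pack('n*').force_encoding(Encoding::UTF_16BE)
astral   = [0xD801, 0xDC10].pack('n*').force_encoding(Encoding::UTF_16BE)

# ISO-8859-5 covers Cyrillic, so this converts cleanly
in_iso = cyrillic.encode(Encoding::ISO8859_5)
p in_iso.bytes.map { |b| format('%02X', b) }

# U+10410 has no ISO-8859-5 equivalent, so transcoding fails
failure =
  begin
    astral.encode(Encoding::ISO8859_5)
    nil
  rescue Encoding::UndefinedConversionError => e
    e
  end
puts failure ? "#{failure.class}: #{failure.message}" : 'converted'
```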