Unescaping characters in a string with Ruby

2023-01-24 17:53 问答作者：

Given a string in the following format (the Posterous API returns posts in this format):

s="\\u003Cp\\u003E"

How can I convert it to the actual ascii characters such that s="<p>"?

On OSX, I successfully used Iconv.iconv('ascii', 'java', s) but once deployed to Heroku, I receive an Iconv::IllegalSequence exception. I'm guessing that the system Heroku deploys to does开发者_如何学C't support the java encoder.

I am using HTTParty to make a request to the Posterous API. If I use curl to make the same request then I do not get the double slashes.

From HTTParty github page:

Automatic parsing of JSON and XML into ruby hashes based on response content-type

The Posterous API returns JSON (no double slashes) and HTTParty's JSON parsing is inserting the double slash.

Here is a simple example of the way I am using HTTParty to make the request.

class Posterous
  include HTTParty
  base_uri "http://www.posterous.com/api/2"
  basic_auth "username", "password"
  format :json
  def get_posts
    response = Posterous.get("/users/me/sites/9876/posts&api_token=1234")
    # snip, see below...
  end
end

With the obvious information (username, password, site_id, api_token) replaced with valid values.

At the point of snip, response.body contains a Ruby string that is in JSON format and response.parsed_response contains a Ruby hash object which HTTParty created by parsing the JSON response from the Posterous API.

In both cases the unicode sequences such as \u003C have been changed to \\u003C.

I've found a solution to this problem. I ran across this gist. elskwid had the identical problem and ran the string through a JSON parser:

s = ::JSON.parse("\\u003Cp\\u003E")

Now, s = "<p>".

I ran into this exact problem the other day. There is a bug in the json parser that HTTParty uses (Crack gem) - basically it uses a case-sensitive regexp for the Unicode sequences, so because Posterous puts out A-F instead of a-f, Crack isn't unescaping them. I submitted a pull request to fix this.

In the meantime HTTParty nicely lets you specify alternate parsers so you can do ::JSON.parse bypassing Crack entirely like this:

class JsonParser < HTTParty::Parser
  def json
    ::JSON.parse(body)
  end
end

class Posterous
   include HTTParty
   parser ::JsonParser

   #....
end

You can also use pack:

"a\\u00e4\\u3042".gsub(/\\u(....)/){[$1.hex].pack("U")} # "aäあ"

Or to do the reverse:

"aäあ".gsub(/[^ -~\n]/){"\\u%04x"%$&.ord} # "a\\u00e4\\u3042"

The doubled-backslashes almost look like a regular string being viewed in a debugger.

The string "\u003Cp\u003E" really is "<p>", only the \u003C is unicode for < and \003E is >.

>> "\u003Cp\u003E"  #=> "<p>"

If you are truly getting the string with doubled backslashes then you could try stripping one of the pair.

As a test, see how long the string is:

>> "\\u003Cp\\u003E".size #=> 13
>> "\u003Cp\u003E".size #=> 3
>> "<p>".size #=> 3

All the above was done using Ruby 1.9.2, which is Unicode aware. v1.8.7 wasn't. Here's what I get using 1.8.7's IRB for comparison:

>> "\u003Cp\u003E" #=> "u003Cpu003E"

继续阅读：escaping httparty json posterous ruby

Unescaping characters in a string with Ruby

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？