Unescaping characters in a string with Ruby
Given a string in the following format (the Posterous API returns posts in this format):
s="\\u003Cp\\u003E"
How can I convert it to the actual ASCII characters, such that s="<p>"?
On OS X, I successfully used Iconv.iconv('ascii', 'java', s), but once deployed to Heroku, I receive an Iconv::IllegalSequence exception. I'm guessing that the system Heroku deploys to doesn't support the java encoder.
I am using HTTParty to make a request to the Posterous API. If I use curl to make the same request, I do not get the double backslashes.
From HTTParty github page:
Automatic parsing of JSON and XML into ruby hashes based on response content-type
The Posterous API returns JSON (no double backslashes), and HTTParty's JSON parsing is inserting the double backslash.
Here is a simple example of the way I am using HTTParty to make the request.
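require 'httparty'

class Posterous
  include HTTParty
  base_uri "http://www.posterous.com/api/2"
  basic_auth "username", "password"
  format :json

  def get_posts
    # Pass api_token as a query parameter instead of appending "&..." to the path
    response = Posterous.get("/users/me/sites/9876/posts", :query => { :api_token => "1234" })
    # snip, see below...
  end
end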
With the obvious information (username, password, site_id, api_token) replaced with valid values.
At the point of the snip, response.body contains a Ruby string in JSON format, and response.parsed_response contains a Ruby hash that HTTParty created by parsing the JSON response from the Posterous API. In both cases, Unicode sequences such as \u003C have been changed to \\u003C.
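One way to check whether the string really contains two backslashes, or just one that IRB's inspect output displays as \\, is to count them. A minimal sketch; the single-quoted literal is an assumption standing in for text read from response.body:

```ruby
# Single quotes keep "\u" literal, standing in for text read from response.body.
s = '\u003Cp\u003E'

puts s                             # prints \u003Cp\u003E (one backslash per escape)
p s                                # prints "\\u003Cp\\u003E" (inspect doubles each backslash)
puts s.chars.count("\\")           # prints 2: the string holds one backslash per escape
puts s.inspect.chars.count("\\")   # prints 4: the doubling exists only in the inspected form
puts s.length                      # prints 13
```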
I've found a solution to this problem. I ran across a gist in which elskwid had the identical problem and solved it by running the string through a JSON parser:
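require 'json'
s = "\\u003Cp\\u003E"
# JSON.parse expects a complete JSON document, so wrap the text in a one-element array
s = ::JSON.parse("[\"#{s}\"]").first
Now, s = "<p>".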
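The same trick can be wrapped in a small helper. This is a sketch, not elskwid's code: the name unescape_json_escapes is hypothetical, and it assumes the input is the body of a JSON string literal (any double quotes already escaped), since an unescaped quote would break the wrapping.

```ruby
require 'json'

# Hypothetical helper: decode JSON-style escapes (\uXXXX, \n, \", ...) in text
# that is assumed to be the body of a JSON string literal.
def unescape_json_escapes(text)
  # Older json gem versions only accept an object or array at the top level,
  # so wrap the text as the single element of a JSON array.
  JSON.parse("[\"#{text}\"]").first
end

puts unescape_json_escapes("\\u003Cp\\u003E")  # prints <p>
puts unescape_json_escapes("caf\\u00E9")        # prints café
```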
I ran into this exact problem the other day. There is a bug in the JSON parser that HTTParty uses (the Crack gem): it uses a case-sensitive regexp for the Unicode sequences, so because Posterous emits A-F instead of a-f, Crack doesn't unescape them. I submitted a pull request to fix this.
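To see why the case of the hex digits matters, compare a lowercase-only pattern with one that accepts both cases. Both regexps below are illustrative assumptions, not Crack's actual source:

```ruby
# Illustrative patterns only; they are not copied from the Crack gem.
LOWERCASE_ONLY = /\\u([0-9a-f]{4})/
ANY_CASE       = /\\u([0-9a-fA-F]{4})/

# Replace each matched \uXXXX escape with the character it encodes
def unescape_with(pattern, text)
  text.gsub(pattern) { [$1.hex].pack("U") }
end

posterous_style = "\\u003Cp\\u003E"  # uppercase hex digits, as Posterous emits them

puts unescape_with(LOWERCASE_ONLY, posterous_style)  # prints \u003Cp\u003E (unchanged)
puts unescape_with(ANY_CASE, posterous_style)        # prints <p>
```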
In the meantime, HTTParty lets you specify an alternate parser, so you can use ::JSON.parse and bypass Crack entirely, like this:
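require 'json'
require 'httparty'

# Custom parser: decode JSON responses with the json library instead of Crack
class JsonParser < HTTParty::Parser
  def json
    ::JSON.parse(body)
  end
end

class Posterous
  include HTTParty
  parser ::JsonParser
  #....
end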
You can also use pack:
"a\\u00e4\\u3042".gsub(/\\u(....)/){[$1.hex].pack("U")} # "aäあ"
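The (....) group above accepts any four characters, and a character beyond U+FFFF arrives as a surrogate pair such as \ud83d\ude00, which decoding one half at a time will not turn into the intended character. A slightly stricter sketch (decode_unicode_escapes is a hypothetical name) accepts only hex digits, in either case, and joins surrogate pairs:

```ruby
# Hypothetical helper: decode \uXXXX escapes (either hex case), joining
# UTF-16 surrogate pairs such as \ud83d\ude00 into one character.
def decode_unicode_escapes(text)
  text.gsub(/\\u(\h{4})(?:\\u(\h{4}))?/) do
    high = $1.hex
    low  = $2 && $2.hex
    if low && high.between?(0xD800, 0xDBFF) && low.between?(0xDC00, 0xDFFF)
      # Combine the surrogate pair into a single code point above U+FFFF
      [0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)].pack("U")
    else
      # One escape, or two unrelated escapes: decode each separately
      [high].pack("U") + (low ? [low].pack("U") : "")
    end
  end
end

puts decode_unicode_escapes("\\u003Cp\\u003E")  # prints <p>
puts decode_unicode_escapes("a\\u00e4\\u3042")  # prints aäあ
puts decode_unicode_escapes("\\ud83d\\ude00")   # prints 😀
```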
Or to do the reverse:
"aäあ".gsub(/[^ -~\n]/){"\\u%04x"%$&.ord} # "a\\u00e4\\u3042"
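One caveat with the reverse one-liner: a character above U+FFFF gets more than four hex digits (\u1f600 for 😀), which a JSON parser will not read back as the same character. A sketch that emits a UTF-16 surrogate pair instead; escape_non_ascii is a hypothetical name:

```ruby
# Hypothetical helper: replace every character outside printable ASCII
# (newlines excepted) with a JSON-compatible \uXXXX escape.
def escape_non_ascii(text)
  text.gsub(/[^ -~\n]/) do |char|
    code = char.ord
    if code > 0xFFFF
      # Split code points above the BMP into a UTF-16 surrogate pair
      offset = code - 0x10000
      format("\\u%04x\\u%04x", 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF))
    else
      format("\\u%04x", code)
    end
  end
end

puts escape_non_ascii("aäあ")  # prints a\u00e4\u3042
puts escape_non_ascii("😀")    # prints \ud83d\ude00
```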
The doubled backslashes look a lot like a regular string being displayed by a debugger (or by inspect in IRB), which escapes each backslash.
The string "\u003Cp\u003E" really is "<p>": \u003C is the Unicode escape for < and \u003E is >.
>> "\u003Cp\u003E" #=> "<p>"
If you really are getting the string with doubled backslashes, you could try stripping one of each pair.
As a test, check the string's length:
>> "\\u003Cp\\u003E".size #=> 13
>> "\u003Cp\u003E".size #=> 3
>> "<p>".size #=> 3
All of the above was done using Ruby 1.9.2, which is Unicode-aware; 1.8.7 isn't. Here's what I get in 1.8.7's IRB for comparison:
>> "\u003Cp\u003E" #=> "u003Cpu003E"