How do I handle this response from YQL?
In a request to YQL (select * from html where url="...")
I got the following response:
callback({
"query":
{"count":"1","created":"2011-05-09T23:29:05Z","lang":"en-US"
}, "results": ["<body>... we\ufffdll call Mr ...</body>"]
}
This is from the YQL console page. When I type that sequence into Firebug (even on YQL's page) I get:
... we�ll call Mr ...
What am I doing wrong? Is YQL's site in a bad encoding? Is there some way to convert symbols like this to their ASCII equivalent?
BTW, this isn't my site, so I can't change the meta charset on it.
It seems like that (the question mark in a solid black diamond) is what you should be seeing: http://www.fileformat.info/info/unicode/char/fffd/browsertest.htm
The comment on that character's page says:
used to replace an incoming character whose value is unknown or unrepresentable in Unicode
Answering these questions might help you get a better answer:
- What character are you expecting at that place?
- Can you post the URL that you're scraping?
- Is that character on the original page too, or is it getting mangled when YQL picks it up?
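To illustrate how that kind of mangling can happen, here is a minimal Python sketch. It assumes (and this is only a guess, since the source page is unknown) that the original character was a curly apostrophe stored as the single byte 0x92 in a Windows-1252 page, and that something along the way decoded it as UTF-8.

```python
# Assumption: the page is Windows-1252 and the apostrophe is the byte 0x92.
# This is a guess for illustration; the real page encoding is unknown.
raw = b"we\x92ll call Mr"

# 0x92 is not a valid UTF-8 sequence on its own, so a lenient decoder
# substitutes U+FFFD (the replacement character) for it.
as_utf8 = raw.decode("utf-8", errors="replace")

# Decoding with the charset the bytes were written in keeps the apostrophe.
as_cp1252 = raw.decode("cp1252")

# Optional: fold the curly apostrophe (U+2019) to a plain ASCII one.
as_ascii = as_cp1252.replace("\u2019", "'")

print(repr(as_utf8))
print(repr(as_cp1252))
print(as_ascii)
```

Note that once the text contains U+FFFD, the original byte is gone, so there is nothing left to convert to ASCII on your side. The fix has to happen where the bytes are first decoded.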
Update
You might want to check out the charset option in the where clause of your YQL query. I'm not entirely sure what it does, but it looks like it forces the YQL engine to use the specified charset when parsing the page. Perhaps setting it to UTF-8 will solve your problem.
For example,
select * from html where url = 'http://google.com' and charset='utf-8'
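If you are building the request yourself rather than using the console, the query has to be URL-encoded. Here is a rough Python sketch of that. The endpoint URL, the format=json parameter, and the build_yql_url helper are my own assumptions for illustration and are not taken from the question, and the endpoint may no longer be available.

```python
from urllib.parse import urlencode

# Assumption: the public YQL endpoint; it does not appear in the question.
YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql"


def build_yql_url(page_url, charset="utf-8"):
    """Build a request URL for an html-table query with an explicit charset.

    Hypothetical helper: it only assembles the URL and sends nothing.
    """
    query = "select * from html where url='{}' and charset='{}'".format(
        page_url, charset
    )
    # urlencode percent-encodes the quotes, '=' and '*' and turns spaces into '+'.
    return YQL_ENDPOINT + "?" + urlencode({"q": query, "format": "json"})


url = build_yql_url("http://google.com")
print(url)
```

If UTF-8 does not help, it may be worth trying the charset the page was really written in, assuming YQL accepts that value.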