How can I prevent strange characters when pulling the Atom feed from a WordPress 3.0 blog?
I have an Atom feed on a WordPress blog here: http://blogs.legalview.info/auto-accidents/feed/atom
When I download the text of the file and display it on my site, I get strange characters like the accented 'A' here:
Recent studies are showing that car accident -related fatalities have declined almost 10% since 2008. The reason for this
I am using the following code in my C# web application to download the feed:
WebClient client = new WebClient();
client.Headers.Add("Accept-Language", "en-US,en");
client.Headers.Add("Accept-Charset", "utf-8");
string xml_text = client.DownloadString(_atom_url);
And xml_text.Contains("Â") returns true, but if I download the feed in my browser no such character exists. I'm pretty sure this is a character set issue, but I can't figure out why. By examining client.ResponseHeaders, I can see it is in fact downloading text in UTF-8, and the response on my .NET site is UTF-8 as well, so I can't figure out why the weirdness appears.
I get ...fatalitiesÂ when I force my browser to interpret the feed as ISO-8859-1 instead of UTF-8 (which definitely is the correct character set for the feed).
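To illustrate why an 'Â' can show up, here is a minimal sketch. It assumes the feed contains a non-breaking space (U+00A0) after "fatalities", which is a guess based on the symptom and not something confirmed from the feed itself. In UTF-8 that character is the two bytes 0xC2 0xA0, and reading those bytes as ISO-8859-1 turns the first one into 'Â'. The class name and sample string are made up for the demonstration.

```csharp
using System;
using System.Text;

class MojibakeDemo
{
    static void Main()
    {
        // Assumption: the feed text has a non-breaking space (U+00A0) here.
        string original = "fatalities\u00A0have";

        // The bytes as a UTF-8 feed would send them.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);

        // Decoding the same bytes with the wrong charset yields an extra 'Â' (U+00C2).
        string misread = Encoding.GetEncoding("ISO-8859-1").GetString(utf8Bytes);
        Console.WriteLine(misread.Contains("\u00C2"));  // True

        // Decoding them as UTF-8 round-trips cleanly.
        string correct = Encoding.UTF8.GetString(utf8Bytes);
        Console.WriteLine(correct.Contains("\u00C2"));  // False
    }
}
```

If the same check on the real feed bytes behaves this way, the data is fine and only the decoding step is wrong.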
I'm pretty sure either your WebClient somehow defaults to ISO-8859-1, or the output encoding on your site is ISO-8859-1, which obviously garbles the UTF-8 input.
Maybe start checking your site's output first. If that definitely is UTF-8, take a look at the WebClient.
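If the WebClient turns out to be the culprit, one possible way to rule it out is to take the decoding into your own hands. The sketch below downloads the raw bytes and decodes them explicitly as UTF-8. It assumes the feed really is UTF-8, and `FeedDownloader`, `DecodeUtf8`, and `DownloadFeed` are hypothetical names chosen for the example, not part of any library.

```csharp
using System;
using System.Net;
using System.Text;

static class FeedDownloader
{
    // Hypothetical helper: decode response bytes as UTF-8, ignoring client defaults.
    public static string DecodeUtf8(byte[] raw)
    {
        return Encoding.UTF8.GetString(raw);
    }

    // Hypothetical helper: fetch the feed as bytes so no implicit charset is applied.
    public static string DownloadFeed(string url)
    {
        using (var client = new WebClient())
        {
            byte[] raw = client.DownloadData(url);
            return DecodeUtf8(raw);
        }
    }

    static void Main(string[] args)
    {
        // Assumption: the feed URL is passed as the first command-line argument.
        string xml = DownloadFeed(args[0]);
        Console.WriteLine(xml.Contains("\u00C2"));
    }
}
```

A shorter variant is to set `client.Encoding = Encoding.UTF8` before calling `DownloadString`. `WebClient.Encoding` defaults to `Encoding.Default`, which on the .NET Framework is the system ANSI code page, not UTF-8.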