Convert non-English text into readable format

I am obtaining strings from the web which often contain accented characters not recognised from within my application.

Edit - I'm obtaining my string using the HtmlAgilityPack. I am taking the InnerText of a <title> tag. Whilst doing this the Pack uses a different encoding from the original HTML document (I'm not sure which ones though?).

        // get the html title inner text and assign to htmlParts object
        HtmlNode titleNode = doc.DocumentNode.SelectSingleNode("//title");
        string docTitle = titleNode.InnerText;
        htmlParts.htmlTitle = docTitle.ToString();

Can anyone tell me how I can go from getting "(Subtitulado al espaÃ±ol).avi" to "(Subtitulado al español).avi"?

I'd very much appreciate it. :)


It looks like you're getting UTF-8, but processing it as ISO-8859-1.

It's not possible to give more concrete information without knowing more about your system.
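If that diagnosis is right, the damage can be undone after the fact by turning the garbled string back into its original bytes and decoding those bytes as UTF-8. The following is only a minimal sketch under that assumption: the class and method names are hypothetical, and it assumes the wrong decoding really was ISO-8859-1 (if it was Windows-1252 instead, some characters will not round-trip this way).

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Hypothetical helper, not part of HtmlAgilityPack.
    // Assumes the text was UTF-8 on the wire but was decoded as ISO-8859-1.
    public static string FixUtf8ReadAsLatin1(string garbled)
    {
        // ISO-8859-1 maps each char U+0000..U+00FF to exactly one byte,
        // so this recovers the original UTF-8 byte sequence.
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        byte[] rawBytes = latin1.GetBytes(garbled);

        // Decode the recovered bytes with the encoding they were written in.
        return Encoding.UTF8.GetString(rawBytes);
    }

    public static void Main()
    {
        // "\u00C3\u00B1" is the two-character garbling of a single "ñ" (U+00F1).
        string garbled = "(Subtitulado al espa\u00C3\u00B1ol).avi";
        Console.WriteLine(FixUtf8ReadAsLatin1(garbled));
    }
}
```

This is a repair, not a cure: it only works on strings that were actually mis-decoded, and applying it to a correctly decoded string will corrupt it.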


Apply the proper encoding to the data you read. How exactly? Good question. To answer it, you would at least need to post the code that causes the problem in the first place.
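Since the code that downloads the page is not shown, the following is only an illustrative sketch of what "applying the proper encoding" could look like in plain .NET: the raw bytes are decoded once, with an explicitly chosen encoding, before anything parses them. The helper name `ReadAll` is hypothetical, and UTF-8 is an assumption about the page; in practice the charset should come from the HTTP `Content-Type` header or the page's `<meta>` tag.

```csharp
using System;
using System.IO;
using System.Text;

public static class DecodeAtSource
{
    // Hypothetical helper: decodes a raw response body with an explicit encoding
    // instead of relying on whatever default the caller would otherwise get.
    public static string ReadAll(Stream body, Encoding encoding)
    {
        using (var reader = new StreamReader(body, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Stand-in for the bytes a server would send for a UTF-8 page.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(
            "<title>(Subtitulado al espa\u00F1ol).avi</title>");

        // Decoded with the matching encoding: the accented character survives.
        string right = ReadAll(new MemoryStream(utf8Bytes), Encoding.UTF8);

        // Decoded with the wrong encoding: reproduces the garbling from the question.
        string wrong = ReadAll(new MemoryStream(utf8Bytes),
                               Encoding.GetEncoding("ISO-8859-1"));

        Console.WriteLine(right);
        Console.WriteLine(wrong);
    }
}
```

HtmlAgilityPack's `HtmlDocument.Load` has overloads that accept an `Encoding`, which would be the analogous place to pass the correct value, assuming the document is loaded from a stream or file rather than from an already decoded string.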
