Encoding problem with C# utilising HttpWebRequest

Posted 2023-02-13 00:46

I am getting HTML character references (&#39; and &quot;) in the string returned from an HttpWebRequest, and they break my responses (showing "39;" and "uto;"):

internal static void TranslateThis(Player player, string fromLang, string toLang, string text){
    try
    {
        string translated = null;
        HttpWebRequest hwr = (HttpWebRequest)HttpWebRequest.Create("http://translate.google.com/?langpair=" + fromLang + "|" + toLang + "&text=" + text.Replace(" ", "+") + "#");
        HttpWebResponse res = (HttpWebResponse)hwr.GetResponse();
        StreamReader sr = new StreamReader(res.GetResponseStream());
        string html = sr.ReadToEnd();
        int a = html.IndexOf("onmouseout=\"this.style.backgroundColor='#fff'\">") + 47;
        int b = html.IndexOf("</span>",html.IndexOf("onmouseout=\"this.style.backgroundColor='#fff'\">") + 47);
        translated = html.Substring(a, b - a);
        if (translated.Length < (10 * text.Length)){
            if (player == Player.Console)
            {
                player.ParseMessage(translated, true);
            }
            else
            {
                player.ParseMessage(translated, false);
            }
        } else {
            player.Message("Usage: /translate [lang] [message]");
        }
    }
    catch
    {
        player.Message("Usage: /translate [lang] [message]");
    }
}

First of all, make sure you decode the downloaded content with the correct character encoding. See this SO answer for code on how to do this.

In short: check both the HTTP Content-Type header and the HTML <meta> tags for the declared encoding, and decode the content again with that encoding if it differs from the one you used. Then call HttpUtility.HtmlDecode to turn any HTML character references (such as &#39; and &quot;) back into ordinary characters. Only then start searching for the content you are trying to find.
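A minimal sketch of the header-then-meta check, working on the raw response bytes so the content is decoded exactly once with the chosen encoding. The class name PageEncoding, the 1024-byte sniffing window and the UTF-8 fallback are assumptions for illustration, not taken from the linked answer.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the answer): picks the encoding for a downloaded page.
public static class PageEncoding
{
    // Matches "charset=..." in a Content-Type header or in a <meta> tag.
    private static readonly Regex CharsetPattern =
        new Regex(@"charset\s*=\s*[""']?([\w-]+)", RegexOptions.IgnoreCase);

    public static Encoding Detect(string contentType, byte[] body)
    {
        // 1. A charset in the HTTP Content-Type header takes precedence.
        string name = FindCharset(contentType);

        // 2. Otherwise look for a <meta> declaration in the first 1024 bytes.
        //    Decoding as ASCII is enough because the declaration itself is ASCII.
        if (name == null)
        {
            int length = Math.Min(body.Length, 1024);
            name = FindCharset(Encoding.ASCII.GetString(body, 0, length));
        }

        if (name != null)
        {
            try { return Encoding.GetEncoding(name); }
            catch (ArgumentException) { /* unsupported name: use the default below */ }
        }

        // 3. Assumed fallback when nothing usable is declared.
        return Encoding.UTF8;
    }

    // Decodes the raw bytes with the detected encoding.
    public static string Decode(string contentType, byte[] body) =>
        Detect(contentType, body).GetString(body);

    private static string FindCharset(string text)
    {
        if (string.IsNullOrEmpty(text)) return null;
        Match match = CharsetPattern.Match(text);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

With HttpWebRequest, the header value is available as response.ContentType, and the body bytes can be obtained by copying GetResponseStream() into a MemoryStream and calling ToArray(). The decoded string can then go through HttpUtility.HtmlDecode (or WebUtility.HtmlDecode).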

I would also recommend using a proper HTML parser such as Html Agility Pack instead of IndexOf.

It is hard to say exactly what your ParseMessage method expects, so this is just a guess:

The result you are getting from Google Translate is HTML, so if you want plain-text output, you have to convert the HTML to text. You have successfully extracted the translation from the HTML page (for now, at least: your solution is not exactly fool- or future-proof, and it will break as soon as Google Translate changes its output page even slightly). But the translation is still HTML-encoded, and you need to decode it. For that, you can use the WebUtility.HtmlDecode method (assuming you are using .NET Framework 4). After the

translated = html.Substring(a, b - a);

line, add

translated = WebUtility.HtmlDecode(translated);
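A minimal sketch of the extract-then-decode step, assuming the same start marker and `</span>` end marker as the question. ExtractDecoded is a hypothetical helper name. It uses the marker's length instead of the hard-coded 47 (which is exactly the length of that marker) and returns null when either marker is missing, instead of carrying on with a bogus index.

```csharp
using System;
using System.Net;

// Hypothetical helper: pulls out the text between two markers and decodes HTML entities.
public static class TranslationExtractor
{
    public static string ExtractDecoded(string html, string startMarker, string endMarker)
    {
        int start = html.IndexOf(startMarker, StringComparison.Ordinal);
        if (start < 0) return null;            // marker missing: the page layout changed
        start += startMarker.Length;           // instead of a hard-coded "+ 47"

        int end = html.IndexOf(endMarker, start, StringComparison.Ordinal);
        if (end < 0) return null;

        // &#39; -> '   &quot; -> "   &amp; -> &
        return WebUtility.HtmlDecode(html.Substring(start, end - start));
    }
}
```

In the question's method this would replace the IndexOf/Substring lines, for example `translated = TranslationExtractor.ExtractDecoded(html, "onmouseout=\"this.style.backgroundColor='#fff'\">", "</span>");`, followed by a null check that prints the usage message.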

Discussions with another developer got me to try this before the last round of comments came in. Here is what ended up working:

    // Needs: using System; using System.IO; using System.Net;
    //        using System.Text; using System.Text.RegularExpressions;
    internal static void TranslateThis(Player player, string fromLang, string toLang, string text)
    {
        try
        {
            string translated = null;
            // Keep only word characters, '.', apostrophes, whitespace, '@' and '-' in the input.
            text = Regex.Replace(text, @"[^\w\.\'\s@-]", "");
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://translate.google.com/?langpair=" + fromLang + "|" + toLang + "&text=" + text.Replace(" ", "+") + "#");

            request.MaximumAutomaticRedirections = 4;
            request.MaximumResponseHeadersLength = 4; // measured in kilobytes
            request.Credentials = CredentialCache.DefaultCredentials;

            string html;
            // The using blocks close the reader and the response even if ReadToEnd throws.
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (StreamReader readStream = new StreamReader(response.GetResponseStream(), Encoding.UTF7))
            {
                // UTF-7 is what happened to work here. Encoding.UTF7 is obsolete in .NET 5+,
                // and decoding with the charset the page declares (normally UTF-8) is the usual choice.
                html = readStream.ReadToEnd();
            }

            // The translation sits between this 47-character marker and the next </span>.
            const string marker = "onmouseout=\"this.style.backgroundColor='#fff'\">";
            int a = html.IndexOf(marker) + marker.Length;
            int b = html.IndexOf("</span>", a);
            translated = html.Substring(a, b - a);

            if (translated.Length < (10 * text.Length))
            {
                // "&#39" becomes an apostrophe; the regex then drops the leftover ';'.
                translated = translated.Replace("&#39", "'");
                translated = Regex.Replace(translated, @"[^\w\.\'\s@-]", "");
                // The second argument is true when the command came from the console.
                player.ParseMessage(translated, player == Player.Console);
            }
            else
            {
                player.Message("Usage: /translate [lang] [message]");
            }
        }
        catch (Exception ex)
        {
            player.Message("Error:" + ex.ToString());
        }
    }
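To see what the clean-up in this answer does to other entities, here is a small sketch comparing its two steps with WebUtility.HtmlDecode. The class and method names are hypothetical; the Replace call and the regex are copied from the answer's code.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical comparison of the answer's manual clean-up with WebUtility.HtmlDecode.
public static class CleanupComparison
{
    // The answer's two steps: patch "&#39" by hand, then strip everything that is not
    // a word character, '.', apostrophe, whitespace, '@' or '-'.
    public static string ManualCleanup(string translated)
    {
        translated = translated.Replace("&#39", "'");
        return Regex.Replace(translated, @"[^\w\.\'\s@-]", "");
    }

    // Decodes every HTML character reference instead.
    public static string Decoded(string translated) => WebUtility.HtmlDecode(translated);
}
```

For the input `It&#39;s &quot;ok&quot;!`, ManualCleanup returns `It's quotokquot` (each `&quot;` survives as the word "quot" and the `!` is lost), while Decoded returns `It's "ok"!`.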
