Is .NET WebClient grabbing text wrong?

I thought I had this Unicode thing down. Then I realized I wasn't saving/parsing some text correctly.

Here's the text (sorry, I don't know what it says; a random user wrote it).

Here is some simple test code. Essentially, I write the BOM for UTF-8 and then write the source. That didn't work, so for sanity's sake I tried saving the file directly (second piece of code). Both put the WRONG text in the file, and multiple browsers showed me incorrect text.

Why does this happen, and how do I fix it?

Note: With my first code, I can see sz holding the same incorrect text in Visual Studio.

using System;
using System.IO;
using System.Net;
using System.Text;

namespace unicode_stuff
{
    class Program
    {
        static void Main(string[] args)
        {
            var wc = new WebClient();
            var fn = "out.html";
            var sw = new StreamWriter(fn, false, Encoding.UTF8);
            var sz = wc.DownloadString("http://www.pastie.org/pastes/1703099/text");
            sw.WriteLine(sz);
            sw.Close();
        }
    }
}

Second:

using System;
using System.IO;
using System.Net;

namespace unicode_stuff
{
    class Program
    {
        static void Main(string[] args)
        {
            var wc = new WebClient();
            var fn = "out.html";
            wc.DownloadFile("http://www.pastie.org/pastes/1703099/text", fn);
        }
    }
}


Try setting the WebClient's encoding to UTF-8 before downloading:

wc.Encoding = System.Text.Encoding.UTF8;
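
To see why the encoding matters, here is a minimal offline sketch. It assumes the page is served as UTF-8 and that the downloader decodes it with some other single-byte encoding. ISO-8859-1 is used purely as a stand-in for whatever default applies on your machine, and the EncodingDemo class, its Decode helper, and the sample string are hypothetical names made up for illustration.

```csharp
using System;
using System.Text;

static class EncodingDemo
{
    // Turn raw response bytes into a string using the given encoding,
    // which is roughly what a string download has to do after fetching the body.
    public static string Decode(byte[] body, Encoding encoding)
    {
        return encoding.GetString(body);
    }

    static void Main()
    {
        // \u00e9 is a non-ASCII character, written as an escape so the
        // source file's own encoding cannot affect the result.
        string original = "caf\u00e9";

        // Pretend this is the response body of a page served as UTF-8.
        byte[] body = Encoding.UTF8.GetBytes(original);

        // Decoding with the wrong encoding (assumed here: ISO-8859-1)
        // turns the two UTF-8 bytes of the character into two separate characters.
        string wrong = Decode(body, Encoding.GetEncoding("ISO-8859-1"));

        // Decoding with UTF-8 recovers the original text.
        string right = Decode(body, Encoding.UTF8);

        Console.WriteLine(wrong == original); // False
        Console.WriteLine(right == original); // True
    }
}
```

If this is what is happening, the string is already damaged by the time it reaches the StreamWriter, so writing the file as UTF-8 cannot repair it. That would match sz already looking wrong in Visual Studio. Setting wc.Encoding before the call to DownloadString addresses the decoding step itself.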