Is .NET WebClient grabbing text wrong?
I thought I had this Unicode thing down. Then I realized I wasn't saving/parsing some text correctly.
Here's the text (sorry, I don't know what it says; a random user wrote it).
Here is some simple test code. Essentially I write the UTF-8 BOM and then write the downloaded source. That didn't work, so for sanity's sake I tried saving the file directly (second piece of code). Both put the WRONG text in the file, and multiple browsers showed me incorrect text.
Why does this happen, and how do I fix it?
Note: with my first version I can see in Visual Studio that sz already holds the same incorrect text.
using System;
using System.IO;
using System.Net;
using System.Text;

namespace unicode_stuff
{
    class Program
    {
        static void Main(string[] args)
        {
            var wc = new WebClient();
            var fn = "out.html";
            var sw = new StreamWriter(fn, false, Encoding.UTF8);
            var sz = wc.DownloadString("http://www.pastie.org/pastes/1703099/text");
            sw.WriteLine(sz);
            sw.Close();
        }
    }
}
Second version:
using System;
using System.IO;
using System.Net;

namespace unicode_stuff
{
    class Program
    {
        static void Main(string[] args)
        {
            var wc = new WebClient();
            var fn = "out.html";
            wc.DownloadFile("http://www.pastie.org/pastes/1703099/text", fn);
        }
    }
}
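The note that sz is already wrong before anything is written points at the decoding step rather than the file writing. The following is a small C# sketch of that failure mode. It assumes the page is UTF-8 and that its bytes get decoded with a single-byte code page (ISO-8859-1 is used here only for illustration, and the sample string is made up). The same bytes give back the original string only when decoded as UTF-8.

```csharp
using System;
using System.Text;

class MojibakeDemo
{
    static void Main()
    {
        // Hypothetical sample text standing in for the user-written page.
        string original = "日本語 héllo";

        // What a UTF-8 server would send: the UTF-8 bytes of the text.
        byte[] bytes = Encoding.UTF8.GetBytes(original);

        // Decoding with a single-byte code page never throws, but every
        // multi-byte UTF-8 sequence turns into several unrelated characters.
        string wrong = Encoding.GetEncoding("ISO-8859-1").GetString(bytes);

        // Decoding with the encoding the bytes were actually written in round-trips.
        string right = Encoding.UTF8.GetString(bytes);

        Console.WriteLine(wrong == original);              // False
        Console.WriteLine(right == original);              // True
        Console.WriteLine(wrong.Length > original.Length); // True (16 chars vs 9)
    }
}
```

Writing the already-garbled string with `Encoding.UTF8` afterwards only stores the wrong characters faithfully, which matches what the first program produced.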
Try setting the WebClient's encoding to UTF-8 before downloading:
wc.Encoding = System.Text.Encoding.UTF8;
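Applied to the first program, a minimal sketch might look like this. It assumes the pastie text really is UTF-8 (the premise of this answer). According to the WebClient documentation, `DownloadString` uses the `Encoding` property to turn the downloaded bytes into a string, so the property has to be set before the call. `SaveAsUtf8` is a hypothetical helper name introduced here only to keep the writing step separate and testable.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

namespace unicode_stuff
{
    class Program
    {
        // Hypothetical helper: writes text as UTF-8. Encoding.UTF8 includes a
        // BOM (EF BB BF), which StreamWriter emits at the start of a new file.
        internal static void SaveAsUtf8(string text, string path)
        {
            using (var sw = new StreamWriter(path, false, Encoding.UTF8))
            {
                sw.WriteLine(text);
            }
        }

        static void Main(string[] args)
        {
            // WebClient implements IDisposable, so dispose it when done.
            using (var wc = new WebClient())
            {
                // Assumption: the page is UTF-8. Set this BEFORE DownloadString
                // so the response bytes are decoded as UTF-8.
                wc.Encoding = Encoding.UTF8;
                var sz = wc.DownloadString("http://www.pastie.org/pastes/1703099/text");
                SaveAsUtf8(sz, "out.html");
            }
        }
    }
}
```

The `Encoding` property affects the string-returning methods, not `DownloadFile`, which saves the response bytes as they arrive. If a browser still shows wrong text for that file, a likely cause is that it is guessing the encoding of a local file that has no BOM or charset declaration.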