Download UTF-8 web page into String
This is a newbie question.
I read the following question to download a web page whose contents are encoded in UTF-8. There, the page is converted into a byte array, while I'm using a String to read the contents of the page.
I need to turn UTF-8 into Latin1/ANSI since that's what RichText and MessageBox seem to use (I'm getting funny characters).
Is there a more direct way to download a UTF-8 page and convert it into ANSI/Latin1?
Thank you.
Edit: When calling MessageBox, accented characters are not shown as expected:
Content = CStr(e.Result)
'Théâtre, Métro
MessageBox.Show(Content)
String in .NET uses Unicode all the way, so you should not have to convert it to anything else. The important thing is that when you download the page, you specify that the data comes from a UTF-8 source, so the bytes are decoded correctly.
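To illustrate where the "funny characters" likely come from, here is a minimal sketch that decodes the same UTF-8 bytes twice: once as UTF-8 and once as Latin-1. The module name EncodingDemo and the sample text are only for illustration, and it assumes the source file itself is saved as UTF-8.

```vbnet
Imports System
Imports System.Text

Module EncodingDemo
    Sub Main()
        ' Bytes as they would arrive from a UTF-8 encoded page (sample text is illustrative).
        Dim raw As Byte() = Encoding.UTF8.GetBytes("Théâtre, Métro")

        ' Decoding with the matching encoding recovers the original text.
        Dim good As String = Encoding.UTF8.GetString(raw)

        ' Decoding the same bytes as Latin-1 turns each accented character
        ' into two unrelated characters.
        Dim bad As String = Encoding.GetEncoding("ISO-8859-1").GetString(raw)

        Console.WriteLine(good = "Théâtre, Métro") ' the round trip preserves the text
        Console.WriteLine(bad.Length - good.Length) ' one extra character per accented letter
    End Sub
End Module
```

The point is that no conversion to Latin-1 is needed afterwards: once the bytes are decoded with the right encoding, the String already holds the correct characters.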
MSDN has a sample on loading UTF-8 encoded data into a string (adapted here so that only the bytes actually read are decoded):
Private Function ReadAuthor(binary_file As Stream) As String
    Dim encoding As System.Text.Encoding = System.Text.Encoding.UTF8
    ' Read up to 30 bytes from the binary file
    Dim buffer(29) As Byte
    Dim bytesRead As Integer = binary_file.Read(buffer, 0, buffer.Length)
    ' Decode only the bytes that were read, using UTF-8
    Return encoding.GetString(buffer, 0, bytesRead)
End Function
Update
When using WebClient.DownloadString, the conversion to a string takes place automatically, and code similar to the one above is not needed. The automatic conversion uses the encoding specified by WebClient.Encoding, so the problem should be solved by setting the WebClient object's Encoding property to UTF-8:
client.Encoding = System.Text.Encoding.UTF8
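Put together, a minimal sketch might look like the following. The URL and the module name DownloadDemo are placeholders, and it uses the synchronous DownloadString for brevity; if you are using the asynchronous variant (as e.Result in the question suggests), the same Encoding setting should apply as long as it is set before the download starts.

```vbnet
Imports System
Imports System.Net
Imports System.Text

Module DownloadDemo
    Sub Main()
        ' Placeholder URL; replace with the page you want to fetch.
        Dim url As String = "http://example.com/"

        Using client As New WebClient()
            ' Tell WebClient to decode the response bytes as UTF-8.
            client.Encoding = Encoding.UTF8

            ' DownloadString returns an ordinary .NET (Unicode) String.
            Dim content As String = client.DownloadString(url)
            Console.WriteLine(content.Length)
        End Using
    End Sub
End Module
```

In a Windows Forms application, the resulting string can be passed to MessageBox.Show as it is.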