Encoding problem with reading website, three different encodings

Posted 2023-02-07 17:40

I have a problem with a WebRequest in C#. The page I am requesting is a Google page.

The header states

text/html; charset=ISO-8859-1

The website states

<meta http-equiv=content-type content="text/html; charset=utf-8">

And finally, I only get the expected result, both in the debugger and in my regular expression, when I use Encoding.Default, which resolves to System.Text.SBCSCodePageEncoding.

Now what do I do? Do you have any hints as to how this could happen, or how I could solve it?

The actual encoding of the page seems to be UTF-8. At least Firefox displays it correctly as UTF-8, but not as a Windows code page and not as Latin1.
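To illustrate the kind of mismatch described above, here is a minimal sketch that decodes the same UTF-8 bytes once as UTF-8 and once as ISO-8859-1. The class name, the helper Decode and the sample text are made up for this illustration, and the source file is assumed to be saved as UTF-8; this is not the code from the question.

```csharp
using System;
using System.Text;

static class EncodingMismatchDemo
{
    // Decodes the given bytes with the named encoding.
    public static string Decode(byte[] bytes, string encodingName)
    {
        return Encoding.GetEncoding(encodingName).GetString(bytes);
    }

    static void Main()
    {
        // Sample text standing in for the page content (an assumption).
        string original = "€ äöü";
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);

        string asUtf8 = Decode(utf8Bytes, "utf-8");
        string asLatin1 = Decode(utf8Bytes, "iso-8859-1");

        Console.WriteLine(asUtf8 == original);   // True
        Console.WriteLine(asLatin1 == original); // False
        // ISO-8859-1 maps every byte to exactly one char, so each
        // multi-byte UTF-8 sequence turns into several wrong characters.
        Console.WriteLine(asLatin1.Length == utf8Bytes.Length); // True
    }
}
```

If the header charset is trusted while the body is really UTF-8, the euro sign and the umlauts are the characters that break, because they are the ones encoded as more than one byte.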

The URL is this

The problem is the €-sign as well as all German umlauts.

Thanks in advance for your help on this problem which is making me seriously crazy!

Update: when I output the string via

// create a writer and open the file
TextWriter tw = new StreamWriter("test.txt");

// write a line of text to the file
tw.WriteLine(html);

// close the stream
tw.Close();

it all works fine.

So it seems the problem is that neither the debugger nor the regular expression handles the string with the correct encoding.

How do I tell C# to handle the RegEx as UTF-8?
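One point that may help here: a .NET Regex works on strings that are already decoded, so it has no encoding setting of its own. The encoding is chosen when the response bytes are turned into a string. Below is a minimal sketch, assuming the page body really is UTF-8, that decodes a stream explicitly as UTF-8 and then matches the euro sign and umlauts. The names Utf8PageReader, ReadAsUtf8 and CountSpecialChars are hypothetical, the sample markup is made up, and a MemoryStream stands in for the stream returned by response.GetResponseStream().

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

static class Utf8PageReader
{
    // Reads the whole stream as UTF-8, ignoring any charset the server declared.
    public static string ReadAsUtf8(Stream body)
    {
        using (var reader = new StreamReader(body, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Counts euro signs, German umlauts and sharp s in an already decoded string.
    public static int CountSpecialChars(string html)
    {
        return Regex.Matches(html, "[€äöüÄÖÜß]").Count;
    }

    static void Main()
    {
        // Stand-in for response.GetResponseStream(); the markup is invented.
        byte[] fakeBody = Encoding.UTF8.GetBytes("<p>Preis: 5 € für Äpfel</p>");
        using (var body = new MemoryStream(fakeBody))
        {
            string html = ReadAsUtf8(body);
            Console.WriteLine(CountSpecialChars(html)); // 3: €, ü, Ä
        }
    }
}
```

The sketch assumes the source file itself is saved as UTF-8, since the pattern contains the literal characters. If the string is decoded correctly at this stage, the regular expression needs no further configuration.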

Rather than parsing HTML, why not use the Google Query API?

BTW, before parsing HTML using regexes, read this ;-)

EDIT: In answer to your comment:

The API works for Google Desktop as well.
Is this encoding issue specific to the Google page?
In addition to the problem you have now, who knows what problem you'll run into later, when in production, due to subtle changes in the HTML of these pages, or in the header sent back by the Web server. A web page is supposed to be human eye-friendly, not computer friendly. The only thing you can expect to be friendly is the appearance and rendered contents of the page, not the underlying HTML structure. As opposed to an API, which is supposed to be computer-friendly.
