
HtmlDocument.Write Stripping Quotation Marks

For some reason when I try writing to an HtmlDocument it strips some (not all) of the quotation marks of the string I am giving it.

Look here:

HtmlDocument htmlDoc = Webbrowser1.Document.OpenNew(true);
htmlDoc.Write("<HTML><BODY><DIV ID=\"TEST\"></DIV></BODY></HTML>");
string temp = htmlDoc.GetElementsByTagName("HTML")[0].InnerHtml;

The result of temp is this:

<HEAD></HEAD>
<BODY>
<DIV id=TEST></DIV></BODY>

It works exactly as it should, except that it strips the quotation marks. Does anyone have a solution to prevent or fix this?


There is no guarantee that innerHTML will return content identical to the string you passed in. The browser builds innerHTML from its internal HTML tree representation, so it produces the resulting string however it sees fit.

So, depending on your needs, you can either use HTML parsing code that understands IDs without quotes around them, or try to convince the browser to use its latest engine, which is more likely to produce innerHTML to your liking.
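If tolerating unquoted values is the route you take, one rough option is to put the quotes back yourself after reading InnerHtml. The sketch below is only an illustration: the `AttributeQuoter` class and `QuoteAttributes` method are made-up names, and the regular expressions assume simple markup (no `>` inside attribute values, no unquoted value directly before a `/>`). A real HTML parser would be more robust.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
static class AttributeQuoter
{
    // An opening tag: '<', a letter, then anything up to the next '>'.
    private static readonly Regex Tag = new Regex(@"<[A-Za-z][^<>]*>");

    // name=value, where value is double-quoted, single-quoted,
    // or unquoted (group 2 is set only in the unquoted case).
    private static readonly Regex Attr = new Regex(
        @"([\w:-]+)\s*=\s*(?:""[^""]*""|'[^']*'|([^\s""'>]+))");

    public static string QuoteAttributes(string html)
    {
        // Rewrite attributes inside each opening tag; text between tags is untouched.
        return Tag.Replace(html, tag => Attr.Replace(tag.Value, attr =>
            attr.Groups[2].Success
                ? attr.Groups[1].Value + "=\"" + attr.Groups[2].Value + "\""
                : attr.Value));
    }

    static void Main()
    {
        Console.WriteLine(QuoteAttributes("<DIV id=TEST></DIV>"));
        // prints: <DIV id="TEST"></DIV>
    }
}
```

Values that are already quoted are matched as a whole and returned unchanged, so running the helper twice gives the same result.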

For example, in your case it looks like at least IE9 renders your HTML in IE9:Quirks mode, which returns innerHTML in the shape you are not happy with. If you write valid HTML or force the mode to IE9:Standards, you'll get a string with quotes, like this:

document.getElementsByTagName("html")[0].innerHTML 

IE9:Standards - "<head></head><body><div id="TEST"></div></body>"

IE9:Quirks -

"<HEAD></HEAD>
<BODY>
<DIV id=TEST></DIV></BODY>" 

You can try it yourself by creating a sample HTML file and opening it from disk. Press F12 to show the developer tools and check the mode in the menu bar.
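A common way to request standards mode is to start the page with a doctype and, for Internet Explorer, add an X-UA-Compatible meta tag. The sketch below just builds such a page as a string; `StandardsModePage` and `Wrap` are names invented for this example. Whether the WebBrowser control actually switches modes depends on the installed IE version and its emulation settings, so check the InnerHtml you get back rather than assuming it.

```csharp
using System;

// Hypothetical helper, not part of any library.
static class StandardsModePage
{
    // Wraps a body fragment in a page with a doctype and an X-UA-Compatible hint.
    public static string Wrap(string bodyHtml)
    {
        return "<!DOCTYPE html>"
             + "<html><head>"
             + "<meta http-equiv=\"X-UA-Compatible\" content=\"IE=edge\">"
             + "</head><body>" + bodyHtml + "</body></html>";
    }

    static void Main()
    {
        // In the question's code this string would be passed to htmlDoc.Write(...).
        Console.WriteLine(Wrap("<div id=\"TEST\"></div>"));
    }
}
```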


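C# has a handy feature called a verbatim string literal. Sorry, I'm not sure of a VB equivalent.

Add an @ before the opening quote of a string literal and backslashes are no longer treated as escape characters. A quotation mark inside the literal is then written as two quotation marks:

htmlDoc.Write(@"<HTML><BODY><DIV ID=""TEST""></DIV></BODY></HTML>");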
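One caveat worth checking: inside a verbatim literal a quotation mark is written doubled (""), and the resulting string is the same one you get from the \" escapes in the question. So the @ form only changes how the source code reads, not what Write receives. A quick way to convince yourself (the class name `VerbatimDemo` is just for this example):

```csharp
using System;

static class VerbatimDemo
{
    static void Main()
    {
        string regular  = "<DIV ID=\"TEST\"></DIV>";   // backslash-escaped quotes
        string verbatim = @"<DIV ID=""TEST""></DIV>";  // doubled quotes in a verbatim literal

        Console.WriteLine(regular == verbatim);  // prints: True
    }
}
```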

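Also, this isn't important, but your markup would not validate as XHTML, where tag and attribute names must be lower case, e.g. <HTML> should be <html>. Plain HTML treats tag and attribute names as case-insensitive, so it is a matter of style there.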
