Iteration through the HtmlDocument.All collection stops at the referenced stylesheet?
Since "bug in .NET" is often not the real cause of a problem, I wonder if I'm missing something here.
What I'm doing feels pretty simple. I'm iterating through the elements in an HtmlDocument called doc like this:
System.Diagnostics.Debug.WriteLine("*** " + doc.Url + " ***");
foreach (HtmlElement field in doc.All)
System.Diagnostics.Debug.WriteLine(string.Format("Tag = {0}, ID = {1} ", field.TagName, field.Id));
I then discovered the debug window output was this:
Tag = !, ID =
Tag = HTML, ID =
Tag = HEAD, ID =
Tag = TITLE, ID =
Tag = LINK, ID =
... when the actual HTML document looks like this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Protocol</title>
<link rel="Stylesheet" type="text/css" media="all" href="ProtocolStyle.css">
</head>
<body onselectstart="return false">
<table>
<!-- Misc. table elements and cell values -->
</table>
</body>
</html>
Commenting out the LINK tag solves the issue for me, and the document is then parsed completely. The ProtocolStyle.css file exists on disk and is loaded properly, if that matters. Is this a bug in .NET 3.5 SP1, or what? For such a web-oriented framework, I find it hard to believe there would be such a major bug in it.
Update: This iteration runs in the WebBrowser control's Navigated event handler.
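After a few years, I returned to this code and finally found the problem: I was walking through the HtmlDocument.All collection in the WebBrowser.Navigated event handler. Navigated fires as soon as the control has navigated to the new document and begun loading it, while DocumentCompleted fires only once the document has finished loading. The proper place to walk through the elements is therefore the WebBrowser.DocumentCompleted event handler.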
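A minimal sketch of that fix, assuming a Windows Forms host. The Program class, the FormatElement helper and the file path are hypothetical, not taken from the original code. The element walk is moved into a DocumentCompleted handler, and the ReadyState check skips the intermediate DocumentCompleted events that pages containing frames raise once per frame.

```csharp
using System;
using System.Diagnostics;
using System.Windows.Forms;

// Hypothetical host program: walks HtmlDocument.All only after
// WebBrowser.DocumentCompleted fires, instead of in WebBrowser.Navigated.
public static class Program
{
    // Hypothetical helper: formats one element like the original debug output.
    public static string FormatElement(string tagName, string id)
    {
        return string.Format("Tag = {0}, ID = {1}", tagName, id);
    }

    [STAThread]
    public static void Main()
    {
        var form = new Form();
        var browser = new WebBrowser { Dock = DockStyle.Fill };
        form.Controls.Add(browser);

        browser.DocumentCompleted += (sender, e) =>
        {
            // Pages with frames raise DocumentCompleted once per frame;
            // only walk the DOM when the whole page has finished loading.
            if (browser.ReadyState != WebBrowserReadyState.Complete)
                return;

            HtmlDocument doc = browser.Document;
            Debug.WriteLine("*** " + doc.Url + " ***");
            foreach (HtmlElement field in doc.All)
                Debug.WriteLine(FormatElement(field.TagName, field.Id));
        };

        // Hypothetical local path; point this at the page being inspected.
        browser.Navigate(new Uri("file:///C:/temp/Protocol.html"));
        Application.Run(form);
    }
}
```

With the handler in DocumentCompleted, the LINK element should no longer be the last one listed, because the walk starts only after loading has finished.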
This mistake also made embedded script code seem to "halt" parsing, exactly like the LINK tag mentioned above. In reality, nothing was halting; the control simply hadn't finished loading the entire document yet.