
Iteration through the HtmlDocument.All collection stops at the referenced stylesheet?

Since "bug in .NET" is often not the real cause of a problem, I wonder if I'm missing something here.

What I'm doing feels pretty simple. I'm iterating through the elements in an HtmlDocument called doc like this:

System.Diagnostics.Debug.WriteLine("*** " + doc.Url + " ***");
foreach (HtmlElement field in doc.All)
    System.Diagnostics.Debug.WriteLine(string.Format("Tag = {0}, ID = {1} ", field.TagName, field.Id));

I then discovered the debug window output was this:

Tag = !, ID =  
Tag = HTML, ID =  
Tag = HEAD, ID =  
Tag = TITLE, ID =  
Tag = LINK, ID =  

... when the actual HTML document looks like this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
    <head>
        <title>Protocol</title>
        <link rel="Stylesheet" type="text/css" media="all" href="ProtocolStyle.css">
    </head>
    <body onselectstart="return false">
        <table>
            <!-- Misc. table elements and cell values -->
        </table>
    </body>
</html>

Commenting out the LINK tag solves the issue for me, and the document is then completely parsed. The ProtocolStyle.css file exists on disk and is loaded properly, in case that matters. Is this a bug in .NET 3.5 SP1? For such a web-oriented framework, I find it hard to believe there would be such a major bug in it.

Update: By the way, this iteration was done in the WebBrowser control's Navigated event.


After a few years, I returned to this code and finally discovered that the problem was that I walked through the HtmlDocument.All collection in the WebBrowser.Navigated event handler. The proper way to do this is to walk through the elements in WebBrowser.DocumentCompleted.

This mistake also caused embedded script code to seemingly "halt" parsing, exactly like the aforementioned LINK tags. In reality, it wasn't halting -- it just hadn't finished rendering the entire document yet.
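
For anyone hitting the same thing, here is a minimal sketch of the fix, assuming a Windows Forms app that hosts the WebBrowser control. The ProtocolForm class, the browser field, and the handler name are hypothetical names made up for illustration; only the loop itself comes from the question.

```csharp
using System.Diagnostics;
using System.Windows.Forms;

// Hypothetical form hosting the WebBrowser control (names are illustrative).
public class ProtocolForm : Form
{
    private readonly WebBrowser browser = new WebBrowser();

    public ProtocolForm()
    {
        browser.Dock = DockStyle.Fill;
        Controls.Add(browser);

        // Subscribe to DocumentCompleted instead of Navigated:
        // Navigated can fire while the document is still being loaded.
        browser.DocumentCompleted += Browser_DocumentCompleted;
    }

    private void Browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        HtmlDocument doc = browser.Document;
        if (doc == null)
            return; // nothing loaded yet

        // Same iteration as in the question, now run on the completed document.
        Debug.WriteLine("*** " + doc.Url + " ***");
        foreach (HtmlElement field in doc.All)
            Debug.WriteLine(string.Format("Tag = {0}, ID = {1} ", field.TagName, field.Id));
    }
}
```

With the loop moved into this handler, the elements after the LINK tag (BODY, TABLE and so on) should show up in the debug output as well.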
