Why is this code for iterating over the DOM stupid slow?

This is nested about 10 functions deep, so I'll just paste the relevant bits:

This line is really slow:

var nodes = Filter_Chunk(Traverse(), chunks.First());

Specifically, this chunk inside Filter_Chunk (pun not intended):

private static IEnumerable<HtmlNode> Filter_Chunk(IEnumerable<HtmlNode> nodes, string selectorChunk)
{
    // ...
    string tagName = selectorChunk;
    foreach (var node in nodes)
        if (node.Name == tagName)
            yield return node;
}

There's nothing too complicated in there... so I'm thinking it must be the sheer number of nodes coming out of Traverse(), right?

public IEnumerable<HtmlNode> Traverse()
{
    foreach (var node in _context)
    {
        yield return node;
        foreach (var child in Children().Traverse())
            yield return child;
    }
}

public SharpQuery Children()
{
    return new SharpQuery(_context.SelectMany(n => n.ChildNodes).Where(n => n.NodeType == HtmlNodeType.Element), this);
}

I tried finding <h3> nodes on stackoverflow.com. There shouldn't be more than a couple thousand nodes, should there? Why is this taking many minutes to complete?


Actually, there's definitely a bug in here somewhere that is causing it to return more nodes than it should... I forked the question to address the issue.


public IEnumerable<HtmlNode> Traverse()
{
    foreach (var node in _context)
    {
        yield return node;
        foreach (var child in Children().Traverse())
            yield return child;
    }
}

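This code looks strange to me. Children() does not depend on the loop variable node: it already returns the children of every node in _context. So it makes no sense to traverse those children once for each node in _context. With n nodes in the context, the whole subtree below them is walked n times, and the same thing happens again at every level of the recursion, which would explain both the extra nodes and the running time.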
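To make the point about Traverse() concrete, here is a minimal sketch that compares the original loop shape with one possible fix. Node, Query, Demo and Build are hypothetical stand-ins I made up for HtmlNode and SharpQuery so the snippet runs on its own; the NodeType filter from Children() is left out, and I am assuming the intent is a plain depth-first walk where each node is followed by its own descendants.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for HtmlNode: a name plus child elements.
class Node
{
    public string Name = "div";
    public List<Node> ChildNodes = new List<Node>();
}

// Hypothetical stand-in for SharpQuery, reduced to the traversal logic.
class Query
{
    private readonly IEnumerable<Node> _context;
    public Query(IEnumerable<Node> context) { _context = context; }

    // Same idea as the question's Children(): children of ALL context nodes.
    public Query Children()
    {
        return new Query(_context.SelectMany(n => n.ChildNodes));
    }

    // Original shape: the inner loop never uses `node`, so the children of
    // the whole context are traversed once per context node.
    public IEnumerable<Node> TraverseOriginal()
    {
        foreach (var node in _context)
        {
            yield return node;
            foreach (var child in Children().TraverseOriginal())
                yield return child;
        }
    }

    // One possible fix: recurse only into the current node's own children.
    public IEnumerable<Node> TraverseFixed()
    {
        foreach (var node in _context)
        {
            yield return node;
            foreach (var child in new Query(node.ChildNodes).TraverseFixed())
                yield return child;
        }
    }
}

static class Demo
{
    // Builds a full tree: `width` children per node, `depth` levels below the root.
    public static Node Build(int width, int depth)
    {
        var node = new Node();
        if (depth > 0)
            for (int i = 0; i < width; i++)
                node.ChildNodes.Add(Build(width, depth - 1));
        return node;
    }

    static void Main()
    {
        var query = new Query(new[] { Build(3, 4) });
        // The tree has 1 + 3 + 9 + 27 + 81 = 121 nodes.
        Console.WriteLine("fixed:    " + query.TraverseFixed().Count());
        Console.WriteLine("original: " + query.TraverseOriginal().Count());
    }
}
```

On this small tree the fixed version visits each node once, while the original shape yields far more nodes than the tree contains, and the gap grows with both the width and the depth of the document. A real page like stackoverflow.com is wider and deeper than this toy tree, so a multi-minute run would not be surprising.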
