Why is this code for iterating over the DOM stupid slow?
This is nested about 10 functions deep, so I'll just paste the relevant bits:
This line is really slow:
var nodes = Filter_Chunk(Traverse(), chunks.First());
Specifically, this chunk inside Filter_Chunk (pun not intended):
private static IEnumerable<HtmlNode> Filter_Chunk(IEnumerable<HtmlNode> nodes, string selectorChunk)
{
    // ...
    string tagName = selectorChunk;
    foreach (var node in nodes)
        if (node.Name == tagName)
            yield return node;
}
There's nothing too complicated in there... so I'm thinking it must be the sheer number of nodes in Traverse(), right?
public IEnumerable<HtmlNode> Traverse()
{
    foreach (var node in _context)
    {
        yield return node;
        foreach (var child in Children().Traverse())
            yield return child;
    }
}
public SharpQuery Children()
{
    return new SharpQuery(_context.SelectMany(n => n.ChildNodes).Where(n => n.NodeType == HtmlNodeType.Element), this);
}
I tried finding <h3> nodes on stackoverflow.com. There shouldn't be more than a couple thousand nodes, should there? Why is this taking many minutes to complete?
Actually, there's definitely a bug in here somewhere that is causing it to return more nodes than it should... I forked the question to address the issue.
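To check the suspicion that more nodes come back than actually exist, here is a minimal sketch that mirrors the Traverse()/Children() structure above on a synthetic tree. Node, Query and Demo are hypothetical stand-ins, not HtmlAgilityPack or SharpQuery types, and the NodeType filter is left out because the stand-in tree only holds elements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for HtmlNode: a name plus child elements.
class Node
{
    public string Name;
    public List<Node> ChildNodes = new List<Node>();
    public Node(string name) { Name = name; }
}

// Hypothetical stand-in for SharpQuery, keeping the same Traverse/Children shape.
class Query
{
    private readonly List<Node> _context;
    public Query(IEnumerable<Node> context) { _context = context.ToList(); }

    // Children of ALL nodes in the context, as in the question.
    public Query Children()
    {
        return new Query(_context.SelectMany(n => n.ChildNodes));
    }

    // Same structure as the Traverse() in the question.
    public IEnumerable<Node> Traverse()
    {
        foreach (var node in _context)
        {
            yield return node;
            foreach (var child in Children().Traverse())
                yield return child;
        }
    }
}

static class Demo
{
    // Builds a tree where every node has `width` children, `depth` levels below the root.
    public static Node BuildTree(int width, int depth)
    {
        var node = new Node("n");
        if (depth > 0)
            for (int i = 0; i < width; i++)
                node.ChildNodes.Add(BuildTree(width, depth - 1));
        return node;
    }

    // Number of nodes that really exist in the tree.
    public static int CountDistinct(Node root)
    {
        return 1 + root.ChildNodes.Sum(c => CountDistinct(c));
    }

    public static void Main()
    {
        var root = BuildTree(3, 4);
        int actual = CountDistinct(root);
        int yielded = new Query(new[] { root }).Traverse().Count();
        Console.WriteLine($"{actual} nodes in the tree, {yielded} nodes yielded");
    }
}
```

On this tree of 121 nodes (3 children per node, 4 levels deep) the traversal yields 59,809 nodes, because each level repeats the whole traversal of the next level once per node in the context. A real page is much wider than that, which would be consistent with a run time of minutes.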
public IEnumerable<HtmlNode> Traverse()
{
    foreach (var node in _context)
    {
        yield return node;
        foreach (var child in Children().Traverse())
            yield return child;
    }
}
This code looks strange to me. Children() does not depend on the loop variable node; it is computed from the whole _context, so it makes no sense to run over the children one time for each node in _context.
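One way to avoid the repeated work is to recurse into each node's own children only. This is a sketch under assumptions, not the SharpQuery implementation: Node and Traversal are hypothetical stand-ins for HtmlNode and the query class, and the stand-in tree holds only elements, so the NodeType == HtmlNodeType.Element filter from Children() is omitted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for HtmlNode: a name plus child elements.
class Node
{
    public string Name;
    public List<Node> ChildNodes = new List<Node>();
    public Node(string name, params Node[] children)
    {
        Name = name;
        ChildNodes.AddRange(children);
    }
}

static class Traversal
{
    // Pre-order walk: each node is yielded exactly once,
    // followed by its own descendants and nobody else's.
    public static IEnumerable<Node> Traverse(IEnumerable<Node> context)
    {
        foreach (var node in context)
        {
            yield return node;
            foreach (var descendant in Traverse(node.ChildNodes))
                yield return descendant;
        }
    }

    public static void Main()
    {
        // html > body > (div > h3), h3
        var root = new Node("html",
            new Node("body",
                new Node("div", new Node("h3")),
                new Node("h3")));

        var all = Traverse(new[] { root }).ToList();
        Console.WriteLine(string.Join(",", all.Select(n => n.Name)));
        Console.WriteLine(all.Count(n => n.Name == "h3"));
    }
}
```

With this shape the number of yielded nodes equals the number of nodes in the tree, so a tag-name filter like the one in Filter_Chunk only has to look at each node once.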