
Nokogiri doc.xpath() problem

When looping through many web pages and calling something as simple as the code below,

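    require "nokogiri"

    # manyhtmlpages holds the HTML of each page (strings or IO objects)
    manyhtmlpages.each do |page|
      doc = Nokogiri::HTML(page)
      # Print the first <h2> and the first <a> directly under <body>
      puts doc.xpath("/html/body/h2[1]", "/html/body/a[1]").to_s
    end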

I observe that memory consumption keeps rising until the script is terminated for running out of memory.

When I remove the doc.xpath call, the problem above does not occur.


I think the root of the problem is that the parsed documents are not garbage collected until both page and doc leave scope (correct me if I'm wrong).

A similar problem is described here.
That is a problem with libxml-ruby, but as far as I know, Nokogiri is also built on libxml.

I'm sorry, but I don't know the exact details of this problem. It's just meant to point you in the right direction.
