Nokogiri doc.xpath() problem
When looping through many web pages and calling something simple like the code below,
require 'nokogiri'
# manyhtmlpages: a collection of HTML documents to parse
manyhtmlpages.each do |page|
  doc = Nokogiri::HTML(page)
  puts doc.xpath("/html/body/h2[1]", "/html/body/a[1]").to_s
end
I observe that memory consumption keeps rising until the script terminates because it runs out of memory.
When I remove the doc.xpath call, the problem goes away.
I think the root of the problem is that the objects are not garbage collected until both page and doc leave scope (correct me if I'm wrong).
A similar problem is described here.
That report is about libxml-ruby, but as far as I know, Nokogiri is also built on libxml2.
I'm sorry, but I don't know the exact details of this problem; this is only meant to point you in the right direction.
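One way to test the scoping hypothesis from the question is to make sure nothing but plain values survives each iteration and to force a garbage collection pass every so often. The sketch below is only an illustration under assumptions: each_page_with_gc and its gc_every parameter are hypothetical names, not part of Nokogiri, and the pages here are stand-in strings. If the memory is leaking inside the native library rather than the Ruby heap, forcing GC may not help at all.

```ruby
# Hypothetical helper (not a Nokogiri API): runs the block once per page,
# keeps only the block's return value, and forces a GC pass every
# `gc_every` pages.
def each_page_with_gc(pages, gc_every: 100)
  results = []
  pages.each_with_index do |page, index|
    # Store only what the block returns (ideally a plain String), so no
    # parsed document stays referenced after this iteration.
    results << yield(page)
    GC.start if ((index + 1) % gc_every).zero?
  end
  results
end

# Stand-in pages and a stand-in block. In the real script the block would
# parse the page and return the xpath result converted with to_s.
pages = ["<h2>a</h2>", "<h2>bb</h2>", "<h2>ccc</h2>"]
lengths = each_page_with_gc(pages, gc_every: 2) { |page| page.length }
puts lengths.inspect
```

In the original loop, the block body would be the Nokogiri::HTML(page) and doc.xpath(...).to_s calls, returning the string instead of holding on to doc. If memory still grows with this in place, that suggests the growth is not simply a matter of Ruby objects waiting to go out of scope.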