Ruby/Mechanize "failed to allocate memory": how do I release the results of agent.get?
I have a memory leak in a Ruby script that uses Mechanize.
The script fetches several web pages in an endless while loop, and memory usage grows a lot on every iteration. After a few minutes this ends with "failed to allocate memory" and the script exits.
It seems that the agent.get method creates its result and keeps holding on to it, even if I assign the result to the same local variable or even to a global variable.
So I tried assigning nil to the variable after its last use and before reusing the same variable name. But the results of previous agent.get calls apparently stay in memory, and I don't know how to free that RAM so that my script uses a roughly stable amount of memory over several hours.
Here are two pieces of code (run one, hold down the Enter key, and watch the RAM allocated by Ruby grow):
#!/usr/bin/env ruby
require 'mechanize'
agent = Mechanize.new
agent.user_agent_alias = 'Windows Mozilla'
GC.enable
#puts GC.malloc_allocations
while gets.chomp != "stop"
  page = agent.get 'http://www.nypost.com/'
  puts "agent.object_id : " + agent.object_id.to_s
  puts "page.object_id : " + page.object_id.to_s
  page = nil
  puts "page.object_id : " + page.object_id.to_s
  page = agent.get 'http://www.nypost.com/'
  puts "page.object_id : " + page.object_id.to_s
  page = nil
  puts "page.object_id : " + page.object_id.to_s
  puts local_variables
  GC.start
  puts local_variables
  #puts GC.malloc_allocations
end
And with a global variable instead:
#!/usr/bin/env ruby
require 'mechanize'
agent = Mechanize.new
agent.user_agent_alias = 'Windows Mozilla'
while gets.chomp != "stop"
  $page = agent.get 'http://www.nypost.com/'
  puts "agent.object_id : " + agent.object_id.to_s
  puts "$page.object_id : " + $page.object_id.to_s
  $page = agent.get 'http://www.nypost.com/'
  puts "$page.object_id : " + $page.object_id.to_s
  #puts local_variables
  #puts global_variables
end
In other languages, reassigning the variable keeps the allocated memory stable. Why doesn't Ruby do the same? How can I force these instances to be garbage collected?
Edit: Here is another example that wraps the loop in a class, since Ruby is an object-oriented language, but the result is exactly the same: memory keeps growing.
#!/usr/bin/env ruby
require 'mechanize'
$agent = Mechanize.new
$agent.user_agent_alias = 'Windows Mozilla'
class GetContent
  def initialize url
    while true
      @page = $agent.get url
      remove_instance_variable(:@page)
    end
  end
end
myPage = GetContent.new('http://www.nypost.com/')
My answer (I don't have enough reputation to post it properly)
OK, so:
It seems that clearing the agent's history (Mechanize::History#clear, called as agent.history.clear) largely solves this memory leak.
Here is the last Ruby script, modified so you can compare the behaviour before and after:
#!/usr/bin/env ruby
require 'mechanize'
$agent = Mechanize.new
$agent.user_agent_alias = 'Windows Mozilla'
class GetContent
  def initialize url
    while true
      @page = $agent.get url
      $agent.history.clear
    end
  end
end
myPage = GetContent.new('http://www.nypost.com/')
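The reason clearing the history helps is that the agent itself keeps a reference to every page it has returned, so setting page to nil only drops one of two references and the garbage collector cannot reclaim the page. Below is a minimal sketch of that behaviour with no network access. FakeAgent and its get method are hypothetical stand-ins written for illustration, not Mechanize's actual implementation; only the idea of an internal history list comes from the discussion above.

```ruby
# Hypothetical stand-in for an agent that remembers every page it returns.
# This is NOT Mechanize's code, just an illustration of retained references.
class FakeAgent
  attr_reader :history

  def initialize
    @history = []
  end

  # Pretend to fetch a page: build a large string and remember it.
  def get(url)
    page = "<html>#{url}</html>" + ('x' * 100_000)
    @history << page
    page
  end
end

agent = FakeAgent.new

3.times do
  page = agent.get('http://example.com/')
  page = nil # drops the local reference only; the agent still holds the page
end

retained_before = agent.history.size # pages still reachable through the agent
agent.history.clear                  # now nothing references the old pages
retained_after = agent.history.size  # the GC is free to reclaim them

puts "retained before clear: #{retained_before}"
puts "retained after clear:  #{retained_after}"
```

In this sketch the three pages stay reachable until the history is emptied, which mirrors why assigning nil in the scripts above had no visible effect on memory.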
My suggestion is to set agent.max_history = 0, as mentioned in the list of linked issues.
This keeps history entries from being retained in the first place, instead of removing them afterwards with #clear.
Here is a modified version of the other answer:
#!/usr/bin/env ruby
require 'mechanize'
$agent = Mechanize.new
$agent.user_agent_alias = 'Windows Mozilla'
$agent.max_history = 0
class GetContent
  def initialize url
    while true
      @page = $agent.get url
    end
  end
end
myPage = GetContent.new('http://www.nypost.com/')
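To see why a history cap keeps memory flat, here is a small sketch of a capped history list. BoundedHistory is a hypothetical class made up for this illustration, assuming the history behaves like a list that evicts its oldest entries once a maximum size is exceeded; Mechanize's real implementation may differ in its details.

```ruby
# Hypothetical capped history, for illustration only (not Mechanize's class).
class BoundedHistory
  attr_accessor :max_size

  def initialize(max_size = nil)
    @max_size = max_size # nil means "unlimited"
    @entries = []
  end

  def push(page)
    @entries << page
    # Evict the oldest entries once the cap is exceeded.
    @entries.shift while @max_size && @entries.size > @max_size
    self
  end

  def size
    @entries.size
  end
end

unlimited = BoundedHistory.new
capped    = BoundedHistory.new(0)

1_000.times do |i|
  unlimited.push("page #{i}")
  capped.push("page #{i}")
end

puts "unlimited history holds #{unlimited.size} pages" # 1000
puts "capped history holds #{capped.size} pages"       # 0
```

With no cap, every page stays referenced for the life of the loop; with a cap of 0, each page is dropped from the list as soon as it is pushed, so only the caller's own variable keeps it alive.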