开发者

Ruby/Mechanize "failed to allocate memory". Erasing instantiation of 'agent.get' method?

I've got a little problem about leaking memory in a Mechanize Ruby script.

I "while loop" multiple web pages access forever and memory increase a lot on each loop. That created a "failed to allocate memory" after minutes and made script exit.

In fact, it seems that the agent.get method instantiate and hold the result even if I assign the result to the same "local variable" or even a "global variable". So I tried to assign nil to the variable after last used and before reusing the same name variable. But it seems that previous agent.get results are still in memory and really don't know how to drain RAM to make my script using a roughly stable amount of memory after hours?

Here are two peace of code : (stay on "enter" key and see the Ruby allocated RAM growing)

#!/usr/bin/env ruby

require 'mechanize'

agent = Mechanize.new
agent.user_agent_alias = 'Windows Mozilla'
GC.enable
#puts GC.malloc_allocations
while gets.chomp!="stop"
    page = agent.get 'http://www.nypost.com/'
    puts "agent.object_id  : "+agent.object_id.to_s
    puts "page.object_id  : "+page.object_id.to_s
    page=nil
    puts "page.object_id  : "+page.object_id.to_s
    page = agent.get 'http://www.nypost.com/'
    puts "page.object_id  : "+page.object_id.to_s
    page=nil
    puts "page.object_id  : "+page.object_id.to_s
    puts local_variables
    GC.start
    puts local_variables
    #puts GC.malloc_allocations
end

And with global variable instead :

#!/usr/bin/env ruby

require 'mechanize'

agent = Mechanize.new
agent.user_agent_alias = 'Windows Mozilla'
while gets.chomp!="stop"
    $page = agent.get 'http://www.nypost.com/'
    puts "agent.object_id  : "+agent.object_id.to_s
    puts "$page.object_id  : "+$page.object_id.to_s
    $page = agent.get 'http://www.nypost.com/'
    puts "$page.object_id  : "+$page.object_id.to_s
    #puts local_variables
    #puts global_variables
end

In other languages the variable is re-affected and allocated memory stay stable. why ruby doesn't? How can I force instances to garbage?

Edit : Here is an other example using Object as Ruby is an Object Oriented language but result is exactly the same : memory grow again and again...

#!/usr/bin/env ruby

require 'mechanize'

$agent = Mechanize.new
$agent.user_agent_alias = 'Windows Mozilla'
class GetContent
    def initialize url
        while true
            @page = $agent.get url
            remove_instance_variable(:@page)
        end
    end
end
myPage = GetContent.new('http://www.nypost.com/')

My Answer (not enough reputation to do it properly)

Ok so !

It seems that Mechanize::History.clear greatly solves this problem of memory leak.

here is the last Ruby code modified if you want to test before and after...

#!/usr/bin/env ruby

require 'mechanize'

$agent = Mechanize.new
$agent.user_agent_alias = 'Windows Mozilla'
class GetContent
    def initialize url
        while true
            @page = $agent.get url
            $agent.history.clear
        end
    e开发者_运维知识库nd
end
myPage = GetContent.new('http://www.nypost.com/')


My suggestion is setting agent.max_history = 0. As mentioned in the list of linked issues.

This will keep a history entry from even being added, instead of using #clear.

Here is the modified version of the other answer

#!/usr/bin/env ruby

require 'mechanize'

$agent = Mechanize.new
$agent.user_agent_alias = 'Windows Mozilla'
$agent.max_history = 0
class GetContent
    def initialize url
        while true
            @page = $agent.get url
        end
    end
end
myPage = GetContent.new('http://www.nypost.com/')


Ok so ! (had enough reputation to answer my owns questions properly)

It seems that Mechanize::History.clear greatly solves this problem of memory leak.

here is the last Ruby code modified if you want to test before and after...

#!/usr/bin/env ruby

require 'mechanize'

$agent = Mechanize.new
$agent.user_agent_alias = 'Windows Mozilla'
class GetContent
    def initialize url
        while true
            @page = $agent.get url
            $agent.history.clear
        end
    end
end
myPage = GetContent.new('http://www.nypost.com/')
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜