Ruby on Rails memory leak when looping through large number of records; find_each doesn't help
I have a Rails app that processes a large number (millions) of records in a MySQL database. Once it starts working, its memory use grows quickly, at about 50 MB per second. Using tools like oink, I was able to narrow the root cause down to one loop that goes through all the records in a big table in the database.
I understand that if I use something like Person.all.each, all the records will be loaded into memory. However, if I switch to find_each, I still see the same memory issue. To further isolate the problem, I created the following test controller, which does nothing but loop through the records. I expected find_each to keep only a small number of objects in memory at a time, but memory use grows linearly as it runs.
class TestController < ApplicationController
  def memory_test
    Person.find_each do |person|
      # intentionally empty: only iterate over the records
    end
  end
end
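To check whether records are actually being retained (rather than allocated and then freed), one option is to count live instances of the model after forcing a full collection. This is a sketch using Ruby's ObjectSpace; live_instance_count and FakeRecord are hypothetical names, and FakeRecord stands in for Person so the snippet runs without Rails.

```ruby
# Count how many instances of klass are still reachable after a full GC.
# If this keeps rising while a find_each loop runs, records are being
# retained somewhere (a cache, an identity map, a collection), not just
# allocated and freed.
def live_instance_count(klass)
  GC.start
  ObjectSpace.each_object(klass).count
end

# Hypothetical stand-in for an ActiveRecord model so this runs without Rails.
FakeRecord = Struct.new(:id)

retained = []
1_000.times { |i| retained << FakeRecord.new(i) }

p live_instance_count(FakeRecord) # at least 1000, because `retained` holds them
```

In the app itself one could, for example, log live_instance_count(Person) every few thousand records inside the loop; a count that keeps climbing suggests retention rather than simply a large batch.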
I suspect it has to do with ActiveRecord caching the query results, but I checked my environment settings and all the caching-related options are set to false in development (I am using the default settings generated by Rails). I searched online but couldn't find a solution.
I am using Rails 3.1.0 rc1 and Ruby 1.9.2.
Thanks!
I was able to figure this out myself. There are two places to change.
First, disable the IdentityMap in config/application.rb:
config.active_record.identity_map = false
Second, wrap the loop in ActiveRecord::Base.uncached:
class MemoryTestController < ApplicationController
  def go
    ActiveRecord::Base.uncached do
      Person.find_each do |person|
        # whatever operation
      end
    end
  end
end
Now my memory use is under control. Hope this helps other people.
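Why do these two changes matter even though find_each only loads one batch at a time? Below is a toy model, not Rails internals, assuming (as the fix above implies) that the query cache and the identity map, when enabled, keep their contents for the duration of the request. ToyRequestCaches and load_people are hypothetical names. Because every batch query has a different id offset, each one adds a new query-cache entry, and the identity map keeps every record ever loaded.

```ruby
# Toy model, NOT Rails code: a query cache keyed by SQL string and an
# identity map keyed by primary key, both living for the whole "request".
class ToyRequestCaches
  attr_reader :query_cache, :identity_map

  def initialize(query_cache_on, identity_map_on)
    @query_cache_on  = query_cache_on
    @identity_map_on = identity_map_on
    @query_cache  = {} # SQL string => rows returned by that query
    @identity_map = {} # primary key => instantiated record
  end

  # Keep what a request-scoped cache would keep after one batch query.
  def remember(sql, rows, records)
    @query_cache[sql] = rows if @query_cache_on
    records.each { |record| @identity_map[record[:id]] = record } if @identity_map_on
  end
end

# Walk `total` fake rows in id-ordered batches, the way find_each does.
def load_people(caches, total, batch_size)
  last_id = 0
  while last_id < total
    upper = [last_id + batch_size, total].min
    rows = ((last_id + 1)..upper).map { |id| { "id" => id } }
    sql = "SELECT people.* FROM people WHERE id > #{last_id} ORDER BY id LIMIT #{batch_size}"
    records = rows.map { |row| { id: row["id"] } } # stand-in for Person objects
    caches.remember(sql, rows, records)
    last_id = upper
  end
  caches
end

cached = load_people(ToyRequestCaches.new(true, true), 2_500, 1_000)
plain  = load_people(ToyRequestCaches.new(false, false), 2_500, 1_000)

p cached.query_cache.size  # => 3 (one entry per distinct batch query)
p cached.identity_map.size # => 2500 (every record loaded so far)
p plain.identity_map.size  # => 0
```

With both caches on, nothing loaded during the loop becomes garbage until the request ends, which matches the linear growth described in the question.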
find_each calls find_in_batches under the hood, with a batch size of 1000.
All the records in the batch will be created and retained in memory as long as the batch is being processed.
If your records are large, or if they consume a lot of memory via association proxy collections (e.g. a has_many association caches all of its records once you access it), you can also try a smaller batch size:
Person.find_each batch_size: 100 do |person|
  # whatever operation
end
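The batching described here can be sketched roughly as follows. This is a simplified illustration of id-based batching in the style of find_in_batches, not the Rails source: each_batch and fetch are hypothetical names, fetch stands in for a query like SELECT * FROM people WHERE id > ? ORDER BY id LIMIT ?, and positive integer primary keys are assumed. It shows that peak resident records are bounded by batch_size, which is why a smaller batch size helps.

```ruby
# Fetch up to batch_size rows with id > last seen id, yield them,
# then resume after the last id of that batch.
def each_batch(batch_size, fetch)
  last_id = 0
  loop do
    batch = fetch.call(last_id, batch_size)
    break if batch.empty?
    yield batch                      # only this batch needs to stay in memory
    break if batch.size < batch_size # a short batch means no rows remain
    last_id = batch.last[:id]        # resume after the last id we saw
  end
end

# In-memory stand-in for the people table, already ordered by id.
TABLE = (1..2_500).map { |id| { id: id } }
fetch = lambda do |after_id, limit|
  TABLE.select { |row| row[:id] > after_id }.first(limit)
end

sizes = []
each_batch(1_000, fetch) { |batch| sizes << batch.size }
p sizes # => [1000, 1000, 500]

small = []
each_batch(100, fetch) { |batch| small << batch.size }
p small.length # => 25
p small.max    # => 100
```

The bound only holds if nothing else (an identity map, a query cache, an accumulating array) keeps references to earlier batches.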
You can also try calling GC.start manually at regular intervals (e.g. every 300 records).
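A minimal sketch of the periodic GC.start suggestion, using a hypothetical helper class PeriodicGC (not part of Rails or Ruby). The GC hook is injectable so the counting logic can be checked without measuring memory.

```ruby
# Calls a GC hook once every `every` ticks; tick once per processed record.
class PeriodicGC
  attr_reader :count, :gc_runs

  def initialize(every, gc = lambda { GC.start })
    @every   = every
    @gc      = gc
    @count   = 0
    @gc_runs = 0
  end

  def tick
    @count += 1
    if (@count % @every).zero?
      @gc.call
      @gc_runs += 1
    end
  end
end

gc = PeriodicGC.new(300)
# Stand-in for: Person.find_each { |person| ...; gc.tick }
1_000.times { gc.tick }
p gc.gc_runs # => 3
```

Note that GC.start can only reclaim records that nothing references any more; while the IdentityMap or the query cache still holds them, forcing a collection will not free them.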
As nice as ActiveRecord is, it is not the best tool for all problems. I recommend dropping down to your native database adapter and doing the work at that level.