Using Ruby, how can I pass a collection to a method, but only a subset of it at a time?

I have a collection of users:

users = User.all()

I want to pass a subset of the user collection to a method. Each subset should contain 1000 items (or less on the last iteration).

some_method(users)

So if users has 9500 items in it, I want to call some_method 10 times: 9 times passing 1000 items and the last time passing 500.


You can use the Enumerable#each_slice method:

User.all.each_slice(1000) do |subarray|
  some_method subarray
end

but that would first pull all the records from the database.
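As a quick check of the arithmetic in the question, the plain-Ruby sketch below runs each_slice over an in-memory array. The 9500 integers are a hypothetical stand-in for loaded User records, and collecting the slice sizes stands in for calling some_method.

```ruby
# Hypothetical stand-in for 9500 loaded records (no database involved)
users = (1..9500).to_a

# each_slice(1000) yields consecutive arrays of at most 1000 elements;
# we record each slice's size instead of calling some_method
sizes = []
users.each_slice(1000) { |subarray| sizes << subarray.size }

puts sizes.length   # number of times some_method would be called
puts sizes.inspect  # nine slices of 1000, then one of 500
```

This is fine once the records are already in memory; the answers below are about avoiding loading them all in the first place.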

However, I guess you could write something like this:

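def ar_each_slice(scope, size)
  # One COUNT query tells us how many slices are needed: ceil(count / size)
  (scope.count.to_f / size).ceil.times do |i|
    # Each slice is a lazy scope (LIMIT size OFFSET i * size); nothing is loaded
    # until the block calls .all. Rails 2.x API; give the scope an :order so the
    # pages are stable across queries, otherwise rows may be skipped or repeated.
    yield scope.scoped(:offset => i * size, :limit => size)
  end
end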

and use it like this:

ar_each_slice(User.scoped, 1000) do |slice|
  some_method slice.all
end

It first gets the number of records with a COUNT query, then fetches them 1000 at a time using LIMIT and OFFSET, passing each batch to your block.
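The slice arithmetic that ar_each_slice relies on can be checked without a database. In this sketch (an assumption for illustration only) an in-memory array stands in for the scope, and Array#[start, length] plays the role of OFFSET/LIMIT:

```ruby
# Hypothetical stand-in for a table: an in-memory array instead of a scope
records = (1..9500).to_a
size = 1000

# The COUNT up front determines the number of slices: ceil(9500 / 1000) = 10
slice_count = (records.size.to_f / size).ceil

# Slice i corresponds to LIMIT size OFFSET i * size
slices = slice_count.times.map do |i|
  offset = i * size
  records[offset, size] # Array#[start, length] mimics OFFSET/LIMIT
end

puts slice_count
puts slices.map(&:size).inspect
```

With integer arithmetic, (count + size - 1) / size gives the same ceiling for positive counts and avoids the Float conversion.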


Since Rails 2.3, you can use find_in_batches with a :batch_size option:

User.find_in_batches(:batch_size =>1000) do |users|
    some_method(users)
end

In this case, the framework runs a SELECT query for each batch of 1000 records, which keeps memory usage low when you are processing a large number of records.
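As far as I can tell from the Rails 2.3 source, find_in_batches does not use OFFSET: it orders by the primary key and requests each following batch with a condition like id > last_id. The plain-Ruby sketch below simulates that keyset approach; ROWS, fetch_batch and each_batch are hypothetical names, with an array of hashes standing in for the table and positive ids assumed.

```ruby
# Hypothetical table: rows sorted by primary key (gaps in the ids are fine)
ROWS = (1..2500).map { |n| { :id => n * 2 } }

# Plays the role of: SELECT ... WHERE id > last_id ORDER BY id LIMIT batch_size
def fetch_batch(last_id, batch_size)
  ROWS.select { |row| row[:id] > last_id }.first(batch_size)
end

# Keyset batching: remember the last id seen instead of computing an offset
def each_batch(batch_size)
  last_id = 0 # assumes all ids are positive
  loop do
    batch = fetch_batch(last_id, batch_size)
    break if batch.empty?
    yield batch
    break if batch.size < batch_size # a short batch means we reached the end
    last_id = batch.last[:id]
  end
end

batch_sizes = []
each_batch(1000) { |batch| batch_sizes << batch.size }
puts batch_sizes.inspect # => [1000, 1000, 500]
```

Because each query starts from the last id seen, later batches typically stay as cheap as the first (given an index on the key), whereas a large OFFSET makes the database step over all the earlier rows.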


I think you should divide it into subsets manually. For example:

some_method(users[0..999])
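To cover the whole collection rather than just the first subset, the manual approach can be turned into a loop over start indexes. A minimal sketch, assuming the users are already loaded into an array (hypothetical data below) and using a hypothetical some_method lambda that only records the size of each subset:

```ruby
# Hypothetical stand-in for the loaded users
users = (1..9500).to_a

# Hypothetical some_method: records the size of each subset it receives
received = []
some_method = lambda { |subset| received << subset.size }

# Walk the start index in steps of 1000: 0, 1000, ..., 9000
0.step(users.size - 1, 1000) do |start|
  # same as users[start..start + 999], clipped at the end of the array
  some_method.call(users[start, 1000])
end

puts received.inspect
```

This still requires all records in memory, so it has the same drawback as the each_slice answer.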


I forgot about :batch_size, but Chandra suggested it. That's the right way to go.


Using .all asks the database to retrieve all the records and pass them to Ruby, which holds them in memory and then iterates over them. That is a really bad way to handle it if your database will keep growing: the larger the result set, the harder the DBMS has to work to return it, and the more memory Ruby has to allocate to hold it. Your response time will grow as a result.

A better solution is to use the :limit and :offset options to have the DBMS fetch the first 1000 records at offset 0, then the next 1000 records at offset 1000, and so on. Keep looping until there are no more records.

You can determine how many times you'll have to loop by doing a .count before you start, which is usually fast unless your WHERE clause is expensive, or simply loop until you get no records back.
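The "loop until you get no records back" variant can be sketched in plain Ruby. RECORDS and fetch_page are hypothetical stand-ins for the table and for a :limit/:offset query; note that the offset advances by the page size each time:

```ruby
# Hypothetical stand-in for the table
RECORDS = (1..9500).to_a

# Hypothetical helper playing the role of a :limit/:offset query;
# Array#[] returns nil past the end, so fall back to an empty page
def fetch_page(offset, limit)
  RECORDS[offset, limit] || []
end

limit = 1000
offset = 0
batch_sizes = []
loop do
  batch = fetch_page(offset, limit)
  break if batch.empty? # no records back: we are done
  batch_sizes << batch.size
  offset += limit # the next page starts 1000 records later, not 1
end

puts batch_sizes.inspect
```

This costs one extra, empty query at the end compared with doing a .count first; breaking as soon as a page comes back with fewer than limit records avoids it.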
