
How do you efficiently (in a DB independent manner) select random records from a table?

This seems like an incredibly simple problem; however, it isn't working out as trivially as I'd expected.

I have a club which has club members and I'd like to pull out two members at random from a club.

Using RANDOM()

One way is to use random ordering:

club.members.find(:all, :order => 'RANDOM()', :limit => 2)

However, that is not portable: SQLite (the dev database) and Postgres (production) both use RANDOM(), whereas in MySQL the function is RAND().

While I could start writing some wrappers around this, the fact that it hasn't been done already and doesn't seem to be part of ActiveRecord tells me something: RANDOM may not be the right way to go.
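For illustration, such a wrapper might look like the sketch below. The helper name random_sql_function is hypothetical, and the adapter names are assumed to be the strings reported by ActiveRecord's connection.adapter_name (for example "SQLite", "PostgreSQL", "Mysql2"). It only solves the portability issue, not the cost of sorting the whole table.

```ruby
# Hypothetical helper (not part of ActiveRecord): maps a database adapter
# name to the SQL function that database uses for random ordering.
def random_sql_function(adapter_name)
  case adapter_name.to_s.downcase
  when /mysql/              then 'RAND()'
  when /sqlite/, /postgres/ then 'RANDOM()'
  else
    raise ArgumentError, "no known random function for #{adapter_name.inspect}"
  end
end

# Assumed usage inside a Rails app (not executed here):
#   fn = random_sql_function(ActiveRecord::Base.connection.adapter_name)
#   club.members.order(Arel.sql(fn)).limit(2)
```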

Pulling items out directly using their index

Another way of doing this is to pull the set in order but then select random records from it:

First off we need to generate a sequence of two unique indices corresponding to the members:

all_indices = 0...club.members.count  # values_at uses 0-based positions
two_rand_indices = all_indices.to_a.shuffle.slice(0, 2)

This gives an array with two indices guaranteed to be unique and random. We can use these indices to pull out our records:

@user1, @user2 = club.members.values_at(*two_rand_indices)

What's the best method?

While the second method seems pretty nice, I also feel like I might be missing something and might have overcomplicated a simple problem. I'm clearly not the first person to have tackled this, so what is the best, most SQL-efficient route through it?


The problem with your first method is that it sorts the whole table by an unindexable expression, just to take two rows. This does not scale well.

The problem with your second method is similar: if you have 10^9 rows in your table, then you will generate a very large array from to_a. Building and shuffling it will take a lot of memory and time.

Also by using values_at aren't you assuming that there's a row for every primary key value from 1 to count, with no gaps? You shouldn't assume that.

What I'd recommend instead is:

  1. Count the rows in the table.

    c = club.members.count

  2. Pick two random offsets between 0 and count - 1.

    r_a = 2.times.map { Random.rand(c) }  # offsets are 0-based; the two values may coincide

  3. Query your table with limit and offset.
    Don't use ORDER BY; just rely on the RDBMS's arbitrary ordering.

    rows = r_a.map { |r| club.members.limit(1).offset(r).first }
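The steps above can be sketched in plain Ruby, with one addition: the two offsets are forced to be distinct, so the same member is never returned twice. This is only a sketch under assumptions. The helper distinct_random_offsets is a hypothetical name, and an in-memory array stands in for the table so the example runs without a database.

```ruby
require 'set'

# Pick `n` distinct offsets in 0...count without building the whole range
# in memory. Cheap when n is much smaller than count.
def distinct_random_offsets(count, n, rng: Random.new)
  raise ArgumentError, 'not enough rows' if n > count
  picked = Set.new
  picked << rng.rand(count) while picked.size < n
  picked.to_a
end

# Stand-in for the table. In a Rails app each lookup would instead be
# something like: club.members.limit(1).offset(r).first
rows = %w[alice bob carol dave erin]

offsets = distinct_random_offsets(rows.size, 2)
user1, user2 = offsets.map { |r| rows[r] }
```

Each offset costs one query, so this stays cheap for two rows but is not meant for pulling large random samples.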
    

See also:

  • How can i optimize MySQL's ORDER BY RAND() function?
  • Quick selection of a random row from a large table in MySQL


In MySQL you can order by the RAND() function:

ORDER BY RAND() LIMIT 4

When this is the final clause of the query, it selects 4 random rows.


Try the randumb gem; it implements the second method you mentioned.
