How do you efficiently (in a DB independent manner) select random records from a table?
This seems like an incredibly simple problem; however, it isn't working out as trivially as I'd expected.
I have a club which has club members and I'd like to pull out two members at random from a club.
Using RANDOM()
One way is to use random ordering:
club.members.order('RANDOM()').limit(2)
However, that is not portable: RANDOM() works on SQLite (the dev database) and Postgres (production), but in MySQL the function is RAND().
While I could start writing some wrappers around this, the fact that it hasn't been done already and doesn't seem to be part of ActiveRecord tells me something: RANDOM may not be the right way to go.
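For illustration only, a wrapper of the kind mentioned here could be a small lookup from adapter name to SQL function. The helper name random_sql_function is hypothetical, and the adapter strings are assumptions about what ActiveRecord's connection.adapter_name typically reports:

```ruby
# Hypothetical helper: map a database adapter name to that database's
# random-number SQL function. The adapter strings matched here are
# assumptions, not an exhaustive list.
def random_sql_function(adapter_name)
  case adapter_name.to_s.downcase
  when /mysql/ then 'RAND()'
  when /sqlite/, /postgres/ then 'RANDOM()'
  else
    raise ArgumentError, "no known random function for #{adapter_name}"
  end
end

# In a Rails app this might be used as follows (not run here):
#   fn = random_sql_function(ActiveRecord::Base.connection.adapter_name)
#   club.members.order(fn).limit(2)
```

This only hides the naming difference; the query still sorts every row by a random value.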
Pulling items out directly using their index
Another way of doing this is to pull the set in order but then select random records from it:
First off we need to generate a sequence of two unique indices corresponding to the members:
all_indices = 1..club.members.count
two_rand_indices = all_indices.to_a.shuffle.slice(0,2)
This gives an array with two indices guaranteed to be unique and random. We can use these indices to pull out our records:
@user1, @user2 = club.members.values_at(*two_rand_indices)
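As a side note, once the records are loaded into an array, Ruby's Array#sample can pick distinct random elements directly, without a separate index array. A minimal sketch, using a plain array of names as a stand-in for the loaded members:

```ruby
# Stand-in for the loaded member list; in the question this would be the
# array produced by club.members.
members = %w[alice bob carol dave erin]

# Array#sample(n) returns n elements chosen at random from distinct
# positions, so the two picks never refer to the same slot.
user1, user2 = members.sample(2)
```

Like the index-based version, this still loads the whole set into memory first.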
What's the best method?
While the second method seems pretty nice, I also feel like I might be missing something and might have overcomplicated a simple problem. I'm clearly not the first person to have tackled this, so what is the best, most SQL-efficient route through it?
The problem with your first method is that it sorts the whole table by an unindexable expression, just to take two rows. This does not scale well.
The problem with your second method is similar: if you have 10^9 rows in your table, then to_a will generate a huge array, which takes a lot of memory and time to shuffle.
Also, by using values_at, aren't you assuming that there's a row for every primary key value from 1 to count, with no gaps? You shouldn't assume that.
What I'd recommend instead is:
Count the rows in the table.
c = club.members.count
Pick two random offsets between 0 and count - 1 (OFFSET is zero-based, so an offset equal to the count would return no row).
r_a = 2.times.map { Random.rand(c) }
Query your table with limit and offset. Don't use ORDER BY, just rely on the RDBMS's arbitrary ordering.
for r in r_a
  row = club.members.limit(1).offset(r).first
end
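The steps above draw the two offsets independently, so they can occasionally coincide and return the same member twice. One possible way around that, sketched here with a hypothetical helper distinct_random_offsets, is to redraw on collisions rather than build an array of every index:

```ruby
require 'set'

# Hypothetical helper: pick k distinct offsets in 0...count by redrawing
# on collisions, so no array of size count is ever built.
def distinct_random_offsets(count, k)
  raise ArgumentError, 'k must not exceed count' if k > count

  offsets = Set.new
  offsets << rand(count) while offsets.size < k
  offsets.to_a
end

offsets = distinct_random_offsets(1_000_000_000, 2)

# With ActiveRecord, each offset would then fetch one row (not run here):
#   rows = offsets.map { |o| club.members.offset(o).first }
```

Redrawing is cheap when k is small relative to the count, which is the case for two members out of a large club.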
See also:
- How can i optimize MySQL's ORDER BY RAND() function?
- Quick selection of a random row from a large table in MySQL
In MySQL, you can use ORDER BY RAND():
ORDER BY RAND() LIMIT 4
This will select four random rows when the above is the final clause in the query.
Try the randumb gem; it implements the second method you mentioned.