Random record in ActiveRecord
I'm in need of getting a random record from a table via ActiveRecord. I've followed the example from Jamis Buck from 2006.
However, I've also come across another way via a Google search (can't attribute with a link due to new user restrictions):
rand_id = rand(Model.count)
rand_record = Model.first(:conditions开发者_运维问答 => ["id >= ?", rand_id])
I'm curious how others on here have done it or if anyone knows what way would be more efficient.
Rails 6
As stated by Jason in the comments, in Rails 6, non-attribute arguments are not allowed. You must wrap the value in an Arel.sql()
statement.
Model.order(Arel.sql('RANDOM()')).first
Rails 5, 4
In Rails 4 and 5, using Postgresql or SQLite, using RANDOM()
:
Model.order('RANDOM()').first
Presumably the same would work for MySQL with RAND()
Model.order('RAND()').first
This is about 2.5 times faster than the approach in the accepted answer.
Caveat: This is slow for large datasets with millions of records, so you might want to add a limit
clause.
I haven't found an ideal way to do this without at least two queries.
The following uses a randomly generated number (up to the current record count) as an offset.
offset = rand(Model.count)
# Rails 4
rand_record = Model.offset(offset).first
# Rails 3
rand_record = Model.first(:offset => offset)
To be honest, I've just been using ORDER BY RAND() or RANDOM() (depending on the database). It's not a performance issue if you don't have a performance issue.
Your example code will start to behave inaccurately once records are deleted (it will unfairly favor items with lower ids)
You're probably better off using the random methods within your database. These vary depending on which DB you're using, but :order => "RAND()" works for mysql and :order => "RANDOM()" works for postgres
Model.first(:order => "RANDOM()") # postgres example
Benchmarking these two methods on MySQL 5.1.49, Ruby 1.9.2p180 on a products table with +5million records:
def random1
rand_id = rand(Product.count)
rand_record = Product.first(:conditions => [ "id >= ?", rand_id])
end
def random2
if (c = Product.count) != 0
Product.find(:first, :offset =>rand(c))
end
end
n = 10
Benchmark.bm(7) do |x|
x.report("next id:") { n.times {|i| random1 } }
x.report("offset:") { n.times {|i| random2 } }
end
user system total real
next id: 0.040000 0.000000 0.040000 ( 0.225149)
offset : 0.020000 0.000000 0.020000 ( 35.234383)
Offset in MySQL appears to be much slower.
EDIT I also tried
Product.first(:order => "RAND()")
But I had to kill it after ~60 seconds. MySQL was "Copying to tmp table on disk". That's not going to work.
It is not advised that you use this solution, but if for some reason you really want to randomly select a record while only making one database query, you could use the sample
method from the Ruby Array class, which allows you to select a random item from an array.
Model.all.sample
This method requires only one database query, but it's significantly slower than alternatives like Model.offset(rand(Model.count)).first
which require two database queries, though the latter is still preferred.
It doesn't have to be that hard.
ids = Model.pluck(:id)
random_model = Model.find(ids.sample)
pluck
returns an array of all the id's in the table. The sample
method on the array, returns a random id from the array.
This should perform well, with equal probability of selection and support for tables with deleted rows. You can even mix it with constraints.
User.where(favorite_day: "Friday").pluck(:id)
And thereby pick a random user who likes fridays rather than just any user.
I made a rails 3 gem to handle this:
https://github.com/spilliton/randumb
It allows you do do stuff like this:
Model.where(:column => "value").random(10)
Reading all of these did not give me a lot of confidence about which of these would work best in my particular situation with Rails 5 and MySQL/Maria 5.5. So I tested some of the answers on ~ 65000 records, and have two take aways:
- RAND() with a
limit
is a clear winner. - Do not use
pluck
+sample
.
def random1
Model.find(rand((Model.last.id + 1)))
end
def random2
Model.order("RAND()").limit(1)
end
def random3
Model.pluck(:id).sample
end
n = 100
Benchmark.bm(7) do |x|
x.report("find:") { n.times {|i| random1 } }
x.report("order:") { n.times {|i| random2 } }
x.report("pluck:") { n.times {|i| random3 } }
end
user system total real
find: 0.090000 0.000000 0.090000 ( 0.127585)
order: 0.000000 0.000000 0.000000 ( 0.002095)
pluck: 6.150000 0.000000 6.150000 ( 8.292074)
This answer synthesizes, validates and updates Mohamed's answer, as well as Nami WANG's comment on the same and Florian Pilz's comment on the accepted answer - please send up votes to them!
I use this so often from the console I extend ActiveRecord in an initializer - Rails 4 example:
class ActiveRecord::Base
def self.random
self.limit(1).offset(rand(self.count)).first
end
end
I can then call Foo.random
to bring back a random record.
Strongly Recommend this gem for random records, which is specially designed for table with lots of data rows:
https://github.com/haopingfan/quick_random_records
All other answers perform badly with large database, except this gem:
- quick_random_records only cost
4.6ms
totally.
- the
User.order('RAND()').limit(10)
cost733.0ms
.
- the accepted answer
offset
approach cost245.4ms
totally.
- the
User.all.sample(10)
approach cost573.4ms
.
Note: My table only has 120,000 users. The more records you have, the more enormous the difference of performance will be.
One query in Postgres:
User.order('RANDOM()').limit(3).to_sql # Postgres example
=> "SELECT "users".* FROM "users" ORDER BY RANDOM() LIMIT 3"
Using an offset, two queries:
offset = rand(User.count) # returns an integer between 0 and (User.count - 1)
Model.offset(offset).limit(1)
You can use the Array
method sample
, the method sample
returns a random object from an array, in order to use it you just need to exec in a simple ActiveRecord
query that return a collection, for example:
User.all.sample
will return something like this:
#<User id: 25, name: "John Doe", email: "admin@example.info", created_at: "2018-04-16 19:31:12", updated_at: "2018-04-16 19:31:12">
If you need to select some random results within specified scope:
scope :male_names, -> { where(sex: 'm') }
number_of_results = 10
rand = Names.male_names.pluck(:id).sample(number_of_results)
Names.where(id: rand)
After seeing so many answers I decided to benchmark them all on my PostgreSQL(9.6.3) database. I use a smaller 100,000 table and got rid of the Model.order("RANDOM()").first since it was already two orders of magnitude slower.
Using a table with 2,500,000 entries with 10 columns the hands down winner was the pluck method being almost 8 times faster than the runner up(offset. I only ran this on a local server so that number might be inflated but its bigger enough that the pluck method is what I'll end up using. It's also worth noting that this might cause issues is you pluck more than 1 result at a time since each one of those will be unique aka less random.
Pluck wins running 100 time on my 25,000,000 row table Edit: actually this time includes the pluck in the loop if I take it out it it runs about as fast as simple iteration on the id. However; it does take up a fair amount of RAM.
RandomModel user system total real
Model.find_by(id: i) 0.050000 0.010000 0.060000 ( 0.059878)
Model.offset(rand(offset)) 0.030000 0.000000 0.030000 ( 55.282410)
Model.find(ids.sample) 6.450000 0.050000 6.500000 ( 7.902458)
Here is the data running 2000 times on my 100,000 row table to rule out random
RandomModel user system total real
find_by:iterate 0.010000 0.000000 0.010000 ( 0.006973)
offset 0.000000 0.000000 0.000000 ( 0.132614)
"RANDOM()" 0.000000 0.000000 0.000000 ( 24.645371)
pluck 0.110000 0.020000 0.130000 ( 0.175932)
Very old question but with :
rand_record = Model.all.shuffle
You got an Array of record, sort by random order. No need gems or scripts.
If you want one record :
rand_record = Model.all.shuffle.first
The Ruby method for randomly picking an item from a list is sample
. Wanting to create an efficient sample
for ActiveRecord, and based on the previous answers, I used:
module ActiveRecord
class Base
def self.sample
offset(rand(size)).first
end
end
end
I put this in lib/ext/sample.rb
and then load it with this in config/initializers/monkey_patches.rb
:
Dir[Rails.root.join('lib/ext/*.rb')].each { |file| require file }
This will be one query if the size of the model is already cached and two otherwise.
Rails 4.2 and Oracle:
For oracle you can set a scope on your Model like so:
scope :random_order, -> {order('DBMS_RANDOM.RANDOM')}
or
scope :random_order, -> {order('DBMS_RANDOM.VALUE')}
And then for a sample call it like this:
Model.random_order.take(10)
or
Model.random_order.limit(5)
of course you could also place an order without a scope like so:
Model.all.order('DBMS_RANDOM.RANDOM') # or DBMS_RANDOM.VALUE respectively
For MySQL database try: Model.order("RAND()").first
If you're using PostgreSQL 9.5+, you can take advantage of TABLESAMPLE
to select a random record.
The two default sampling methods (SYSTEM
and BERNOULLI
) require that you specify the number of rows to return as a percentage of the total number of rows in the table.
-- Fetch 10% of the rows in the customers table.
SELECT * FROM customers TABLESAMPLE BERNOULLI(10);
This requires knowing the amount of records in the table to select the appropriate percentage, which may not be easy to find quickly. Fortunately, there is the tsm_system_rows
module that allows you to specify the number of rows to return directly.
CREATE EXTENSION tsm_system_rows;
-- Fetch a single row from the customers table.
SELECT * FROM customers TABLESAMPLE SYSTEM_ROWS(1);
To use this within ActiveRecord, first enable the extension within a migration:
class EnableTsmSystemRowsExtension < ActiveRecord::Migration[5.0]
def change
enable_extension "tsm_system_rows"
end
end
Then modify the from
clause of the query:
customer = Customer.from("customers TABLESAMPLE SYSTEM_ROWS(1)").first
I don't know if the SYSTEM_ROWS
sampling method will be entirely random or if it just returns the first row from a random page.
Most of this information was taken from a 2ndQuadrant blog post written by Gulcin Yildirim.
I'm brand new to RoR but I got this to work for me:
def random
@cards = Card.all.sort_by { rand }
end
It came from:
How to randomly sort (scramble) an array in Ruby?
I try this of Sam's example on my App using rails 4.2.8 of Benchmark( I put 1..Category.count for random, because if the random takes a 0 it will produce an error(ActiveRecord::RecordNotFound: Couldn't find Category with 'id'=0)) and the mine was:
def random1
2.4.1 :071?> Category.find(rand(1..Category.count))
2.4.1 :072?> end
=> :random1
2.4.1 :073 > def random2
2.4.1 :074?> Category.offset(rand(1..Category.count))
2.4.1 :075?> end
=> :random2
2.4.1 :076 > def random3
2.4.1 :077?> Category.offset(rand(1..Category.count)).limit(rand(1..3))
2.4.1 :078?> end
=> :random3
2.4.1 :079 > def random4
2.4.1 :080?> Category.pluck(rand(1..Category.count))
2.4.1 :081?>
2.4.1 :082 > end
=> :random4
2.4.1 :083 > n = 100
=> 100
2.4.1 :084 > Benchmark.bm(7) do |x|
2.4.1 :085 > x.report("find") { n.times {|i| random1 } }
2.4.1 :086?> x.report("offset") { n.times {|i| random2 } }
2.4.1 :087?> x.report("offset_limit") { n.times {|i| random3 } }
2.4.1 :088?> x.report("pluck") { n.times {|i| random4 } }
2.4.1 :089?> end
user system total real
find 0.070000 0.010000 0.080000 (0.118553)
offset 0.040000 0.010000 0.050000 (0.059276)
offset_limit 0.050000 0.000000 0.050000 (0.060849)
pluck 0.070000 0.020000 0.090000 (0.099065)
.order('RANDOM()').limit(limit)
looks neat but is slow for large tables because it needs to fetch and sort all rows even if limit
is 1 (internally in database but not in Rails). I'm not sure about MySQL but this happens in Postgres. More explanation in here and here.
One solution for large tables is .from("products TABLESAMPLE SYSTEM(0.5)")
where 0.5
means 0.5%
. However, I find this solution is still slow if you have WHERE
conditions that filter out a lot of rows. I guess it's because TABLESAMPLE SYSTEM(0.5)
fetch all rows before WHERE
conditions apply.
Another solution for large tables (but not very random) is:
products_scope.limit(sample_size).sample(limit)
where sample_size
can be 100
(but not too large otherwise it's slow and consumes a lot of memory), and limit
can be 1
. Note that although this is fast but it's not really random, it's random within sample_size
records only.
PS: Benchmark results in answers above are not reliable (at least in Postgres) because some DB queries running at 2nd time can be significantly faster than running at 1st time, thanks to DB cache. And unfortunately there is no easy way to disable cache in Postgres to make these benchmarks reliable.
Along with using RANDOM()
, you can also throw this into a scope:
class Thing
scope :random, -> (limit = 1) {
order('RANDOM()').
limit(limit)
}
end
Or, if you don't fancy that as a scope, just throw it into a class method. Now Thing.random
works along with Thing.random(n)
.
You can get array of all ids and then return random element with sample method.
Model.ids.sample
If you want to run benchmarks on your database of choice, here is a template:
gem 'activerecord', git: 'https://github.com/rails/rails'
gem 'sqlite3'
gem 'benchmark'
require 'active_record'
require 'benchmark'
ActiveRecord::Base.establish_connection(adapter: 'sqlite3', database: ':memory:')
ActiveRecord::Schema.define do
create_table :users
end
class User < ActiveRecord::Base
def self.sample_random
order('RANDOM()').first
end
def self.sample_pluck_id_sample
find(pluck(:id).sample)
end
def self.sample_all_sample
all.sample
end
def self.sample_offset_rand_count
offset(rand(count)).first
end
end
USERS_COUNTS = [1000, 10_000, 100_000, 1_000_000]
N = 100
USERS_COUNTS.each do |count|
puts "Creating #{count} users"
User.insert_all((1..count).map { |id| { id: id } })
Benchmark.bm do |x|
x.report("sample_random") { N.times { User.sample_random } }
x.report("sample_offset_rand_count") { N.times { User.sample_offset_rand_count } }
if count < 10_000
x.report("sample_pluck_id_sample") { N.times { User.sample_pluck_id_sample } }
x.report("sample_all_sample") { N.times { User.sample_all_sample } }
end
end
puts "Deleting #{User.count} users"
User.delete_all
end
Random sorting by RDBMS can be quite expensive if there are millions of records. To simplify this, you can limit the number of sorted records such way (PostgreSQL syntax):
class ApplicationRecord < ActiveRecord::Base
def self.sample
where(
"id >= TRUNC(RANDOM() * (SELECT MAX(id) FROM #{table_name}) + 1)"
).order(:id).first
end
end
And then User.sample
This will work more randomly in the case of ids uniform distribution
What about writing:
rand_record = Model.find(Model.pluck(:id).sample)
This make what you're doing clear.
Depending of the meaning of "random" and what you actually want to do, take
could be enough.
By the "meaning" of random I mean:
- Do you mean give me any element I don't care it's position? then it is enough.
- Now, if you mean "give me any element with a fair probability that repeated experiments will give me different elements from the set" then, force the "Luck" with any of the methods mentioned in the other answers.
Example, for testing, sample data could have been created randomly anyways, so take
is more than enough, and to be honest, even first
.
https://guides.rubyonrails.org/active_record_querying.html#take
精彩评论