Finding all the users that have duplicate names
I have users with first_name and last_name fields, and I need a Ruby find that returns all the users that have duplicate accounts based on first and last names. For example, I want a find that will search through all the other users and find any that have the same name and email. I was thinking of a nested loop like this:
User.all.each do |user|
  # maybe another loop to search through all the users and, if a match occurs, put that user in an array
end
Is there a better way?
You could go a long way toward narrowing down your search by finding out what the duplicated data is in the first place. For example, say you want to find each combination of first name and email that is used more than once.
User.find(:all, :group => [:first, :email], :having => "count(*) > 1" )
That will return an array containing one record from each set of duplicates. From that, say one of the returned users had "Fred" and "fred@example.com"; you could then search for only the Users having those values to find all of the affected users.
The return from that find will be something like the following. Note that the array only contains a single record from each set of duplicated users.
[#<User id: 3, first: "foo", last: "barney", email: "foo@example.com", created_at: "2010-12-30 17:14:43", updated_at: "2010-12-30 17:14:43">,
#<User id: 5, first: "foo1", last: "baasdasdr", email: "abc@example.com", created_at: "2010-12-30 17:20:49", updated_at: "2010-12-30 17:20:49">]
For example, the first element in that array shows one user with "foo" and "foo@example.com". The rest of them can be pulled out of the database as needed with a find.
> User.find(:all, :conditions => {:email => "foo@example.com", :first => "foo"})
=> [#<User id: 1, first: "foo", last: "bar", email: "foo@example.com", created_at: "2010-12-30 17:14:28", updated_at: "2010-12-30 17:14:28">,
#<User id: 3, first: "foo", last: "barney", email: "foo@example.com", created_at: "2010-12-30 17:14:43", updated_at: "2010-12-30 17:14:43">]
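For a small table, the same two-step idea (find the duplicated combinations, then collect every user in each one) can also be sketched in plain Ruby with group_by. This is only an illustration under assumptions: the Struct below is a stand-in for the User model, and the rows with ids 4 and 6 are made up for the example. It loads every user into memory, so the GROUP BY query is likely the better choice for large tables.

```ruby
# Plain-Ruby stand-in for the User model (an assumption; the real model is ActiveRecord).
User = Struct.new(:id, :first, :last, :email)

# Sample rows; ids 4 and 6 are hypothetical and only here to fill out the example.
users = [
  User.new(1, "foo",  "bar",       "foo@example.com"),
  User.new(3, "foo",  "barney",    "foo@example.com"),
  User.new(4, "foo1", "baz",       "abc@example.com"),
  User.new(5, "foo1", "baasdasdr", "abc@example.com"),
  User.new(6, "solo", "person",    "solo@example.com")
]

# Group by the columns that define a duplicate, then keep only groups with more than one member.
grouped = users.group_by { |u| [u.first, u.email] }
duplicate_groups = grouped.select { |_key, members| members.size > 1 }

# Print each duplicated combination with the ids of all users that share it.
duplicate_groups.each do |(first, email), members|
  puts "#{first} / #{email}: ids #{members.map(&:id).join(', ')}"
end
```

With real records you would build the hash from User.all instead of the sample array, at the cost of instantiating every row.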
And it also seems like you'll want to add some better validation to your code to prevent duplicates in the future.
Edit:
If you need to use the big hammer of find_by_sql, because Rails 2.2 and earlier didn't support :having with find, the following should work and give you the same array that I described above.
User.find_by_sql("select * from users group by first,email having count(*) > 1")
After some googling, I ended up with this:
ActiveRecord::Base.connection.execute(<<-SQL).to_a
SELECT
variants.id, variants.variant_no, variants.state
FROM variants INNER JOIN (
SELECT
variant_no, state, COUNT(1) AS count
FROM variants
GROUP BY
variant_no, state HAVING COUNT(1) > 1
) tt ON
variants.variant_no = tt.variant_no
AND variants.state IS NOT DISTINCT FROM tt.state;
SQL
Note the part that says IS NOT DISTINCT FROM; this is to help deal with NULL values, which can't be compared with an equals sign in Postgres.
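To see why the join needs this, here is a rough Ruby model of the two comparisons, with nil playing the role of NULL. The helper names sql_equals and sql_not_distinct are hypothetical and exist only for this illustration; they are not part of ActiveRecord or Postgres.

```ruby
# Models SQL "a = b": the result is unknown (nil here) whenever either side is NULL,
# so a join condition written with "=" never matches rows whose state is NULL.
def sql_equals(a, b)
  return nil if a.nil? || b.nil?
  a == b
end

# Models SQL "a IS NOT DISTINCT FROM b": always true or false,
# and two NULLs are treated as the same value.
def sql_not_distinct(a, b)
  return true if a.nil? && b.nil?
  return false if a.nil? || b.nil?
  a == b
end

p sql_equals(nil, nil)             # unknown, so the row would be dropped from the join
p sql_not_distinct(nil, nil)       # the two NULL states match
p sql_not_distinct("active", nil)  # a NULL never matches a real value
```

Since GROUP BY puts all NULL states into one group, the outer join has to treat them as equal too, or duplicates with a NULL state would be missed.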
If you are going the route of @hakunin and creating a query manually, you may wish to use the following:
ActiveRecord::Base.connection.exec_query(<<-SQL).to_a
SELECT
variants.id, variants.variant_no, variants.state
FROM variants INNER JOIN (
SELECT
variant_no, state, COUNT(1) AS count
FROM variants
GROUP BY
variant_no, state HAVING COUNT(1) > 1
) tt ON
variants.variant_no = tt.variant_no
AND variants.state IS NOT DISTINCT FROM tt.state;
SQL
The change is replacing connection.execute(<<-SQL) with connection.exec_query(<<-SQL). There can be a problem with memory leakage when using execute. Please read Clarify DataBaseStatements#execute to get an in-depth understanding of the problem.