Rails: Preventing Duplicate Photo Uploads with Paperclip?
Is there any way to throw a validation error if a user tries to upload the same photo twice to a Rails app using Paperclip? Paperclip doesn't seem to offer this functionality...
I'm using Rails 2.3.5 and Paperclip (obviously).
SOLUTION: (or one of them, at least)
Using Beerlington's suggestion, I decided to go with an MD5 Checksum comparison:
require 'digest/md5'

class Photo < ActiveRecord::Base
  #...
  has_attached_file :image #, ...
  before_validation_on_create :generate_md5_checksum
  # Only check on create; on update the lookup below would match this record itself.
  validate_on_create :unique_photo
  #...
  def generate_md5_checksum
    self.md5_checksum = Digest::MD5.hexdigest(image.to_file.read)
  end

  def unique_photo
    photo_digest = self.md5_checksum
    errors.add_to_base "You have already uploaded that file!" unless User.find(self.user_id).photos.find_by_md5_checksum(photo_digest).nil?
  end
  # ...
end
Then I just added a column to my photos table called md5_checksum, and voila! Now my app throws a validation error if you try to upload the same photo!
No idea how efficient/inefficient this is, so refactoring's welcome!
Thanks!
What about doing an MD5 on the image file? If it is the exact same file, the MD5 hash will be the same for both images.
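A minimal sketch of that comparison in plain Ruby, outside of Rails and Paperclip. The same_file? helper and the temp files are made up for illustration; Digest::MD5.file comes from Ruby's standard library.

```ruby
require 'digest/md5'
require 'tempfile'

# Hypothetical helper: true when both paths hold byte-identical content.
def same_file?(path_a, path_b)
  Digest::MD5.file(path_a).hexdigest == Digest::MD5.file(path_b).hexdigest
end

# Stand-ins for two uploads with the same bytes and one different upload.
first  = Tempfile.new('first');  first.write('fake image bytes');  first.flush
second = Tempfile.new('second'); second.write('fake image bytes'); second.flush
other  = Tempfile.new('other');  other.write('different bytes');   other.flush

puts same_file?(first.path, second.path)  # identical content
puts same_file?(first.path, other.path)   # different content
```

Note that this only catches exact byte-for-byte copies; a re-saved or resized version of the same picture will hash differently.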
For anyone else trying to do this: Paperclip now has MD5 hashing built in. If your model has an [attachment]_fingerprint column, Paperclip will populate it with the MD5.
Since I already had a column named hash_value, I made a 'virtual' attribute called picture_fingerprint (my attachment is named picture):
# Virtual attribute to have Paperclip generate the MD5
def picture_fingerprint
  self.hash_value
end

def picture_fingerprint=(md5_hash)
  self.hash_value = md5_hash
end
And with Rails 3, using "sexy validations", I was able to simply add this to the top of my model to ensure that hash_value is unique before the model is saved:
validates :hash_value, :uniqueness => { :message => "Image has already been uploaded." }
You might run into a problem when your images have amended EXIF metadata. This happened to me, and I had to extract pixel values and calculate MD5s out of them, to ignore changes made by WordPress etc. You can read about it on our blog: http://www.amberbit.com/blog/2013/12/20/similar-images-detection-in-ruby-with-phash/ but essentially you want to get the pixel data out of the image with some tool (like RMagick), concatenate it into a string, and calculate the MD5 out of that.
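A rough sketch of the "hash the pixels, not the file" idea. To keep it runnable without any image library, FakeImage is a made-up stand-in for a decoded image; in a real app the pixel values would come from a tool such as RMagick, and pixel_md5 is a hypothetical helper name.

```ruby
require 'digest/md5'

# Hypothetical stand-in for a decoded image: metadata plus raw RGB pixels.
FakeImage = Struct.new(:exif, :pixels)

# Hash only the pixel values, so metadata edits do not change the digest.
def pixel_md5(image)
  # Flatten [[r, g, b], ...] and pack each value as one unsigned byte.
  Digest::MD5.hexdigest(image.pixels.flatten.pack('C*'))
end

original = FakeImage.new({ 'Software' => 'Camera' },    [[255, 0, 0], [0, 255, 0]])
edited   = FakeImage.new({ 'Software' => 'WordPress' }, [[255, 0, 0], [0, 255, 0]])
changed  = FakeImage.new({ 'Software' => 'Camera' },    [[255, 0, 0], [0, 0, 255]])

puts pixel_md5(original) == pixel_md5(edited)   # same pixels, different metadata
puts pixel_md5(original) == pixel_md5(changed)  # different pixels
```

This still treats any pixel-level change (recompression, resizing) as a different image, and it ignores image dimensions, so you may want to include width and height in the hashed string as well.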
As Stephen indicated, your biggest issue is how to determine if a file is a duplicate, and there is no clear answer for this.
If these are photos taken with a digital camera, you would want to compare the EXIF data. If the EXIF data matches, then the photo is most likely a duplicate, and you can inform the user of this. You'll have to accept the upload initially, though, so that you can examine the EXIF data.
I should mention that EXIFR is a nice Ruby gem for examining the EXIF data.