Image comparison with php + gd
What's the best approach to comparing two images with php and the Graphic Draw (GD) Library?
This is the scenario:
I have an image, and I want to find which image of a given set is the most similar to it. The most similar image is in fact the same image, not pixel perfect match but the same image. I've dramatised the difference between the two images with the number one on the example just to ease the understanding of what I meant.
Even though it brought no consistent results, my approach was to reduce the images to 1px using the imagecopyresampled function and see how close the RGB values where between images.
The sum of the values of deducting each red, green and blue decimal equivalent value from the red, green and blue decimal equivalent value of the possible match gave me a dissimilarity index that, even though it didn't work as expected since not always the most RGB similar image was the target image, I could use to select an image from the available targets.
Here's a sample of the output when comparing 4 images against a target image, in this case the apple logo, that matches one of them but is not exactly the same:
Original image:
Red:222 Green:226 Blue:232Compared against:
http开发者_开发技巧://a1.twimg.com/profile_images/571171388/logo-twitter_normal.png Red:183 Green:212 Blue:212 and an index of similarity of 56
Red:117 Green:028 Blue:028 and an index of dissimilarity 530 Red:218 Green:221 Blue:221 and an index of dissimilarity 13 Matched Correctly. Red:061 Green:063 Blue:063 and an index of dissimilarity 491May not even be doable better with better results than what I'm already getting and I'm wasting my time here but since there seems to be a lot of experienced php programmers I guess you can point me in the right directions on how to improve this.
I'm open to other image libraries such as iMagick, Gmagick or Cairo for php but I'd prefer to avoid using other languages than php.
Thanks in advance.
I'd have thought your approach seems reasonable, but reducing an entire image to 1x1 pixel in size is probably a step too far.
However, if you converted each image to the same size and then computed the average colour in each 16x16 (or 32x32, 64x64, etc. depending on how much processing time/power you wish to use) cell you should be able to form some kind of sensible(-ish) comparison.
I would suggest, like middaparka, that you do not downsample to a 1 pixel only image, because you loose all the spatial information. Downsampling to 16x16 (or 32x32, etc.) would certainly provide better results.
Then it also depends on whether color information is important or not to you. From what I understand you could actually do without it and compute a gray-level image starting from your color image (e.g. luma) and compute the cross-correlation. If, like you said, there is a couple of images that matches exactly (except for color information) this should give you a pretty good reliability.
I used the ideas of scaling, downsampling and gray-level mentioned in the question and answers, to apply a Mean Squared Error between the pixels channels values for 2 images, using GD Library.
The code is in this answer, including a test with those ideas.
Also I did some benckmarking and I think the downsampling could be not needed in those little images, cause the method is fast (being PHP), just a fraction of a second.
Using middparka's methods, you can transform each image into a sequence of numeric values and then use the Levenshtein algorithm to find the closest match.
精彩评论