开发者

How to verify the integrity of a image file in c++ or python?

I want to check whether the images is开发者_StackOverflow downloaded completely. Is there any library to use? The images I want to verify including various formats such jpeg, png, bmp etc.


The standard go-to library for that kind of thing in Python is the Python Imaging Library (PIL).


I have used Pyhton Pillow module (PIL) and Imagemagick wrapper wand (for psd, xcf formats) in order to detect broken images, the original answer with code snippets is here.

I also implemented this solution in my Python script here on GitHub.

I also verified that damaged files (jpg) frequently are not 'broken' images i.e, a damaged picture file sometimes remains a legit picture file, the original image is lost or altered but you are still able to load it.

I quote the full answer for completeness:

You can use Python Pillow(PIL) module, with most image formats, to check if a file is a valid and intact image file.

In the case you aim at detecting also broken images, @Nadia Alramli correctly suggests the im.verify() method, but this does not detect all the possible image defects, e.g., im.verify does not detect truncated images (that most viewers often load with a greyed area).

Pillow is able to detect these type of defects too, but you have to apply image manipulation or image decode/recode in or to trigger the check. Finally I suggest to use this code:

try:
  im = Image.load(filename)
  im.verify() #I perform also verify, don't know if he sees other types o defects
  im.close() #reload is necessary in my case
  im = Image.load(filename) 
  im.transpose(PIL.Image.FLIP_LEFT_RIGHT)
  im.close()
except: 
  #manage excetions here

In case of image defects this code will raise an exception. Please consider that im.verify is about 100 times faster than performing the image manipulation (and I think that flip is one of the cheaper transformations). With this code you are going to verify a set of images at about 10 MBytes/sec (modern 2.5Ghz x86_64 CPU).

For the other formats psd,xcf,.. you can use Imagemagick wrapper Wand, the code is as follows:

im = wand.image.Image(filename=filename)
temp = im.flip;
im.close()

But, from my experiments Wand does not detect truncated images, I think it loads lacking parts as greyed area without prompting.

I red that Imagemagick has an external command identify that could make the job, but I have not found a way to invoke that function programmatically and I have not tested this route.

I suggest to always perform a preliminary check, check the filesize to not be zero (or very small), is a very cheap idea:

statfile = os.stat(filename)
filesize = statfile.st_size
if filesize == 0:
  #manage here the 'faulty image' case


You can guess by attempting to load the image into memory (using PIL or somesuch), but it's possible that some images could be loaded ok without being complete - for example an animated GIF might load fine if you have the header and the first frame of the animation, and you won't notice that later frames of the animation were missing.

A more reliable approach would probably be to use some out-of-band communication, like rather than watching a folder and processing new files as soon as they exist, find some way of hooking into the downloader process and getting it to give you a signal when it decides it is ready.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜