
How to check if file is/isn't an image without loading full file? Is there an image header-reading library?

edit:

Sorry, I guess my question was vague. I'd like to have a way to check if a file is not an image without wasting time loading the whole image, because then I can do the rest of the loading later. I don't want to just check the file extension.

The application just views the images. By 'checking the validity', I meant detecting and skipping the non-image files that are also in the directory. If the pixel data is corrupt, I'd still like to treat the file as an image.

I assign page numbers and pair up these images. Some images are a single left or right page. Some images are wide and form the "spread" of the left and right pages. For example, pagesAt(3) and pagesAt(4) could return the same std::pair of images, or a std::pair of the same wide image.

Sometimes there is an odd number of 'thin' images, in which case the first image is displayed on its own, similar to a wide image. An example would be a single cover page.

Not knowing which files in the directory are non-images means I can't confidently assign those page numbers and pair up the files for display. Also, the user may decide to jump to page X; if I later discover and remove a non-image file and reassign page numbers accordingly, page X could turn out to be a different image.
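One possible reading of the pairing rules described above, as a minimal C++ sketch. The class name PageMap, the Spread alias, and the exact rules (a wide image fills two pages and is stored as a pair of the same index; a thin image with no thin neighbour is shown alone) are assumptions for illustration, not the asker's actual code. Because the map is built from the whole file list, any non-image file has to be filtered out before it is built, otherwise page numbers shift once that file is discovered.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// first = left (or only) image index, second = right image index, or -1.
// A wide image is stored as (i, i), i.e. a pair of the same image.
using Spread = std::pair<int, int>;

class PageMap {
public:
    // isWide[i] tells whether (hypothetical) image i is a wide two-page spread.
    explicit PageMap(const std::vector<bool>& isWide)
    {
        std::size_t thinCount = 0;
        for (bool wide : isWide)
            if (!wide)
                ++thinCount;

        std::size_t i = 0;
        // Odd number of thin images: the first image (e.g. a cover) stands alone.
        if (thinCount % 2 == 1 && !isWide.empty() && !isWide[0]) {
            addSpread({0, -1}, 1);
            i = 1;
        }
        while (i < isWide.size()) {
            const int idx = static_cast<int>(i);
            if (isWide[i]) {
                addSpread({idx, idx}, 2);      // wide image covers two pages
                i += 1;
            } else if (i + 1 < isWide.size() && !isWide[i + 1]) {
                addSpread({idx, idx + 1}, 2);  // two thin images side by side
                i += 2;
            } else {
                addSpread({idx, -1}, 1);       // thin image with no partner
                i += 1;
            }
        }
    }

    // Spread shown for a 0-based page number; .at() throws std::out_of_range if invalid.
    const Spread& pagesAt(std::size_t page) const { return spreads_[pageToSpread_.at(page)]; }
    std::size_t pageCount() const { return pageToSpread_.size(); }

private:
    void addSpread(Spread s, int pages)
    {
        spreads_.push_back(s);
        for (int p = 0; p < pages; ++p)
            pageToSpread_.push_back(spreads_.size() - 1);
    }

    std::vector<Spread> spreads_;
    std::vector<std::size_t> pageToSpread_;  // page number -> index into spreads_
};
```

For example, with the layout thin, thin, thin, wide, thin, thin, the odd thin count puts image 0 alone on page 0, and pagesAt(3) and pagesAt(4) both return the wide image's spread (3, 3).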

original:

In case it matters, I'm using C++ and QImage from the Qt library.

I'm iterating through a directory and using the QImage constructor on the paths to the images. This is, of course, pretty slow and makes the application feel unresponsive. However, it does allow me to detect invalid image files and ignore them early on.

I could just save only the paths to the images while going through the directory and actually load them only when they're needed, but then I wouldn't know whether an image is invalid or not.

I'm considering a combination of the two: while iterating through the directory, read only the image headers to check validity, and then load the image data when it is needed.
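A minimal sketch of the header-only read using just the standard library: it requests at most maxBytes from the start of the file and never touches the rest. The function name readHeader and the buffer size are assumptions; how many bytes you need depends on the formats you want to recognise, though a few dozen bytes are enough for the common signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Reads at most `maxBytes` from the start of the file at `path`.
// Only this prefix is requested from the stream; the rest of the file is
// never read. Returns an empty vector if the file cannot be opened.
std::vector<unsigned char> readHeader(const std::string& path, std::size_t maxBytes)
{
    std::vector<unsigned char> buf(maxBytes);
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return {};
    in.read(reinterpret_cast<char*>(buf.data()), static_cast<std::streamsize>(maxBytes));
    buf.resize(static_cast<std::size_t>(in.gcount()));  // file may be shorter than maxBytes
    return buf;
}
```

The returned buffer can then be handed to whatever format check you use, and the full decode can be deferred until the image is actually displayed.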

So,

Will loading just the image headers be much faster than loading the whole image? Or does doing a bit of I/O to read the header mean I might as well finish loading the image in full? Later on, I'll be uncompressing images from archives as well, so this also applies to uncompressing just the header versus uncompressing the whole file.

Also, I don't know how to load/read just the image headers. Is there a library that can read just the headers of images? Otherwise, I'd have to open each file as a stream and write image header readers for all the file types on my own.


The Unix file tool (which has been around since almost forever) does exactly this. It is a simple tool that uses a database of known file headers and binary signatures to identify the type of the file (and potentially extract some simple information).

The database is a simple text file (which gets compiled for efficiency) that describes a plethora of binary file formats using a simple structured format (documented in the magic man page, man magic). On Ubuntu, the source is in /usr/share/file/magic. For example, the entry for the PNG file format looks like this:

0       string          \x89PNG\x0d\x0a\x1a\x0a         PNG image
!:mime  image/png
>16     belong          x               \b, %ld x
>20     belong          x               %ld,
>24     byte            x               %d-bit
>25     byte            0               grayscale,
>25     byte            2               \b/color RGB,
>25     byte            3               colormap,
>25     byte            4               gray+alpha,
>25     byte            6               \b/color RGBA,
>28     byte            0               non-interlaced
>28     byte            1               interlaced
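For comparison, the fields this magic entry prints can be read directly from the first 29 bytes of a PNG file: the 8-byte signature, then the IHDR chunk, whose width and height are big-endian 32-bit integers ("belong") at offsets 16 and 20, followed by single bytes for bit depth (24), color type (25) and interlace method (28). The struct and function names below are hypothetical, and this is only a sketch, not a full PNG validator (it does not check the IHDR CRC, for instance).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

struct PngInfo {
    std::uint32_t width;
    std::uint32_t height;
    int bitDepth;
    int colorType;   // 0 gray, 2 RGB, 3 palette, 4 gray+alpha, 6 RGBA
    bool interlaced;
};

// Big-endian 32-bit read, the equivalent of "belong" in magic syntax.
static std::uint32_t readBE32(const unsigned char* p)
{
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// Parses the same fields as the magic entry, at the same offsets.
// Needs only the first 29 bytes of the file.
std::optional<PngInfo> parsePngHeader(const std::vector<unsigned char>& h)
{
    static const unsigned char sig[8] = {0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    if (h.size() < 29)
        return std::nullopt;
    for (std::size_t i = 0; i < 8; ++i)
        if (h[i] != sig[i])
            return std::nullopt;
    // The first chunk must be IHDR (its type sits at offsets 12..15).
    if (h[12] != 'I' || h[13] != 'H' || h[14] != 'D' || h[15] != 'R')
        return std::nullopt;
    PngInfo info;
    info.width = readBE32(&h[16]);
    info.height = readBE32(&h[20]);
    info.bitDepth = h[24];
    info.colorType = h[25];
    info.interlaced = h[28] == 1;
    return info;
}
```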

You could extract the signatures for just the image file types, and build your own "sniffer", or even use the parser from the file tool (which seems to be BSD-licensed).
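A minimal sniffer along those lines, assuming you only care about a handful of common formats. The signature table is a small hand-picked subset (PNG, JPEG, GIF, BMP, TIFF), not an extract of the real magic database, and the function name sniffImageFormat is hypothetical. It works on a byte buffer, so the same check applies to the start of a file on disk or to the first bytes decompressed from an archive entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// One "magic" rule: `length` bytes that must appear at `offset`.
struct Signature {
    std::size_t offset;
    const char* bytes;
    std::size_t length;
    const char* format;
};

static const Signature kSignatures[] = {
    {0, "\x89PNG\r\n\x1a\n", 8, "png"},
    {0, "\xFF\xD8\xFF", 3, "jpeg"},
    {0, "GIF87a", 6, "gif"},
    {0, "GIF89a", 6, "gif"},
    {0, "BM", 2, "bmp"},      // weak 2-byte signature: expect false positives
    {0, "II*\0", 4, "tiff"},  // little-endian TIFF
    {0, "MM\0*", 4, "tiff"},  // big-endian TIFF
};

// Returns the detected format name, or an empty string for "not an image".
std::string sniffImageFormat(const std::vector<unsigned char>& header)
{
    for (const Signature& s : kSignatures) {
        if (header.size() >= s.offset + s.length &&
            std::memcmp(header.data() + s.offset, s.bytes, s.length) == 0)
            return s.format;
    }
    return {};
}
```

A table-driven design keeps the rules in one place, which mirrors how the magic database works: adding a format means adding a row, not new parsing code.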


Just to add my 2 cents: you can use QImageReader to get information about image files without actually loading the files.

For example, with the format() method you can check a file's image format.

From the official Qt documentation (http://qt-project.org/doc/qt-4.8/qimagereader.html#format):

Returns the format QImageReader uses for reading images. You can call this function after assigning a device to the reader to determine the format of the device. For example:

QImageReader reader("image.png");
// reader.format() == "png"

If the reader cannot read any image from the device (e.g., there is no image there, or the image has already been read), or if the format is unsupported, this function returns an empty QByteArray().


I don't know the answer about loading just the header; it likely depends on the image type you are trying to load. If possible, you might consider using QtConcurrent to go through the images while allowing the rest of the program to continue. In that case, you would probably initially represent all of the entries as being in an unknown state, and then change each one to image or not-an-image when its verification is done.
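A sketch of that unknown/image/not-an-image approach. To stay self-contained it uses std::async from the standard library instead of QtConcurrent::run (in a Qt application, QtConcurrent::run plus a QFutureWatcher would play the same role), and the isImage predicate is a caller-supplied assumption, for example a header sniffer. The worker fills its own result vector, so the UI thread never reads state the worker is still writing; the UI shows every entry as Unknown until the hypothetical applyIfReady, polled from a timer, reports that the results have arrived.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <string>
#include <utility>
#include <vector>

enum class EntryState { Unknown, Image, NotImage };

// Runs the isImage check for every path on a worker thread.
// std::async stands in for QtConcurrent::run here.
std::future<std::vector<EntryState>> classifyInBackground(
    std::vector<std::string> paths, std::function<bool(const std::string&)> isImage)
{
    return std::async(std::launch::async,
                      [paths = std::move(paths), isImage = std::move(isImage)] {
                          std::vector<EntryState> states;
                          states.reserve(paths.size());
                          for (const std::string& p : paths)
                              states.push_back(isImage(p) ? EntryState::Image
                                                          : EntryState::NotImage);
                          return states;
                      });
}

// Polled from the UI thread (e.g. from a timer); never blocks. Copies the
// results into `shown` once the worker has finished and returns true.
bool applyIfReady(std::future<std::vector<EntryState>>& pending,
                  std::vector<EntryState>& shown)
{
    if (!pending.valid() ||
        pending.wait_for(std::chrono::seconds(0)) != std::future_status::ready)
        return false;
    shown = pending.get();  // get() invalidates the future, so this succeeds once
    return true;
}
```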


If you're talking about image files in general, and not just a specific format, I'd be willing to bet there are cases where the image header is valid but the image data isn't. You haven't said anything about your application: is there no way you could add a background thread that keeps a few images in RAM and swaps them in and out depending on what the user may load next? For example, a slide show app would load 1 or 2 images ahead of and behind the current one. Or you could display a question mark next to the image name until the background thread has verified the validity of the data.
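The "keep a few images around the current one" idea comes down to computing a window of indices to keep decoded. A minimal sketch, with the function name prefetchWindow and its behind/ahead parameters as assumptions; the actual decoding and eviction are left to the background thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Indices a slide-show style viewer would keep decoded in RAM: the current
// image plus up to `ahead` images after it and `behind` before it, clamped
// to [0, count). Anything outside this set can be evicted.
std::vector<std::size_t> prefetchWindow(std::size_t current, std::size_t count,
                                        std::size_t behind, std::size_t ahead)
{
    std::vector<std::size_t> keep;
    if (count == 0 || current >= count)
        return keep;
    const std::size_t first = current >= behind ? current - behind : 0;
    const std::size_t last = current + ahead < count ? current + ahead : count - 1;

    // Current image first, then neighbours by increasing distance, so a loader
    // working through the list in order decodes the most urgent images first.
    keep.push_back(current);
    for (std::size_t d = 1; current + d <= last || (d <= current && current - d >= first); ++d) {
        if (current + d <= last)
            keep.push_back(current + d);
        if (d <= current && current - d >= first)
            keep.push_back(current - d);
    }
    return keep;
}
```

For example, prefetchWindow(5, 10, 1, 2) yields 5, 6, 4, 7: the current image first, then its neighbours in order of distance.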


While opening and reading the header of a file on a local filesystem should not be too expensive, it can be expensive if the file is on a remote (networked) file system. Even worse, if you are accessing files saved with hierarchical storage management, reading the file can be very expensive.

If this app is just for you, then you can decide not to worry about those issues. But if you are distributing your app to the public, reading the file before you absolutely have to will cause problems for some users.

Raymond Chen wrote an article about this for his blog The Old New Thing.
