Questions about EXIF in hexadecimal form
I am trying to understand the EXIF header portion of a jpeg file (in hex) and how to understand it so I can extract data, specifically GPS information. For better or worse, I am using VB.Net 2008 (sorry, it is what I can grasp right now). I have extracted the first 64K of a jpg to a byte array and have a vague idea of how the data is arranged. Using the EXIF specification documents, version 2.2 and 2.3, I see that there are tags, that are supposed to correspond to actual byte sequences in the file. I see that there is a “GPS IFD” that has a value of 8825 (in hex). I search for the hex string 8825 in the file (which I understand to be two bytes 88 and 25) and then I believe that there is a sequence of bytes following the 8825. I suspect that those subsequent bytes denote where in the file, by way of an offset, the GPS data would be located. For example, I have the following hex bytes, starting with 88 25: 88 25 00 04 00 00 00 01 00 00 05 9A 00 00 07 14. Is the string that I am looking for longer than 16 bytes? I get the impression that in this string of data, it should be telling me where to find the actual GPS data in the file.
Looking at http://search.cpan.org/~bettelli/Image-MetaData-JPEG-0.153/lib/Image/MetaData/JPEG/Structures.pod#Exif_and_DCT, halfway down the page, it talks about “Each IFD block is a structured sequence of records, called, in the Exif jargon, Interoperability arrays. The beginning of the 0th IFD is given by the 'IFD0_Pointer' value. The structure of an IFD is the following:”
So, what is an IFD0_Pointer? Does it have to do with an offset? I presume an offset is so many bytes from a beginning point. If that is true, where is that beginning point?
Thanks for any responses.
Dale
I suggest you read the Exif Specification (PDF); it is clear and quite easy to follow. For a short primer, here is the summary of an article I wrote:
A JPEG/Exif file starts with the Start of Image marker (SOI). The SOI consists of two magic bytes, 0xFF 0xD8, identifying the file as a JPEG file. Following the SOI, there are a number of Application Marker sections (APP0, APP1, APP2, APP3, ...) typically including metadata.
Application Marker Sections
Each APPn section starts with a marker. For the APP0 section, the marker is 0xFF 0xE0, for the APP1 section 0xFF 0xE1, and so on. The marker bytes are followed by two bytes for the size of the section (excluding the marker, including the size bytes). The length field is followed by variable-size application data. APPn sections are sequential, so you can skip entire sections (by using the section size) until you reach the one you are interested in. The contents of APPn sections vary; the following is for the Exif APP1 section only.
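The section-skipping described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name find_exif_app1 is made up for this example, the whole header region is assumed to be in memory (the first 64K read in the question would normally be enough), and markers without a length field are not handled.

```python
import struct

def find_exif_app1(data: bytes):
    """Return the application data of the first Exif APP1 section, or None."""
    if data[:2] != b"\xFF\xD8":                 # SOI must come first
        raise ValueError("not a JPEG file")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:                   # every section starts with 0xFF
            break
        marker = data[pos + 1]
        if marker == 0xDA:                      # SOS: compressed image data begins
            break
        # The section size is always big-endian and counts its own two bytes,
        # but not the two marker bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return payload
        pos += 2 + length                       # jump to the next section
    return None
```

The returned bytes start with the 6-byte Exif marker, so the TIFF header begins at index 6 of the result.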
The Exif APP1 Section
Exif metadata is stored in an APP1 section (there may be more than one APP1 section). The application data in an Exif APP1 section consists of the Exif marker 0x45 0x78 0x69 0x66 0x00 0x00 ("Exif\0\0"), the TIFF header and a number of Image File Directory (IFD) sections.
The TIFF Header
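The TIFF header contains information about the byte-order of IFD sections and a pointer to the 0th IFD. The first two bytes are 0x49 0x49 (II for Intel) if the byte-order is little-endian or 0x4D 0x4D (MM for Motorola) for big-endian. The following two bytes hold the magic number 42 written in that byte-order: 0x00 0x2A for big-endian, 0x2A 0x00 for little-endian. The next four bytes are the important ones: they give the offset to the 0th IFD, counted from the start of the TIFF header. This is the value that the document linked in the question calls IFD0_Pointer, and the start of the TIFF header is the beginning point for every other offset as well.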
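As a rough sketch of reading those eight bytes, the function below returns the byte-order and the 0th IFD offset. The name parse_tiff_header and the idea of returning a struct prefix character are choices made for this example only; the buffer is assumed to start exactly at the TIFF header.

```python
import struct

def parse_tiff_header(tiff: bytes):
    """Return (endian, ifd0_offset) for a buffer that starts at the TIFF header.

    endian is "<" or ">", ready to be used as a struct format prefix.
    """
    order = tiff[:2]
    if order == b"II":
        endian = "<"          # little-endian (Intel)
    elif order == b"MM":
        endian = ">"          # big-endian (Motorola)
    else:
        raise ValueError("unknown byte-order mark")
    # Magic number (2 bytes) and 0th IFD offset (4 bytes), in the declared order.
    magic, ifd0_offset = struct.unpack(endian + "HI", tiff[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    return endian, ifd0_offset
```

For a header of 4D 4D 00 2A 00 00 00 08 this gives big-endian and an offset of 8, meaning the 0th IFD starts right after the header.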
Important: The JPEG file itself (what you have been reading until now) will always be in big-endian format. However, the byte-order of IFD subsections may be different, and need to be converted (you know the byte-order from the TIFF header above).
Image File Directories
Once you get this far, you have your pointer to the 0th IFD section and you are ready to read actual metadata. The remaining IFDs are referenced in different places. The offset to the Exif IFD and the GPS IFD are given in the 0th IFD fields. The offset to the first IFD is given after the 0th IFD fields. The offset to the Interoperability IFD is given in the Exif IFD.
IFDs are simply sequential records of metadata fields. The field count is given in the first two bytes of the IFD. Following the field count are 12-byte fields. Following the fields, there is a 4 byte offset from the start of the TIFF header to the start of the first IFD. This value is meaningful for only the 0th IFD. Following this, there is the IFD data section.
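The layout just described (a 2-byte field count, then 12-byte fields, then a 4-byte offset) could be read along these lines. This is a minimal sketch: read_ifd is a name invented here, tiff is assumed to be a buffer starting at the TIFF header, and endian is assumed to be "<" or ">" as determined from that header.

```python
import struct

def read_ifd(tiff: bytes, offset: int, endian: str):
    """Return (entries, next_ifd_offset) for the IFD at the given offset.

    Each entry is (tag, type, count, raw_value), where raw_value is the
    undecoded 4-byte value-or-offset part of the field.
    """
    (field_count,) = struct.unpack_from(endian + "H", tiff, offset)
    entries = []
    for i in range(field_count):
        start = offset + 2 + 12 * i              # fields are 12 bytes each
        tag, field_type, count = struct.unpack_from(endian + "HHI", tiff, start)
        entries.append((tag, field_type, count, tiff[start + 8:start + 12]))
    # The 4-byte offset that follows the last field.
    (next_ifd,) = struct.unpack_from(endian + "I", tiff, offset + 2 + 12 * field_count)
    return entries, next_ifd
```

To locate the GPS data, one would scan the entries of the 0th IFD for tag 0x8825 and decode its 4-byte value as an offset from the start of the TIFF header.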
IFD Fields
Fields are 12-byte subsections of IFD sections. The first two bytes of each field give the tag ID as defined in the Exif standard. The next two bytes give the type of the field data. You will have 1 for byte, 2 for ascii, 3 for short (uint16), 4 for long (uint32), etc. Check the Exif Specification for the complete list.
The following four bytes give the count, which may be a little confusing. For byte arrays (ascii and undefined types), the count is the byte length of the array. For example, for the ASCII string "Exif", the count will be 5 including the null terminator. For other types, it is the number of field components (e.g. 4 shorts, 3 rationals).
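Turning a type and a count into a byte length is a simple multiplication. The table below lists the component sizes of the field types defined in Exif 2.2; the names TYPE_SIZES, value_byte_length and is_inline are made up for this sketch.

```python
# Bytes per component for each Exif 2.2 field type:
# 1 byte, 2 ascii, 3 short, 4 long, 5 rational,
# 7 undefined, 9 slong, 10 srational.
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 7: 1, 9: 4, 10: 8}

def value_byte_length(field_type: int, count: int) -> int:
    """Total size in bytes of a field's data."""
    return TYPE_SIZES[field_type] * count

def is_inline(field_type: int, count: int) -> bool:
    """True if the data fits into the 4-byte value part of the field itself."""
    return value_byte_length(field_type, count) <= 4
```

With this, the ASCII string "Exif" (type 2, count 5) is 5 bytes long, and 3 rationals (type 5, count 3) take 24 bytes.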
Following the count, we have the 4-byte field value. However, if the length of the field data exceeds 4 bytes, it will be stored in the IFD Data section instead. In this case, this value will be the offset from the start of the TIFF header to the start of the field data. For example, for a long (uint32, 4 bytes), this will be the field value. For a rational (2 x uint32, 8 bytes), this will be an offset to the 8-byte field data.
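Applied to the bytes quoted in the question, the first 12 bytes starting at 88 25 form exactly one field. The sketch below decodes them, assuming the file is big-endian (MM), which is what the byte pattern suggests but which should be confirmed from the TIFF header. The helper file_position and the tiff_start value of 12 are illustrative assumptions; 12 is where the TIFF header lands only if the Exif APP1 section immediately follows the SOI.

```python
import struct

# The 12 bytes from the question, starting at the GPS IFD tag.
raw = bytes.fromhex("88 25 00 04 00 00 00 01 00 00 05 9A")

# Assumption: big-endian byte-order ("MM" in the TIFF header).
tag, field_type, count, value = struct.unpack(">HHII", raw)

def file_position(tiff_start: int, offset: int) -> int:
    """Convert an offset relative to the TIFF header into a position in the file."""
    return tiff_start + offset

print(hex(tag), field_type, count, value)   # prints: 0x8825 4 1 1434
```

So the field is tag 0x8825, type 4 (long), count 1, and because a single long fits in 4 bytes, the value 0x059A (1434) is the GPS IFD pointer itself: the GPS IFD starts 1434 bytes after the start of the TIFF header. The GPS IFD found there has the same structure as any other IFD.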
This is basically how metadata is arranged in a JPEG/Exif file. There are a few caveats to keep in mind (remember to convert the byte-order as needed, offsets are from the start of TIFF header, jump to data sections to read long fields, ...) but the format is quite easy to read. Following is the color-coded HEX view of a JPEG/Exif file. The blue block represents the SOI, orange is the TIFF header, green is the IFD size and offset bytes, light purple blocks are IFD fields and dark purple blocks are field data.
Here is a PHP script I wrote to modify Exif data: it overwrites the embedded thumbnail with filler bytes of the same length.
<?php
// Blank out the embedded Exif thumbnail of a JPEG by overwriting its bytes in place.
// The file may be given as ?file=... or ?filename=...; basename() strips any directory part.
$filename = basename($_REQUEST['file'] ?? $_REQUEST['filename'] ?? "torby.jpg");
$full_image_string = file_get_contents($filename);
$thumb_image = exif_thumbnail($filename, $width, $height, $type);
if ($thumb_image !== false) {
    $thumblen = strlen($thumb_image);
    // Report how many times the thumbnail bytes occur in the file.
    echo substr_count($full_image_string, $thumb_image);
    // Pad the filler to the thumbnail's length so every offset in the file stays valid.
    $filler = str_pad("%%%THUMB%%%", $thumblen);
    $full_image_string = str_replace($thumb_image, $filler, $full_image_string);
    file_put_contents($filename, $full_image_string);
    exit;
} else {
    // no thumbnail available, handle the error here
    echo 'No thumbnail available';
}
?>