Getting Image size of JPEG from its binary

2022-12-24 12:59

I have a lot of JPEG files with varying image sizes. For instance, here is the beginning of the hexdump output for an image of size 256*384 (pixels):

0000000: ffd8 ffe0 0010 4a46 4946 0001 0101 0048  ......JFIF.....H
0000010: 0048 0000 ffdb 0043 0003 0202 0302 0203  .H.....C........
0000020: 0303 0304 0303 0405 0805 0504 0405 0a07  ................
0000030: 0706 080c 0a0c 0c0b 0a0b 0b0d 0e12 100d  ................

I guess the size information must be within these lines, but I am unable to see which bytes give the sizes correctly. Can anyone help me find the fields that contain the size information?

According to the Syntax and structure section of the JPEG page on Wikipedia, the width and height of the image don't seem to be stored in the image itself, or at least not in a way that's easy to find.
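To make this concrete for the dump in the question, here is a minimal Python sketch that walks the 64 bytes shown there segment by segment. The names `HEAD`, `MARKER_NAMES` and `walk_segments` are made up for this illustration, and it assumes every marker after SOI carries a length word, which holds for the bytes shown.

```python
# The 64 bytes from the question's hexdump.
HEAD = bytes.fromhex(
    "ffd8ffe000104a464946000101010048"
    "00480000ffdb00430003020203020203"
    "03030304030304050805050404050a07"
    "0706080c0a0c0c0b0a0b0b0d0e12100d"
)

MARKER_NAMES = {0xE0: "APP0", 0xDB: "DQT"}  # only the names needed here


def walk_segments(data: bytes):
    """Return ([(offset, marker, length), ...], offset of the next marker)."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    segments = []
    pos = 2  # skip SOI, which has no length word
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        # The length word is big-endian and counts itself but not the marker.
        length = int.from_bytes(data[pos + 2:pos + 4], "big")
        segments.append((pos, marker, length))
        pos += 2 + length
    return segments, pos


segments, next_marker = walk_segments(HEAD)
for offset, marker, length in segments:
    print(f"offset {offset}: {MARKER_NAMES.get(marker, hex(marker))}, length {length}")
print(f"next marker expected at offset {next_marker}, dump holds {len(HEAD)} bytes")
```

The walk finds an APP0 (JFIF) segment at offset 2 and a DQT segment at offset 20 that runs past the end of the dump, so the frame header with the dimensions is not among the bytes shown.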

Still, quoting from JPEG image compression FAQ, part 1/2 :

Subject: [22] How can my program extract image dimensions from a JPEG file?

The header of a JPEG file consists of a series of blocks, called "markers". The image height and width are stored in a marker of type SOFn (Start Of Frame, type N).
To find the SOFn you must skip over the preceding markers; you don't have to know what's in the other types of markers, just use their length words to skip over them.
The minimum logic needed is perhaps a page of C code.
(Some people have recommended just searching for the byte pair representing SOFn, without paying attention to the marker block structure. This is unsafe because a prior marker might contain the SOFn pattern, either by chance or because it contains a JPEG-compressed thumbnail image. If you don't follow the marker structure you will retrieve the thumbnail's size instead of the main image size.)
A profusely commented example in C can be found in rdjpgcom.c in the IJG distribution (see part 2, item 15).
Perl code can be found in wwwis, from http://www.tardis.ed.ac.uk/~ark/wwwis/.

(Ergh, that link seems broken...)

Here's a portion of C code that could help you, though : Decoding the width and height of a JPEG (JFIF) file
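The approach the FAQ describes can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the code from rdjpgcom.c: the names `jpeg_dimensions`, `_segment` and `sample` are invented here, and the sample bytes are a synthetic header (not a decodable image) with a decoy SOF pattern inside an earlier segment.

```python
import struct

# SOFn markers: 0xC0-0xCF except DHT (0xC4), JPG (0xC8) and DAC (0xCC).
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}
# Markers without a length word: TEM, RST0-RST7, SOI, EOI.
STANDALONE = {0x01} | set(range(0xD0, 0xDA))


def jpeg_dimensions(data: bytes):
    """Return (width, height) of the main image, or None if no SOFn is found."""
    if data[:2] != b"\xff\xd8":
        return None  # no SOI marker
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return None  # not at a marker: the structure is broken
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte in front of a marker
            pos += 1
            continue
        if marker in STANDALONE:
            pos += 2
            continue
        (length,) = struct.unpack_from(">H", data, pos + 2)
        if marker in SOF_MARKERS:
            if pos + 9 > len(data):
                return None  # truncated frame header
            # Layout: marker(2) length(2) precision(1) height(2) width(2)
            height, width = struct.unpack_from(">HH", data, pos + 5)
            return width, height
        if marker == 0xDA:
            return None  # start of scan reached without seeing a frame header
        pos += 2 + length  # skip this segment using its length word
    return None


def _segment(marker: int, payload: bytes) -> bytes:
    """Build one marker segment (hypothetical helper for the demo)."""
    return bytes([0xFF, marker]) + struct.pack(">H", len(payload) + 2) + payload


# APP1 payload containing a decoy FF C0 pattern, then the real SOF0 (384 rows, 256 columns).
sample = (
    b"\xff\xd8"
    + _segment(0xE1, b"thumb\xff\xc0\x00\x11\x08\x00\x10\x00\x10")
    + _segment(0xC0, struct.pack(">BHHB", 8, 384, 256, 1) + b"\x01\x11\x00")
)

print(jpeg_dimensions(sample))      # (256, 384)
print(sample.find(b"\xff\xc0"))     # 11: a naive byte search hits the decoy first
```

Following the length words skips the decoy inside the APP1 segment, which is exactly the case the FAQ warns about.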

This function will read JPEG properties

function jpegProps(data) {          // data is an array of bytes
    var off = 0;
    while(off<data.length) {
        while(data[off]==0xff) off++;
        var mrkr = data[off];  off++;
        
        if(mrkr==0xd8) continue;    // SOI
        if(mrkr==0xd9) break;       // EOI
        if(0xd0<=mrkr && mrkr<=0xd7) continue;
        if(mrkr==0x01) continue;    // TEM
        
        var len = (data[off]<<8) | data[off+1];  off+=2;  
        
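        // SOF0..SOF15, excluding DHT (0xc4), JPG (0xc8) and DAC (0xcc)
        if(0xc0<=mrkr && mrkr<=0xcf && mrkr!=0xc4 && mrkr!=0xc8 && mrkr!=0xcc) return {
            bpc : data[off],     // precision (bits per channel)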
            h   : (data[off+1]<<8) | data[off+2],
            w   : (data[off+3]<<8) | data[off+4],
            cps : data[off+5]    // number of color components
        }
        off+=len-2;
    }
}

I have converted the C++ code from the top answer into a Python script.

"""
Source: https://stackoverflow.com/questions/2517854/getting-image-size-of-jpeg-from-its-binary#:~:text=The%20header%20of%20a%20JPEG,Of%20Frame%2C%20type%20N).
"""
def get_jpeg_size(data):
   """
   Gets the JPEG size from the array of data passed to the function, file reference: http://www.obrador.com/essentialjpeg/headerinfo.htm
   """
   data_size=len(data)
   #Check for valid JPEG image
   i=0   # Keeps track of the position within the file
   if(data[i] == 0xFF and data[i+1] == 0xD8 and data[i+2] == 0xFF and data[i+3] == 0xE0): 
   # Check for valid JPEG header (null terminated JFIF)
      i += 4
      if(data[i+2] == ord('J') and data[i+3] == ord('F') and data[i+4] == ord('I') and data[i+5] == ord('F') and data[i+6] == 0x00):
         #Retrieve the block length of the first block since the first block will not contain the size of file
         block_length = data[i] * 256 + data[i+1]
         while (i<data_size):
            i+=block_length               #Increase the file index to get to the next block
            if(i >= data_size): return False;   #Check to protect against segmentation faults
            if(data[i] != 0xFF): return False;   #Check that we are truly at the start of another block
            if(data[i+1] == 0xC0):          #0xFFC0 is the "Start of frame" marker which contains the image size
               #The structure of the 0xFFC0 block is quite simple [0xFFC0][ushort length][uchar precision][ushort height][ushort width]
               height = data[i+5]*256 + data[i+6];
               width = data[i+7]*256 + data[i+8];
               return height, width
            else:
               i+=2;                              #Skip the block marker
               block_length = data[i] * 256 + data[i+1]   #Go to the next block
         return False                   #If this point is reached then no size was found
      else:
         return False                  #Not a valid JFIF string
   else:
      return False                     #Not a valid SOI header




with open('path/to/file.jpg','rb') as handle:
   data = handle.read()

size = get_jpeg_size(data)   # (height, width), or False if no size was found
print(size)

This is how I implemented this using JS. The marker you are looking for is the SOFn marker, and the pseudocode would basically be:

start from the first byte
the beginning of a segment will always be FF followed by another byte indicating the marker type (those 2 bytes are called the marker)
if that other byte is 01 or D0 through D9, there is no data in that segment, so proceed to the next segment
if that marker is C0 or C2 (or any other Cn except C4, C8 and CC, more detail in the comments of the code), that's the SOFn marker you're looking for
- the bytes following the marker will be L (2 bytes), P (1 byte), Height (2 bytes), Width (2 bytes) respectively
otherwise, the next two bytes will be the length property (length of the entire segment excluding the marker, 2 bytes), use that to skip to the next segment
repeat until you find the SOFn marker

function getJpgSize(hexArr) {
  let i = 0;
  let marker = '';

  while (i < hexArr.length) {
    //ff always starts a marker,
    //something's really wrong if the first byte isn't ff
    if (hexArr[i] !== 'ff') {
      throw new Error('Expected marker prefix ff at index ' + i);
    }

    //get the second byte of the marker, which indicates the marker type
    marker = hexArr[++i];

    //these are segments that don't have any data stored in them, thus only 2 bytes
    //01 and D0 through D9
    if (marker === '01' || (!isNaN(parseInt(marker[1])) && marker[0] === 'd')) {
      i++;
      continue;
    }

    /*
    sofn marker: https://www.w3.org/Graphics/JPEG/itu-t81.pdf pg 36
      INFORMATION TECHNOLOGY –
      DIGITAL COMPRESSION AND CODING
      OF CONTINUOUS-TONE STILL IMAGES –
      REQUIREMENTS AND GUIDELINES

    basically, sofn (start of frame, type n) segment contains information
    about the characteristics of the jpg

    the marker is followed by:
      - Lf [frame header length], two bytes
      - P [sample precision], one byte
      - Y [number of lines in the src img], two bytes, which is essentially the height
      - X [number of samples per line], two bytes, which is essentially the width 
      ... [other parameters]

    sofn marker codes: https://www.digicamsoft.com/itu/itu-t81-36.html
    apparently there are other SOFn markers, but these two are the most common ones
    */
    if (marker === 'c0' || marker === 'c2') {
      break;
    }
    //2 bytes specifying length of the segment (length excludes marker)
    //jumps to the next seg
    i += parseInt(hexArr.slice(i + 1, i + 3).join(''), 16) + 1;
  }
  const size = {
    height: parseInt(hexArr.slice(i + 4, i + 6).join(''), 16),
    width: parseInt(hexArr.slice(i + 6, i + 8).join(''), 16),
  };
  return size;
}

If you are on a Linux system and have PHP at hand, variations on this PHP script may produce what you are looking for:

#! /usr/bin/php -q
<?php

if (file_exists($argv[1]) ) {

    $targetfile = $argv[1];

    // get info on the file given on the command line:
    $safefile       = escapeshellarg($targetfile);   // quote the path as a single shell argument
    $getinfo        = `/usr/bin/identify $safefile`;
    $imginfo        = preg_split("/\s+/",$getinfo);
    $ftype          = strtolower($imginfo[1]);
    $fsize          = $imginfo[2];

    switch($fsize) {
        case 0:
            print "FAILED\n";
            break;
        default:
            print $targetfile.'|'.$ftype.'|'.$fsize."|\n";
    }
}

// eof

host> imageinfo 009140_DJI_0007.JPG

009140_DJI_0007.JPG|jpeg|4000x3000|

(Outputs filename, file type, file dimensions in pipe-delimited format)

From the man page:

For more information about the 'identify' command, point your browser to [...] http://www.imagemagick.org/script/identify.php.
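The same idea can be sketched in Python, assuming ImageMagick's `identify` is installed and on the PATH. The `-format "%w %h"` option asks it to print only the width and height. The function names `identify_size` and `parse_identify_size` are invented for this example.

```python
import subprocess


def parse_identify_size(output: str):
    """Parse 'WIDTH HEIGHT' as printed by identify -format '%w %h'."""
    width, height = output.split()[:2]
    return int(width), int(height)


def identify_size(path: str):
    """Ask ImageMagick's identify for (width, height) of the image at path."""
    # Passing the arguments as a list avoids going through a shell,
    # so the file name needs no quoting or escaping.
    result = subprocess.run(
        ["identify", "-format", "%w %h", path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if identify fails
    )
    return parse_identify_size(result.stdout)
```

Calling `identify_size("009140_DJI_0007.JPG")` on the file from the example above would be expected to give the same 4000x3000 dimensions as a tuple.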

Dart/Flutter port from a solution in this forum.

import 'dart:typed_data';

class JpegProps {
  final int precision;

  final int height;

  final int width;

  // Number of color components (`cps` in the JavaScript version above).
  final int compression;

  JpegProps._(this.precision, this.height, this.width, this.compression,);

  String toString() => 'JpegProps($precision,$height,$width,$compression)';

  static JpegProps readImage(Uint8List imageData) {
    // data is an array of bytes
    int offset = 0;
    while (offset < imageData.length) {
      while (imageData[offset] == 0xff) offset++;
      var mrkr = imageData[offset];
      offset++;

      if (mrkr == 0xd8) continue; // SOI
      if (mrkr == 0xd9) break; // EOI
      if (0xd0 <= mrkr && mrkr <= 0xd7) continue;
      if (mrkr == 0x01) continue; // TEM

      var length = (imageData[offset] << 8) | imageData[offset + 1];
      offset += 2;

      if (mrkr == 0xc0) {
        return JpegProps._(imageData[offset],
          (imageData[offset + 1] << 8) | imageData[offset + 2],
          (imageData[offset + 3] << 8) | imageData[offset + 4],
          imageData[offset + 5],
        );
      }
      offset += length - 2;
    }
    throw FormatException('No SOF0 marker found');
  }
}

An easy way to get the width and height from a .jpg picture: remove the EXIF and IPTC information from the file. Use the "Save as" function in a picture viewer (I used IrfanView or Paint Shop Pro). In the "Save as" dialog, get rid of EXIF, then save the file. Without EXIF, the jpg file always has the height at bytes 000000a3 and 000000a4, and the width at 000000a5 and 000000a6. Note that these offsets depend on what the saving program writes before the frame header, so they are not guaranteed for files from other encoders.

I use PHP:

function storrelse_jpg($billedfil)  // "billedfil" is Danish for picture file
{
    // Offsets for a jpg file without EXIF info!
    // Width is in bytes 165 to 166, height is in bytes 163 and 164.
    // jpg dimensions use 2 bytes each (in png the dimensions use 4 bytes).

    // e.g. $billedfil="../diashow/billeder/christiansdal_teltplads_1_x.jpg";

    $tekst=file_get_contents($billedfil,0,NULL,165,2); // read 2 bytes from offset 165 - width
    $tekst1=file_get_contents($billedfil,0,NULL,163,2);// read 2 bytes from offset 163 - height
    $n=strlen($tekst); // length of the string

    echo "Size of picture: ".$billedfil. "<br>"; // headline

    $bredde=0; // width
    $langde=0; // height
    for ($i=0;$i<$n;$i++)
    {
        $by=bin2hex($tekst[$i]); // width byte from binary to hex
        $bz=hexdec($by);// then from hex to decimal

        $ly=bin2hex($tekst1[$i]); // the same for the height byte
        $lz=hexdec($ly);


        $bredde=$bredde+$bz*256**(1-$i);
        $langde=$langde+$lz*256**(1-$i);
    }
    // $x is an array: $x[0] is the width and $x[1] is the height
    $x[0]=$bredde; $x[1]=$langde;

    return $x;
}
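As a rough check on where 163 and 165 come from, here is a Python sketch of one header layout that would produce exactly these offsets: SOI, a 16-byte JFIF APP0 segment, and two separate 67-byte quantization-table (DQT) segments before the frame header. That layout is an assumption for illustration; the constant names and `size_at_fixed_offsets` are invented here. The function only trusts the fixed offsets when a baseline SOF0 marker really sits where that layout puts it.

```python
SOI_LEN = 2            # FF D8
APP0_LEN = 2 + 16      # FF E0 marker + 16-byte JFIF segment
DQT_LEN = 2 + 67       # FF DB marker + one 8-bit quantization table segment

SOF_OFFSET = SOI_LEN + APP0_LEN + 2 * DQT_LEN   # 158: where FF C0 would sit
HEIGHT_OFFSET = SOF_OFFSET + 2 + 2 + 1          # marker + length + precision = 163
WIDTH_OFFSET = HEIGHT_OFFSET + 2                # 165


def size_at_fixed_offsets(data: bytes):
    """Return (width, height) only if an SOF0 marker sits at offset 158."""
    if len(data) < WIDTH_OFFSET + 2:
        return None
    if data[SOF_OFFSET:SOF_OFFSET + 2] != b"\xff\xc0":
        return None  # different header layout: the fixed offsets do not apply
    height = int.from_bytes(data[HEIGHT_OFFSET:HEIGHT_OFFSET + 2], "big")
    width = int.from_bytes(data[WIDTH_OFFSET:WIDTH_OFFSET + 2], "big")
    return width, height
```

If the function returns `None`, the file has some other header layout and the segments have to be walked by their length words instead.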

A Python solution based on the "raw" C++ conversion - https://stackoverflow.com/a/62245035/11807679

import struct
from typing import Optional, Tuple


def get_jpeg_resolution(image_bytes: bytes,
                        size: Optional[int] = None) -> Optional[Tuple[int, int]]:
    """
    function for getting resolution from binary
    :param image_bytes: image binary
    :param size: image_bytes len if value is None it'll calc inside
    :return: (width, height) or None if not found
    """
    size = len(image_bytes) if size is None else size

    header_bytes = (0xff, 0xD8, 0xff, 0xe0)

    if not (size > 11
            and header_bytes == struct.unpack_from('>4B', image_bytes)):
        # Incorrect header or minimal length
        return None

    jfif_bytes = tuple(ord(s) for s in 'JFIF') + (0x0, )

    if not (jfif_bytes == struct.unpack_from('5B', image_bytes, 6)):
        # Not a valid JFIF string
        return None

    index = len(header_bytes)
    block_length, = struct.unpack_from(">H", image_bytes, index)

    index += block_length

    while index < size:
        if image_bytes[index] != 0xFF:
            # Not at the start of another block, so stop
            break
        if image_bytes[index + 1] == 0xC0:
            # 0xFFC0 is the "Start of frame" marker
            # which contains the image size
            # The structure of the 0xFFC0 block is
            # quite simple
            # [0xFFC0][ushort length][uchar precision]
            #   [ushort height][ushort width]

            height, width = struct.unpack_from(">HH", image_bytes, index + 5)
            return width, height
        else:
            index += 2
            # Skip the block marker
            # Go to the next block
            block_length, = struct.unpack(">H",
                                          image_bytes[slice(index, index + 2)])
        # Increase the file index to get to the next block
        index += block_length

    # If this point is reached then no size was found
    return None
