What is the smallest valid JPEG file size (in bytes)?
I'd like to screen some JPEGs for validity before I send them across the network for more extensive inspection. It is easy enough to check for a valid header and footer, but what is the smallest size (in bytes) a valid JPEG could be?
A 1x1 grey pixel in 125 bytes using arithmetic coding, still in the JPEG standard even if most decoders can't decode it:
ff d8 ; SOI
ff e0 ; APP0
00 10
4a 46 49 46 00 01 01 01 00 48 00 48 00 00
ff db ; DQT
00 43
00
03 02 02 02 02 02 03 02
02 02 03 03 03 03 04 06
04 04 04 04 04 08 06 06
05 06 09 08 0a 0a 09 08
09 09 0a 0c 0f 0c 0a 0b
0e 0b 09 09 0d 11 0d 0e
0f 10 10 11 10 0a 0c 12
13 12 10 13 0f 10 10 10
ff c9 ; SOF
00 0b
08 00 01 00 01 01 01 11 00
ff cc ; DAC
00 06 00 10 10 05
ff da ; SOS
00 08
01 01 00 00 3f 00 d2 cf 20
ff d9 ; EOI
I don't think the 134-byte example mentioned in another answer is standard-compliant, as it is missing an EOI marker. Most decoders will handle this, but the standard says the file should end with one.
That file can be generated with:
#!/usr/bin/env bash
printf '\xff\xd8' # SOI
printf '\xff\xe0' # APP0
printf '\x00\x10'
printf '\x4a\x46\x49\x46\x00\x01\x01\x01\x00\x48\x00\x48\x00\x00'
printf '\xff\xdb' # DQT
printf '\x00\x43'
printf '\x00'
printf '\x03\x02\x02\x02\x02\x02\x03\x02'
printf '\x02\x02\x03\x03\x03\x03\x04\x06'
printf '\x04\x04\x04\x04\x04\x08\x06\x06'
printf '\x05\x06\x09\x08\x0a\x0a\x09\x08'
printf '\x09\x09\x0a\x0c\x0f\x0c\x0a\x0b'
printf '\x0e\x0b\x09\x09\x0d\x11\x0d\x0e'
printf '\x0f\x10\x10\x11\x10\x0a\x0c\x12'
printf '\x13\x12\x10\x13\x0f\x10\x10\x10'
printf '\xff\xc9' # SOF
printf '\x00\x0b'
printf '\x08\x00\x01\x00\x01\x01\x01\x11\x00'
printf '\xff\xcc' # DAC
printf '\x00\x06\x00\x10\x10\x05'
printf '\xff\xda' # SOS
printf '\x00\x08'
printf '\x01\x01\x00\x00\x3f\x00\xd2\xcf\x20'
printf '\xff\xd9' # EOI
and opened fine with GNOME Image Viewer 3.38.0 and GIMP 2.10.18 on Ubuntu 20.10.
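To double-check the 125-byte figure, here is a minimal Python sketch that walks the marker segments of the file above and reports the size of each. The helper name `list_segments` is made up for this illustration, and the walker only handles the markers that appear in these tiny files; it is not a full JPEG parser.

```python
# The 125-byte file, transcribed from the hex dump above.
JPEG_125 = bytes.fromhex(
    "ff d8"                                                  # SOI
    "ff e0 00 10 4a 46 49 46 00 01 01 01 00 48 00 48 00 00"  # APP0 (JFIF)
    "ff db 00 43 00"                                         # DQT header
    "03 02 02 02 02 02 03 02 02 02 03 03 03 03 04 06"
    "04 04 04 04 04 08 06 06 05 06 09 08 0a 0a 09 08"
    "09 09 0a 0c 0f 0c 0a 0b 0e 0b 09 09 0d 11 0d 0e"
    "0f 10 10 11 10 0a 0c 12 13 12 10 13 0f 10 10 10"
    "ff c9 00 0b 08 00 01 00 01 01 01 11 00"                 # SOF9
    "ff cc 00 06 00 10 10 05"                                # DAC
    "ff da 00 08 01 01 00 00 3f 00 d2 cf 20"                 # SOS + scan data
    "ff d9"                                                  # EOI
)


def list_segments(data: bytes):
    """Return (marker_byte, size_in_bytes) pairs, one per segment (sketch only)."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI")
    segments = [(0xD8, 2)]
    pos = 2
    while pos < len(data):
        if data[pos] != 0xFF or pos + 1 >= len(data):
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xD9:  # EOI carries no length field
            segments.append((marker, 2))
            break
        # The 2-byte big-endian length counts itself but not the marker.
        length = int.from_bytes(data[pos + 2:pos + 4], "big")
        end = pos + 2 + length
        if marker == 0xDA:
            # Entropy-coded data follows the SOS header; skip ahead to the
            # next real marker (FF followed by neither 00 nor RST0..RST7).
            while end < len(data):
                nxt = data[end + 1] if end + 1 < len(data) else 0
                if data[end] == 0xFF and nxt != 0x00 and not 0xD0 <= nxt <= 0xD7:
                    break
                end += 1
        segments.append((marker, end - pos))
        pos = end
    return segments


for marker, size in list_segments(JPEG_125):
    print(f"FF {marker:02X}: {size} bytes")
print("total:", sum(size for _, size in list_segments(JPEG_125)))
```

The per-segment sizes make it easy to see where the bytes go: the quantization table alone accounts for more than half of the file.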
Here's an upload on Imgur. Note that Imgur processes the file, making it larger, so what you download to check will not be byte-identical. Displayed at width=100, the image shows white on Chromium 87.
It occurs to me that you could make a progressive JPEG with only the DC coefficients, so that a single grey pixel could be encoded in 119 bytes. This reads just fine in the few programs I've tried it in (Photoshop, GNOME Image Viewer 3.38.0, GIMP 2.10.18, and others).
ff d8 ; SOI
ff db ; DQT
00 43
00
01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01
ff c2 ; SOF
00 0b
08 00 01 00 01 01 01 11 00
ff c4 ; DHT
00 14
00
01 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
03
ff da ; SOS
00 08
01 01 00 00 00 01 3F
ff d9 ; EOI
The main space saving comes from having only one Huffman table. Although this is slightly smaller than the 125-byte arithmetic-coded file given in another answer, the arithmetic-coded file without the JFIF header would be smaller still (107 bytes), so that should still be considered the smallest known.
The above file can be generated with:
#!/usr/bin/env bash
printf '\xff\xd8' # SOI
printf '\xff\xdb' # DQT
printf '\x00\x43'
printf '\x00'
printf '\x01\x01\x01\x01\x01\x01\x01\x01'
printf '\x01\x01\x01\x01\x01\x01\x01\x01'
printf '\x01\x01\x01\x01\x01\x01\x01\x01'
printf '\x01\x01\x01\x01\x01\x01\x01\x01'
printf '\x01\x01\x01\x01\x01\x01\x01\x01'
printf '\x01\x01\x01\x01\x01\x01\x01\x01'
printf '\x01\x01\x01\x01\x01\x01\x01\x01'
printf '\x01\x01\x01\x01\x01\x01\x01\x01'
printf '\xff\xc2' # SOF
printf '\x00\x0b'
printf '\x08\x00\x01\x00\x01\x01\x01\x11\x00'
printf '\xff\xc4' # DHT
printf '\x00\x14'
printf '\x00'
printf '\x01\x00\x00\x00\x00\x00\x00\x00'
printf '\x00\x00\x00\x00\x00\x00\x00\x00'
printf '\x03'
printf '\xff\xda' # SOS
printf '\x00\x08'
printf '\x01\x01\x00\x00\x00\x01\x3F'
printf '\xff\xd9' # EOI
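The size comparison in this answer can be tallied by hand. The following Python sketch adds up the segment sizes read off the two hex dumps (marker bytes plus length field plus payload, with the scan data counted together with SOS); the dictionary names are just labels chosen for this illustration.

```python
# Segment sizes in bytes, read off the hex dumps: 2 marker bytes plus the
# segment's declared length; "SOS+scan" also includes the entropy-coded bytes.
progressive = {"SOI": 2, "DQT": 69, "SOF2": 13, "DHT": 22, "SOS+scan": 11, "EOI": 2}
arithmetic = {"SOI": 2, "APP0": 18, "DQT": 69, "SOF9": 13, "DAC": 8, "SOS+scan": 13, "EOI": 2}

progressive_total = sum(progressive.values())
arithmetic_total = sum(arithmetic.values())
# Dropping the optional JFIF APP0 segment from the arithmetic-coded file.
arithmetic_no_jfif = arithmetic_total - arithmetic["APP0"]

print("progressive, one Huffman table:", progressive_total)
print("arithmetic with JFIF header:   ", arithmetic_total)
print("arithmetic without JFIF header:", arithmetic_no_jfif)
```

This only reproduces the byte counts quoted in the answers; it says nothing about which decoders accept the resulting files.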
Try the following (134 bytes):
FF D8 FF E0 00 10 4A 46 49 46 00 01 01 01 00 48 00 48 00 00
FF DB 00 43 00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
FF FF FF FF FF FF FF FF FF FF C2 00 0B 08 00 01 00 01 01 01
11 00 FF C4 00 14 10 01 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 FF DA 00 08 01 01 00 01 3F 10
Source: Worlds Smallest, Valid JPEG? by Jesse_hz
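As another answer points out, this file has no EOI marker. A small Python sketch, rebuilding the bytes from the dump above, makes that easy to confirm; the variable names are illustrative only.

```python
# The 134-byte file, rebuilt segment by segment from the hex dump above.
JPEG_134 = bytes.fromhex(" ".join([
    "FF D8",                                                  # SOI
    "FF E0 00 10 4A 46 49 46 00 01 01 01 00 48 00 48 00 00",  # APP0 (JFIF)
    "FF DB 00 43 00" + " FF" * 64,                            # DQT, every entry 0xFF
    "FF C2 00 0B 08 00 01 00 01 01 01 11 00",                 # SOF2 (progressive)
    "FF C4 00 14 10 01" + " 00" * 16,                         # DHT
    "FF DA 00 08 01 01 00 01 3F 10",                          # SOS, no scan data
]))

EOI = b"\xff\xd9"
has_eoi = JPEG_134.endswith(EOI)

print("size:", len(JPEG_134))
print("ends with EOI:", has_eoi)
print("size with EOI appended:", len(JPEG_134 + EOI))
```

A screening check that insists on a trailing FF D9 would therefore reject this file even though many decoders open it.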
Found "the tiniest GIF ever" with only 26 bytes.
47 49 46 38 39 61 01 00 01 00
00 ff 00 2c 00 00 00 00 01 00
01 00 00 02 00 3b
Python literal:
b'GIF89a\x01\x00\x01\x00\x00\xff\x00,\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x00;'
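A quick sanity check that the hex dump and the Python literal above describe the same 26 bytes, and that the header declares a 1x1 canvas. This is only a sketch using the standard `struct` module; the variable names are arbitrary.

```python
import struct

GIF_HEX = (
    "47 49 46 38 39 61 01 00 01 00"
    "00 ff 00 2c 00 00 00 00 01 00"
    "01 00 00 02 00 3b"
)
GIF_LITERAL = b'GIF89a\x01\x00\x01\x00\x00\xff\x00,\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x00;'

gif = bytes.fromhex(GIF_HEX)
# Logical screen width and height are little-endian 16-bit values at offset 6.
width, height = struct.unpack("<HH", gif[6:10])

print(gif == GIF_LITERAL, len(gif), width, height)
```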
While I realize this is far from the smallest valid JPEG and has little or nothing to do with your actual question, I felt I should share this, as I'd been looking for a very small JPEG that actually looked like something, to do some testing with, when I found your question. I'm sharing it here because it's valid, it's small, and it makes me ROFL.
Here is a 384-byte JPEG image that I made in Photoshop. It is the letters ROFL, hand-drawn by me and then saved with maximum compression settings while still being sort of readable.
Hex sequences:
use URI::Escape; # provides uri_escape, used below
my @image_hex = qw{
FF D8 FF E0 00 10 4A 46 49 46 00 01 02 00 00 64
00 64 00 00 FF EC 00 11 44 75 63 6B 79 00 01 00
04 00 00 00 00 00 00 FF EE 00 0E 41 64 6F 62 65
00 64 C0 00 00 00 01 FF DB 00 84 00 1B 1A 1A 29
1D 29 41 26 26 41 42 2F 2F 2F 42 47 3F 3E 3E 3F
47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47
47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47
47 47 47 47 47 47 47 47 47 47 47 47 01 1D 29 29
34 26 34 3F 28 28 3F 47 3F 35 3F 47 47 47 47 47
47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47
47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47
47 47 47 47 47 47 47 47 47 47 47 47 47 FF C0 00
11 08 00 08 00 19 03 01 22 00 02 11 01 03 11 01
FF C4 00 61 00 01 01 01 01 00 00 00 00 00 00 00
00 00 00 00 00 00 04 02 05 01 01 01 01 00 00 00
00 00 00 00 00 00 00 00 00 00 00 02 04 10 00 02
02 02 02 03 01 00 00 00 00 00 00 00 00 00 01 02
11 03 00 41 21 12 F0 13 04 31 11 00 01 04 03 00
00 00 00 00 00 00 00 00 00 00 00 00 21 31 61 71
B1 12 22 FF DA 00 0C 03 01 00 02 11 03 11 00 3F
00 A1 7E 6B AD 4E B6 4B 30 EA E0 19 82 39 91 3A
6E 63 5F 99 8A 68 B6 E3 EA 70 08 A8 00 55 98 EE
48 22 37 1C 63 19 AF A5 68 B8 05 24 9A 7E 99 F5
B3 22 20 55 EA 27 CD 8C EB 4E 31 91 9D 41 FF D9
}; # A very tiny JPEG: an image of the letters "ROFL", hand-drawn in Photoshop and saved at the lowest quality setting at which the letters can still be made out :)
my $image_data = pack('H2' x scalar(@image_hex), @image_hex);
my $url_escaped_image = uri_escape( $image_data );
URL escaped binary image data (can paste right into a URL)
%FF%D8%FF%E0%00%10JFIF%00%01%02%00%00d%00d%00%00%FF%EC%00%11Ducky%00%01%00%04%00%00%00%00%00%00%FF%EE%00%0EAdobe%00d%C0%00%00%00%01%FF%DB%00%84%00%1B%1A%1A)%1D)A%26%26AB%2F%2F%2FBG%3F%3E%3E%3FGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG%01%1D))4%264%3F((%3FG%3F5%3FGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG%FF%C0%00%11%08%00%08%00%19%03%01%22%00%02%11%01%03%11%01%FF%C4%00a%00%01%01%01%01%00%00%00%00%00%00%00%00%00%00%00%00%00%04%02%05%01%01%01%01%00%00%00%00%00%00%00%00%00%00%00%00%00%00%02%04%10%00%02%02%02%02%03%01%00%00%00%00%00%00%00%00%00%01%02%11%03%00A!%12%F0%13%041%11%00%01%04%03%00%00%00%00%00%00%00%00%00%00%00%00%00!1aq%B1%12%22%FF%DA%00%0C%03%01%00%02%11%03%11%00%3F%00%A1~k%ADN%B6K0%EA%E0%19%829%91%3Anc_%99%8Ah%B6%E3%EAp%08%A8%00U%98%EEH%227%1Cc%19%AF%A5h%B8%05%24%9A~%99%F5%B3%22%20U%EA'%CD%8C%EBN1%91%9DA%FF%D9
Here's the C++ routine I wrote to do this:
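#include <cstddef>  // std::size_t
#include <cstring>  // std::memcmp

// Cheap screen: SOI marker at offset 0 and a "JFIF" or "Exif" tag at offset 6.
bool is_jpeg(const unsigned char* img_data, std::size_t size)
{
    return img_data &&
           (size >= 10) &&
           (img_data[0] == 0xFF) &&
           (img_data[1] == 0xD8) &&
           ((std::memcmp(img_data + 6, "JFIF", 4) == 0) ||
            (std::memcmp(img_data + 6, "Exif", 4) == 0));
}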
img_data points to a buffer containing the JPEG data.
I'm sure you need more bytes to have a JPEG that will decode to a useful image, but it's a fair bet that if the first 10 bytes pass this test, the buffer probably contains a JPEG.
EDIT: You can, of course, replace the 10 above with a higher value once you decide on one. 134, as suggested in another answer, for example.
It is not a requirement that JPEGs contain either a JFIF or Exif marker. But they must start with FF D8, and they must have a marker following that, so you can check for FF D8 FF.
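A minimal Python sketch of that looser check, combined with the footer check and a size floor from the question. The function name `looks_like_jpeg` and its parameters are hypothetical, and the default `min_size` of 4 is only an assumption (the length of FF D8 followed directly by FF D9); raise it to whatever floor you settle on from the answers here.

```python
SOI = b"\xff\xd8"  # start of image
EOI = b"\xff\xd9"  # end of image


def looks_like_jpeg(data: bytes, min_size: int = 4, require_eoi: bool = True) -> bool:
    """Cheap pre-screen only; passing it does not mean the file will decode."""
    if len(data) < min_size:
        return False
    # SOI must be followed immediately by another marker, so expect FF D8 FF.
    if not data.startswith(SOI + b"\xff"):
        return False
    # The standard expects a trailing EOI, though some files in the wild omit it.
    if require_eoi and not data.endswith(EOI):
        return False
    return True


sample = SOI + b"\xff\xe0\x00\x10JFIF" + bytes(12) + EOI
print(looks_like_jpeg(sample))                          # well-formed framing
print(looks_like_jpeg(sample[:-2]))                     # EOI stripped
print(looks_like_jpeg(sample[:-2], require_eoi=False))  # tolerate missing EOI
print(looks_like_jpeg(b"GIF89a" + bytes(20)))           # not a JPEG
```

Making the EOI check optional is a design choice: the 134-byte example in this thread has no EOI, so a strict screen would drop it even though decoders reportedly accept it.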