This is a question, out of curiosity, about some patterns I see in JPG files when I look at them in a hex editor. I guess it is really a question about the JPEG file format: why is this part not “random noise” like the rest, when it is supposed to be (Huffman coding and so on)?
This 136-bit (17-byte) pattern shows up in some JPG files produced by Adobe Photoshop (I do not know whether Photoshop is the only application that produces it):
F7 5E EB DE FD D7 BA F7 BF 75 EE BD EF DD 7B AF 7B
It appears in several places in a single file. Sometimes there is just one iteration; other times it is repeated 8 or 12 times, making up blocks of 1088 or 1632 bits.
Or to be precise, it is actually a 68-bit pattern, repeated 2 or more times:
F7 5E EB DE FD D7 BA F7 B
11110111010111101110101111011110111111011101011110111010111101111011
AFAIK, from reading a bit about the JPG file structure and verifying this in hex, each segment of a JPG file begins with an FF xx marker. There are no such FF xx markers either immediately before or immediately after those 68-bit patterns.
In Breakpoint Hex Workshop, these patterns are very easy to spot in the “Data Visualizer” window: while the rest of the Huffman bitstream looks like “noise”, there are suddenly blocks showing clear patterns.
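For anyone who wants to repeat the marker check, here is a minimal Python sketch of how I would look for FF xx markers in a byte range. The function name `find_markers` is my own, not from any library. It assumes the usual JPEG conventions that FF 00 inside the entropy-coded data is a stuffed byte rather than a marker, and that FF FF is fill.

```python
def find_markers(data, start=0, end=None):
    """Return (offset, marker_byte) for every FF xx marker in data[start:end].

    FF 00 (byte stuffing) and FF FF (fill bytes) are not counted as markers.
    """
    if end is None:
        end = len(data)
    end = min(end, len(data))
    found = []
    i = start
    while i < end - 1:
        if data[i] == 0xFF and data[i + 1] not in (0x00, 0xFF):
            found.append((i, data[i + 1]))
            i += 2  # skip past the marker byte
        else:
            i += 1
    return found


# The 17-byte pattern itself contains no FF byte, so no marker can hide in it.
pattern = bytes.fromhex("F75EEBDEFDD7BAF7BF75EEBDEFDD7BAF7B")
print(find_markers(pattern * 8))  # prints: []
```

Run over a window of a few hundred bytes around one of the pattern blocks, an empty result would confirm that no marker sits next to it.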
Also, I am not sure how relevant this is, but earlier I noticed the same type of pattern in CR2 files (Canon RAW files).
There the pattern was a much simpler 40-bit one, though:
73 9C E7 39 CE
0111 0011 1001 1100 1110 0111 0011 1001 1100 1110
If I regroup the bits, it becomes this:
01110 01110 01110 01110 01110 01110 01110 01110
As you can see, this is actually a repeating 5-bit pattern, and it was repeated several hundred times at each place where it appeared in the CR2 files.
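Both patterns above can be checked mechanically rather than by eye. The following is a small Python sketch that converts the hex bytes to a bit string and brute-forces the smallest period. The helper names `bits_of` and `smallest_period` are mine. If I have done this right, it reports 5 bits for the CR2 pattern, and for the 17-byte JPG pattern it reports 34 bits, which would mean the 68-bit unit is itself one 34-bit sequence written twice.

```python
def bits_of(hex_string):
    """Return the bits of a hex string (spaces ignored) as a '0'/'1' string."""
    digits = hex_string.replace(" ", "")
    return "".join(format(int(d, 16), "04b") for d in digits)


def smallest_period(bits):
    """Smallest p such that bits[i] == bits[i + p] for every valid i."""
    for p in range(1, len(bits) + 1):
        if all(bits[i] == bits[i + p] for i in range(len(bits) - p)):
            return p
    return len(bits)


JPG_PATTERN = "F7 5E EB DE FD D7 BA F7 BF 75 EE BD EF DD 7B AF 7B"
CR2_PATTERN = "73 9C E7 39 CE"

for name, pattern in (("JPG", JPG_PATTERN), ("CR2", CR2_PATTERN)):
    bits = bits_of(pattern)
    print(name, len(bits), smallest_period(bits))
# prints:
# JPG 136 34
# CR2 40 5
```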
CR2 is also a compressed format, but a lossless one. Then again, the Huffman coding in JPG is also a kind of lossless “compression”, if I have understood it correctly.
I find it very strange that compressed streams contain these patterns of (what seem to me to be) “wasted” bits.
I have uploaded one of the JPG files here: http://i.imgur.com/t0mi7vo.jpg (it is just a simple screenshot of some files in a folder).
The Huffman-coded bitstream runs from offset 0x0000027C to the end of the file, and you can see one instance of the repeating pattern at, for example, offset 0x0001604A.
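To list every place where the pattern occurs in a file, something like the following Python sketch should work. `find_runs` is my own helper name, and the file name in the commented-out usage is just an assumed local copy of the upload. It only finds byte-aligned copies of the exact 17-byte sequence, so a copy that starts at a different bit offset would be missed.

```python
PATTERN = bytes.fromhex("F75EEBDEFDD7BAF7BF75EEBDEFDD7BAF7B")


def find_runs(data, pattern=PATTERN):
    """Return (offset, repeats) for each back-to-back run of pattern in data."""
    runs = []
    pos = data.find(pattern)
    while pos != -1:
        repeats = 1
        # Count how many further copies follow immediately after this one.
        while data.startswith(pattern, pos + repeats * len(pattern)):
            repeats += 1
        runs.append((pos, repeats))
        pos = data.find(pattern, pos + repeats * len(pattern))
    return runs


# Hypothetical usage on a local copy of the uploaded file:
# with open("t0mi7vo.jpg", "rb") as f:
#     for offset, repeats in find_runs(f.read()):
#         print(hex(offset), repeats)
```

A run with 8 or 12 repeats would correspond to the 1088-bit and 1632-bit blocks mentioned above.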