JPEG file format
In this tutorial we will be going over the JPEG file format.
I am planning on making this into a series:
- Understanding the JPEG file format header
- Decoding a JPEG file format header in C, Go and Rust
- Writing a JPEG file in C, Go and Rust
This is going to be challenging, so stick with me. I am still getting the hang of it, but I have a pretty good understanding of the header. So, without further ado, let's get into it.
The JPEG file format is quite a hefty format to consider working with. Not only is it hefty, but the layout of the file differs from image to image.
But that is not the worry in this tutorial. The worry is understanding the header format of the JPEG file.
So, let's see this!
The first two bytes of the header just say, "Hey, the image is starting!". This is known as the Start of Image flag.
The first two bytes are 0xFF 0xD8 respectively.
Let's add this to the stream:
0xFF 0xD8
Note: Throughout this tutorial, I will be referencing JPEG images that support the format I am going over. I will not be going over any odd formats. They're a pain.
The next two bytes represent what is called the "marker id". This one is the APP0 marker. It tells us to be ready to read a header whose first 5 bytes say, "this is a JPEG file".
These two bytes are 0xFF 0xE0 respectively.
Note: JPEG files are built out of what I call "flags" (the official name is "markers"). Every flag starts with a 0xFF byte, followed by a byte that says which flag it is. So, anytime you see a 0xFF, you'll most likely be reading a flag. I will go over flags later on in this tutorial.
Let's add those two bytes to the stream:
0xFF 0xD8 0xFF 0xE0
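To make those first four bytes concrete, here is a minimal Python sketch that checks them. The function name `starts_like_jfif` and the sample bytes are made up for illustration, and it only covers the kind of file this tutorial deals with (a start flag followed directly by 0xFF 0xE0).

```python
SOI = b"\xFF\xD8"   # "the image is starting"
APP0 = b"\xFF\xE0"  # marker id announcing the JFIF header

def starts_like_jfif(data: bytes) -> bool:
    """Return True if data begins with the two flags covered so far."""
    return data[:2] == SOI and data[2:4] == APP0

stream = bytes([0xFF, 0xD8, 0xFF, 0xE0])
print(starts_like_jfif(stream))      # True
print(starts_like_jfif(b"\x89PNG"))  # False
```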
Next, we get the "header length". Now, this whole thing is the header of the file, but the "header" within the header (the APP0 segment) just carries basic information about the JPEG image.
The next two bytes tell us the length of that header as a 16-bit number, high byte first. The length counts the two length bytes themselves, but not the two marker id bytes before them. 5 of the bytes are always taken up by a built-in identifier (you'll see why later).
I am going to reference an image here in this tutorial that has a header of 16 bytes: 2 bytes for the length itself, 5 bytes assigned to reference "JFIF", and 9 bytes of information after that.
"JFIF" just tells whoever reads the file, "Hey, we're a JPEG image".
I'm getting ahead of myself. Let's take a look at what these two bytes will be if we have a header length of 16:
0x00 0x10
Let's add this:
0xFF 0xD8 0xFF 0xE0 0x00 0x10
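Here is a small sketch of how those two length bytes turn into the number 16. It assumes the standard JPEG rule that lengths are stored high byte first and count their own two bytes; the variable names are just for illustration.

```python
length_bytes = bytes([0x00, 0x10])

# Big-endian: the first byte is the high byte, the second is the low byte.
length = int.from_bytes(length_bytes, "big")
print(length)  # 16

# The same thing done by hand with a shift.
by_hand = (length_bytes[0] << 8) | length_bytes[1]
print(by_hand == length)  # True

# The length counts its own two bytes, so this many bytes follow it.
payload_size = length - 2
print(payload_size)  # 14
```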
Now, we have the first two bytes that tell us that the image is starting. We have the marker id that tells us the header is starting. Then we have two bytes that give us the length of the header.
Sweet! Onto the next step.
Remember I said there is a built-in of 5 bytes? Those five bytes go to "JFIF". But wait, that's just 4 characters, Moca!
YES! It is. The fifth byte is a zero byte (0x00) that ends the string "JFIF". So, 5 bytes go to "JFIF" plus its ending zero, and the rest are used to give information to the header.
What does "JFIF" have to do with anything?
Well, when I was studying the file format of a JPEG image, I ran into "JFIF" in almost every image. It stands for JPEG File Interchange Format, and all it does is identify the image as a JPEG stored in that layout. The segment that holds it is also known as the "JFIF app segment".
Anyway, the bytes are as follows:
0x4A 0x46 0x49 0x46 0x00 -> "JFIF" followed by a zero byte
Let's add this:
0xFF 0xD8 0xFF 0xE0 0x00 0x10 0x4A 0x46 0x49 0x46 0x00
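A quick way to convince yourself of the identifier bytes is to decode them. This sketch uses the ASCII codes for "JFIF" (0x4A 0x46 0x49 0x46) followed by the zero byte; the variable names are my own.

```python
identifier = bytes([0x4A, 0x46, 0x49, 0x46, 0x00])

# The first four bytes are plain ASCII; the fifth is the ending zero byte.
text = identifier[:4].decode("ascii")
print(text)                       # JFIF
print(identifier == b"JFIF\x00")  # True
print(len(identifier))            # 5
```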
Now, let's cover the remaining 9 bytes. Don't worry too much about what these values are. They are just information needed for the image: the JFIF version (2 bytes), the density units (1 byte), the X and Y density (2 bytes each), and the thumbnail width and height (1 byte each). Let's add it!
0xFF 0xD8 0xFF 0xE0 0x00 0x10 0x4A 0x46 0x49 0x46 0x00 0x01 0x01 0x00 0x00 0x01 0x00 0x01 0x00 0x00
Sweet! But wait, Moca. What comes after that?
The 16 bytes of the header are used up now, so the next two bytes are a new flag definition. Which is where the next part of this tutorial is headed! But, while we're here, let's add this flag:
Note: I am referencing a JPEG image as I am doing this tutorial.
0xFF 0xD8 0xFF 0xE0 0x00 0x10 0x4A 0x46 0x49 0x46 0x00 0x01 0x01 0x00 0x00 0x01 0x00 0x01 0x00 0x00 0xFF 0xDB
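If you are curious what the 9 information bytes after "JFIF" hold, here is a hedged sketch that unpacks them with Python's `struct` module. It assumes the usual JFIF field order (version major and minor, density units, X density, Y density, thumbnail width, thumbnail height); the variable names are mine, and the values are the ones from the image I am referencing.

```python
import struct

# The 9 bytes that follow "JFIF" and its zero byte in the stream above.
fields = bytes([0x01, 0x01, 0x00, 0x00, 0x01, 0x00, 0x01, 0x00, 0x00])

# ">" means big-endian with no padding, B is 1 byte, H is 2 bytes.
(major, minor, units,
 x_density, y_density,
 thumb_w, thumb_h) = struct.unpack(">BBBHHBB", fields)

print(f"version {major}.{minor:02d}")  # version 1.01
print(units, x_density, y_density)     # 0 1 1
print(thumb_w, thumb_h)                # 0 0
```

A thumbnail size of 0 by 0 means no thumbnail is stored in the header.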
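In a JPEG image, there are two special kinds of tables. These tables are called a Quantization Table and a Huffman Table.
These tables tell a decoder how the image data was compressed. A Quantization Table holds the values the image's frequency data was divided by, and a Huffman Table holds the codes used to pack the result.
Each table definition starts with 4 bytes: 2 bytes for the flag definition, 2 bytes for the length of the table.
But first, let's see which "flag" defines which "table":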
- 0xDB defines a Quantization Table
- 0xC4 defines a Huffman Table
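Alright, now that that's out of the way, let's add the two bytes for the length of this table:
0xFF 0xD8 0xFF 0xE0 0x00 0x10 0x4A 0x46 0x49 0x46 0x00 0x01 0x01 0x00 0x00 0x01 0x00 0x01 0x00 0x00 0xFF 0xDB 0x00 0x84
This table definition consists of 132 bytes. Wow. Quite a large table. As with the header, the length counts its own two bytes, so 130 bytes of table data follow. That size does not depend on how big or detailed the image is. A Quantization Table with 1-byte values takes 65 bytes (1 information byte plus 64 values), so 130 bytes is exactly two tables stored under one flag.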
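The arithmetic behind that 132 can be sketched in a few lines. This assumes the common case where every Quantization Table uses 1-byte values, so one table is 1 information byte plus 64 values; `TABLE_SIZE_8BIT` and the other names are made up for this example.

```python
segment_start = bytes([0xFF, 0xDB, 0x00, 0x84])

marker = segment_start[1]
length = int.from_bytes(segment_start[2:4], "big")
print(hex(marker), length)  # 0xdb 132

# Assumption: every table uses 1-byte values, so each table takes
# 1 information byte + 64 values = 65 bytes.
TABLE_SIZE_8BIT = 1 + 64

table_bytes = length - 2  # the length counts its own two bytes
count, leftover = divmod(table_bytes, TABLE_SIZE_8BIT)
print(count, leftover)  # 2 0
```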
The same layout goes for a Huffman table. I am only showing a Quantization Table here because that is the first table in the image I am referencing.
A Huffman table is defined using 0xFF 0xC4, then the two bytes after it define the table's length.
Normally, the first table you will run into inside a JPEG image is a Quantization Table. The format itself only requires a table to show up before the image data that uses it, so don't rely on that order.
Now, this is all for this tutorial, but before I go I am going to leave you with some information for the next tutorial when we dive a bit deeper:
- 0xFF 0xC0 defines the start of frame (baseline images)
- 0xFF 0xC1 defines the start of frame (extended sequential images)
- 0xFF 0xDA defines the start of scan (the compressed image data)
- 0xFF 0xD9 defines the end of the image
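As a preview of how the flags above fit together, here is a rough Python sketch that walks from flag to flag using the length bytes. The function `list_flags`, the `NAMES` table, and the sample bytes (the header from this tutorial plus a table filled with zero bytes as a placeholder) are all made up for illustration. It is a simplification: it stops at the start of scan and assumes every flag before that carries a length.

```python
NAMES = {
    0xD8: "start of image",
    0xE0: "JFIF header",
    0xDB: "quantization table",
    0xC4: "huffman table",
    0xC0: "start of frame",
    0xC1: "start of frame",
    0xDA: "start of scan",
    0xD9: "end of image",
}

def list_flags(data: bytes) -> list:
    """Walk the flags up to the start of scan; return (name, length) pairs."""
    if data[:2] != b"\xFF\xD8":
        raise ValueError("missing start-of-image flag")
    found = [(NAMES[0xD8], 0)]  # the start flag has no length bytes
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected 0xFF at offset {pos}")
        marker = data[pos + 1]
        length = int.from_bytes(data[pos + 2:pos + 4], "big")
        found.append((NAMES.get(marker, f"unknown 0x{marker:02X}"), length))
        if marker == 0xDA:
            break  # compressed data follows and needs different handling
        pos += 2 + length  # skip the 2 flag bytes plus the counted length
    return found

header = bytes([0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10,
                0x4A, 0x46, 0x49, 0x46, 0x00,
                0x01, 0x01, 0x00, 0x00, 0x01, 0x00, 0x01, 0x00, 0x00])
# Placeholder table: the real values are not shown in this tutorial.
table = bytes([0xFF, 0xDB, 0x00, 0x84]) + bytes(130)

result = list_flags(header + table)
for name, length in result:
    print(name, length)
```

Skipping by the stored length is what lets a reader jump over parts it does not understand yet.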
We will be putting the above information to use in the next tutorial when we dive a bit deeper into the JPEG file format and see how the tables work, and how the flags work within the file.
Until then, MocaCDeveloper logging out :)
I remember hearing somewhere that JPEG (de)compression involves performing a Fourier transform, so I'll be interested to see where this goes :)
Cool! (I've always been too scared to work with the jpeg format; I heard somewhere that it's even harder than png)
It is really stupid. According to the documentation over the JPEG file format, it's supposed to have the marker ID after 0xFF 0xD8, which every file should have if it's a JPEG since that's what starts the image array primarily.
But nope. Some files totally ignore the marker ID, as well as the JFIF which tells the computer, "Hey I'm a JPEG image".
It's probably the most confusing format to ever exist. I mean it has to be lol. But the format did come out in 1992, so I am sure there are multiple newer versions, which is why the format changes from image to image.
I just use whatever is needed lol. Especially with Rust, since there are more built-ins than there are in C, you can't really stay away from it. But it's also low level so I enjoy it. Being able to use built-ins without the worry of slow runtimes (let's act like Rust doesn't suck at compiling).
Agreed. Rust is extremely slow, sadly. But hey, it's a powerful language for being low-level. Can't complain.
Also, I found this documentation over PNG images. It has some pretty good explanations.
If you have access to a Linux terminal, type in:
xxd -g -1 img_name.png to see the raw data of the image