PNG File Format (part 1)
Welcome! I decided I'd do two series at the same time. In this series we're going to learn about the PNG file format, and how to decode and write images accordingly in C, Rust and Go!
Without further ado, let's get into it!
The beginning of the PNG file.
The beginning of a PNG file is an 8-byte signature.
The first 4 bytes are 0x89 0x50 0x4e 0x47: the non-ASCII byte 0x89 followed by the ASCII letters PNG.
The next 4 bytes are 0x0d 0x0a 0x1a 0x0a, which is 13 10 26 10 in decimal (CR, LF, Ctrl-Z, LF).
So, let's write out the full 8-byte signature:
0x89 0x50 0x4e 0x47 0x0d 0x0a 0x1a 0x0a
Perfect.
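Here is a minimal sketch in C of how a decoder might check those 8 bytes. The names PNG_SIGNATURE and png_has_signature are my own placeholders, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The 8-byte PNG signature listed above. */
static const unsigned char PNG_SIGNATURE[8] = {
    0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a
};

/* Hypothetical helper: returns 1 if buf (len bytes long) starts with
   the PNG signature, 0 otherwise. */
int png_has_signature(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len < sizeof PNG_SIGNATURE)
        return 0;
    return memcmp(buf, PNG_SIGNATURE, sizeof PNG_SIGNATURE) == 0;
}
```

You would typically read the first 8 bytes of the file into a buffer and bail out early if this returns 0.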
I want to quickly review the chunk layout of a PNG file.
"Chunks" have the following format:
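| length | chunk type | chunk data | CRC |
| 00 00 00 00 | 00 00 00 00 | ... | 00 00 00 00 |
| 4 bytes | 4 bytes | "length" bytes | 4 bytes |
or, when the chunk carries no data (the length is 0), it can be:
| length | chunk type | CRC |
| 00 00 00 00 | 00 00 00 00 | 00 00 00 00 |
| 4 bytes | 4 bytes | 4 bytes |
The length field counts only the bytes of the chunk data. It does not include the length field itself, the chunk type or the CRC.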
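As a rough sketch, reading the first two fields of a chunk in C could look like this. PNG stores multi-byte integers in big-endian order, so the length has to be assembled byte by byte. The struct and function names (png_chunk_header, png_read_chunk_header, png_chunk_total_size) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t length;  /* number of bytes in the chunk data field */
    char     type[5]; /* four ASCII letters plus a terminating NUL */
} png_chunk_header;

/* Assemble a 32-bit big-endian integer from 4 bytes. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: parses the length and chunk type at buf.
   Returns 0 on success, -1 if fewer than 8 bytes are available. */
int png_read_chunk_header(const unsigned char *buf, size_t len,
                          png_chunk_header *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 8)
        return -1;
    out->length = read_be32(buf);
    memcpy(out->type, buf + 4, 4);
    out->type[4] = '\0';
    return 0;
}

/* Bytes the whole chunk occupies: length + type + data + CRC. */
uint64_t png_chunk_total_size(const png_chunk_header *h)
{
    return 12u + (uint64_t)h->length;
}
```

A decoder can use the total size to skip from one chunk to the next without understanding what is inside it.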
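A CRC is an error-detecting code that is used to detect accidental changes to raw data. In a PNG chunk it is calculated over the chunk type and the chunk data, but not over the length field.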
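For the curious, here is a small bitwise sketch of the CRC-32 that PNG uses (the same CRC-32 as zlib and gzip). It is deliberately simple and slow; real decoders usually use a lookup table. The function name png_crc32 is made up for this example, and the input is expected to be the chunk type followed by the chunk data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: CRC-32 over len bytes of data.
   Uses the reflected polynomial 0xEDB88320, an initial value of all
   ones, and a final inversion. */
uint32_t png_crc32(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

Since IEND has no data, its CRC covers only the four type bytes, which is why every PNG file ends with the same CRC bytes ae 42 60 82.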
There are many different chunk types. Here are the critical chunks you are most likely to see (the other chunks are mentioned at the end of this tutorial):
- IHDR
- This is required in ALL .png files. It is the first chunk in every file, right after the 8-byte signature. It contains:
- width - 4 bytes
- height - 4 bytes
- Bit depth - 1 byte
- Color Type - 1 byte
- Compression Method - 1 byte
- Filter Method - 1 byte
- Interlace Method - 1 byte
- The IHDR data has a total of 13 bytes.
- The IHDR chunk type is the byte sequence
0x49 0x48 0x44 0x52
->73 72 68 82
- PLTE
- If a PLTE chunk exists, it occurs before the first IDAT.
- PLTE does not have to occur within the .png file. It is only required for indexed-color images (color type 3).
- Defined by
0x50 0x4C 0x54 0x45
->80 76 84 69
- The PLTE chunk can contain 1-256 entries, each being a 3-byte series in the format below.
- Each PLTE entry has the following format:
| red | green | blue |
| 00 | 00 | 00 |
| 1 byte | 1 byte | 1 byte |
- IDAT
- This contains the actual image data (which is always stored compressed).
- Defined by
0x49 0x44 0x41 0x54
->73 68 65 84
- There can be multiple IDAT chunks within a .png file.
- IEND
- Defined by
0x49 0x45 0x4E 0x44
->73 69 78 68
- This will be at the end of all .png files.
Great! Now that we have a basic understanding of the critical chunks within a .png file, let's see what the IHDR chunk looks like in a real file.
IHDR chunk
The IHDR chunk follows the format:
| length | chunk type | chunk data | CRC |
| 00 00 00 0d | 49 48 44 52 | ... | 00 00 00 00 |
| 4 bytes | 4 bytes | 13 bytes | 4 bytes |
So, let's take a look at it in the example .png file I am using!
The stream is as follows:
00 00 00 0d 49 48 44 52 00 00 00 c8 00 00 00 c8 08 06 00 00 00 ad 58 ae
Let's break this stream down:
Remember that the IHDR data consists of 13 bytes?
Well, the first 4 bytes of the stream (the length) are 00 00 00 0d, which is the value 13.
The chunk type declaration does not take up any of the 13 bytes, so the next 4 bytes of the stream just declare IHDR in hex values.
Then, we read the image's width and height from the first 8 bytes of the 13. Let's see: 00 00 00 c8 00 00 00 c8.
0xC8 converts to 200. So this is a 200x200 image we're working with. On to the next step. The next 5 bytes will tell us:
- Bit depth
- Color Type
- Compression Method
- Filter Method
- Interlace Method
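So, the bit depth is 08 (8 bits per sample), the color type is 06 (6, true color with alpha), the compression method is 0 (deflate/inflate, the only compression method PNG defines), the filter method is 0 (adaptive filtering with the five basic filter types) and the interlace method is 0 (no interlacing).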
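To make the layout concrete, here is a sketch in C that decodes the 13 data bytes of an IHDR chunk into a struct. The names png_ihdr and png_parse_ihdr are placeholders I chose for illustration, and the sketch does not validate that the field values are legal combinations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t width;
    uint32_t height;
    uint8_t  bit_depth;
    uint8_t  color_type;
    uint8_t  compression_method;
    uint8_t  filter_method;
    uint8_t  interlace_method;
} png_ihdr;

/* Assemble a 32-bit big-endian integer from 4 bytes. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: data points at the chunk data (after the type).
   Returns 0 on success, -1 if the length is not exactly 13. */
int png_parse_ihdr(const unsigned char *data, size_t len, png_ihdr *out)
{
    assert(data != NULL && out != NULL);
    if (len != 13)
        return -1;
    out->width              = read_be32(data);
    out->height             = read_be32(data + 4);
    out->bit_depth          = data[8];
    out->color_type         = data[9];
    out->compression_method = data[10];
    out->filter_method      = data[11];
    out->interlace_method   = data[12];
    return 0;
}
```

Feeding it the 13 data bytes from the stream above (00 00 00 c8 00 00 00 c8 08 06 00 00 00) gives a width and height of 200, a bit depth of 8 and a color type of 6.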
Then, the last 4 bytes are the CRC. Note that the stream shown above ends after ad 58 ae, so only the first 3 of the 4 CRC bytes are visible. You can read more on CRC here: https://en.wikipedia.org/wiki/Cyclic_redundancy_check#:~:text=A%20cyclic%20redundancy%20check%20(CRC,polynomial%20division%20of%20their%20contents
PLTE chunk
There is no PLTE chunk within the image I am using.
But the chunk holds between 1 and 256 entries, so the length of its data is between 3 and 768 bytes and is always a multiple of 3. It is read 3 bytes at a time (RGB values).
Example (CRC left out):
00 00 00 03 50 4c 54 45 ff ff ff
00 00 00 03 is the length of the chunk data: 3 bytes, which is 1 entry. That means we're reading just a single set of RGB values.
Then, we get the chunk name (50 4c 54 45, which is PLTE). Then, we get the 3 bytes of the RGB value (ff ff ff, which is white).
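A small C sketch of the same idea: the number of palette entries is the PLTE data length divided by 3, and each entry is one red, one green and one blue byte. The names png_rgb, png_plte_entry_count and png_plte_entry are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t red;
    uint8_t green;
    uint8_t blue;
} png_rgb;

/* Hypothetical helper: returns the number of palette entries for a
   PLTE data length, or -1 if the length is not a multiple of 3 or
   falls outside 1-256 entries. */
int png_plte_entry_count(uint32_t length)
{
    if (length == 0 || length % 3 != 0 || length > 256u * 3u)
        return -1;
    return (int)(length / 3);
}

/* Hypothetical helper: reads entry number index from the PLTE data.
   The caller must make sure index is below the entry count. */
png_rgb png_plte_entry(const unsigned char *data, size_t index)
{
    png_rgb color;
    assert(data != NULL);
    color.red   = data[index * 3];
    color.green = data[index * 3 + 1];
    color.blue  = data[index * 3 + 2];
    return color;
}
```

With a data length of 3 this reports a single entry, and the three bytes ff ff ff decode to white.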
Key Concepts
In this tutorial we ignore the IDAT chunk. That is for the next tutorial.
But here is a bit of background information that your amazing minds might appreciate!
PNG Filtering
PNG Filtering will transform a PNG image with the goal of improving the compression of the image.
One thing to keep in mind is that filtering is applied to bytes, not to pixels.
Filtering will use the following values to generate the new byte value:
x -> byte being filtered
a -> byte corresponding to x in the pixel immediately before the pixel containing x
b -> byte corresponding to x in the previous scanline
c -> byte corresponding to b in the pixel immediately before the pixel containing b
The types of filtering are:
Type | Name | Filter Functionality
0 | None | Filt(x) = Orig(x)
1 | Sub | Filt(x) = Orig(x) - Orig(a)
2 | Up | Filt(x) = Orig(x) - Orig(b)
3 | Avg. | Filt(x) = Orig(x) - floor((Orig(a) + Orig(b)) / 2)
4 | Paeth | Filt(x) = Orig(x) - PaethPredictor(Orig(a), Orig(b), Orig(c))
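As an illustration, here is a sketch of the Sub filter (type 1) and its reverse in C. Two details come from the PNG specification rather than from the table: the subtraction is done modulo 256 on unsigned bytes, and a is treated as 0 for the first pixel of a scanline. The function names and the bpp parameter (bytes per complete pixel) are my own choices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: applies the Sub filter to one scanline.
   orig holds n unfiltered bytes, filt receives n filtered bytes. */
void png_filter_sub(const uint8_t *orig, uint8_t *filt, size_t n, size_t bpp)
{
    assert(orig != NULL && filt != NULL && bpp > 0);
    for (size_t i = 0; i < n; i++) {
        /* a is the matching byte of the pixel to the left, or 0. */
        uint8_t a = (i >= bpp) ? orig[i - bpp] : 0;
        filt[i] = (uint8_t)(orig[i] - a); /* wraps modulo 256 */
    }
}

/* Hypothetical helper: undoes the Sub filter in place. */
void png_unfilter_sub(uint8_t *line, size_t n, size_t bpp)
{
    assert(line != NULL && bpp > 0);
    for (size_t i = bpp; i < n; i++)
        line[i] = (uint8_t)(line[i] + line[i - bpp]);
}
```

For two RGB pixels (10, 20, 30) and (12, 22, 32) with bpp set to 3, the filtered bytes are 10 20 30 2 2 2. The small repeated values are what makes the data easier to compress.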
Let's take a deeper look at Paeth.
Paeth
The functionality of Paeth is to compute a simple linear function of the 3 neighbouring bytes (left, above, upper left), and then to pick whichever of the three is closest to that value as the prediction.
Let's take a look at what PaethPredictor is doing:
p = a + b - c
pa = abs(p - a)
pb = abs(p - b)
pc = abs(p - c)
Pr = a IF pa <= pb AND pa <= pc
otherwise, Pr = b IF pb <= pc
if no to both, Pr = c
I think that ought to be self explanatory.
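In case it helps, here is how the predictor could be written in C. This follows the steps in the PNG specification, where the initial estimate is p = a + b - c. The function name png_paeth_predictor is just a placeholder.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: a = left, b = above, c = upper left.
   Returns whichever of a, b, c is closest to p = a + b - c,
   checking a first, then b, then c to break ties. */
uint8_t png_paeth_predictor(uint8_t a, uint8_t b, uint8_t c)
{
    /* Use int so the intermediate values can go negative or above 255. */
    int p  = (int)a + (int)b - (int)c;
    int pa = abs(p - (int)a);
    int pb = abs(p - (int)b);
    int pc = abs(p - (int)c);

    if (pa <= pb && pa <= pc)
        return a;
    if (pb <= pc)
        return b;
    return c;
}
```

The order of the comparisons matters: an encoder and a decoder must break ties the same way, otherwise the decoded bytes will not match.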
Compression
Compression method zero is defined by the IS (International Standard). It stands for deflate/inflate compression with a sliding window of at most 32768 bytes.
Deflate-compressed data streams are stored in the "zlib" format:
- zlib compression method -> 1 byte
- Additional Flags/check bits -> 1 byte
- Compressed data block -> n bytes
- Check Value -> 4 bytes
For PNG compression method zero, the zlib compression method should be 8 (deflate compression).
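Here is a sketch of checking the first two bytes of the zlib stream in C, based on the zlib format (RFC 1950): the low 4 bits of the first byte are the compression method, the high 4 bits encode the window size as 2^(n + 8), and the two bytes read together as a 16-bit number must be a multiple of 31. The function name png_check_zlib_header is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: cmf is the first byte of the zlib stream
   (compression method and info), flg is the second byte (flags and
   check bits). Returns 0 if the header is acceptable for PNG and
   stores the window size in *window_size (may be NULL), else -1. */
int png_check_zlib_header(uint8_t cmf, uint8_t flg, uint32_t *window_size)
{
    unsigned method = cmf & 0x0Fu; /* must be 8 for deflate */
    unsigned info   = cmf >> 4;    /* log2(window size) - 8 */

    if (method != 8)
        return -1;
    if (info > 7) /* windows above 32768 bytes are not allowed */
        return -1;
    if ((((unsigned)cmf << 8) | flg) % 31u != 0)
        return -1; /* check bits do not match */
    if (window_size != NULL)
        *window_size = 1u << (info + 8);
    return 0;
}
```

The very common header 78 9c passes this check and reports the maximum window of 32768 bytes.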
If the data being compressed is 16384 bytes or fewer, the encoder might set the zlib window size by rounding the data size up to a power of 2 (with a minimum of 256). According to the specification, this:
"decreases the memory required for both encoding and decoding, without adversely affecting the compression ratio"
Compression of filtered streams
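The sequence of filtered scanlines is compressed into a single zlib datastream, which can then be split across multiple IDAT chunks. The IDAT chunk boundaries can fall anywhere within the zlib datastream, so a decoder has to join the data of all IDAT chunks before (or while) decompressing.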
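A minimal sketch of that joining step in C: each IDAT chunk's data is appended to one growing buffer, and the result is the complete zlib datastream. The struct png_zstream and the function png_idat_append are made-up names, and the sketch skips size overflow checks for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned char *data; /* joined IDAT data (the zlib datastream) */
    size_t len;          /* bytes used */
    size_t cap;          /* bytes allocated */
} png_zstream;

/* Hypothetical helper: appends the n data bytes of one IDAT chunk.
   Returns 0 on success, -1 if memory could not be allocated. */
int png_idat_append(png_zstream *z, const unsigned char *chunk_data, size_t n)
{
    assert(z != NULL);
    if (n == 0)
        return 0; /* empty IDAT chunks are allowed */
    assert(chunk_data != NULL);
    if (z->len + n > z->cap) {
        size_t new_cap = z->cap ? z->cap : 64;
        while (new_cap < z->len + n)
            new_cap *= 2;
        unsigned char *p = realloc(z->data, new_cap);
        if (p == NULL)
            return -1;
        z->data = p;
        z->cap = new_cap;
    }
    memcpy(z->data + z->len, chunk_data, n);
    z->len += n;
    return 0;
}
```

Start with a zero-initialized png_zstream, call png_idat_append once per IDAT chunk in file order, and free the data pointer when you are done.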
Summary
Note: If you want me to explain the 14 other possible chunk names, let me know down below. I will edit this post and add them!
Welp. That's a wrap for the file format of a .png image. We went a bit deep with explaining how it works. But I think the explanations are overall pretty good.
Study this information, and prepare yourself for the next tutorial!
Until then, MocaCDeveloper logging out!
Can I get a link to the next tutorial?
Very cool! I’ve always wanted to see a tutorial like this!
@RoBlockHead
I slowly came to realize that low-level development isn't just about being able to write assembly. It's about being capable of working with pixels lol
@MocaCDeveloper Working with pixels is something that you never get to see often, but it's so interesting! I guess I have to learn how to work with sound next...
@DynamicSquid
Oh i never thought of that. Working with sound in C. That would be interesting!