The final stage of the JPEG process is to apply lossless Huffman coding to compress the run-length-encoded (another lossless compression technique) DCT terms. As mentioned in a previous post, Huffman is a variable-length code based on probabilities, unlike most other codes, which are fixed-length block codes. For Huffman coding to work well, some values in the data must occur more frequently than others. If the values are completely random then Huffman coding is pointless.
To illustrate the benefits of Huffman compression, let me run you through a typical training example.
Imagine that digital codes are used to transmit various regional weather conditions to shipping.
Let's say there are 4 states for visibility, as outlined in the table below.
HTML Code:
Visibility   Code word (2 bit block code)
-----------------------------------
Very poor    00
Poor         01
Moderate     10
Good         11
There are 4 states, which require a 2 bit block code.
To work out the Huffman code you need to know the probability of each state. In our case the probabilities are as follows.
HTML Code:
Visibility   Probability
-----------------------------------
Very poor    0.1
Poor         0.3
Moderate     0.5
Good         0.1
The most probable event gets the shortest code and the least probable events get the longest codes.
HTML Code:
Visibility   Code word
------------------------------------
Very poor    111
Poor         10
Moderate     0
Good         110
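The code assignment above comes from the standard Huffman construction: repeatedly merge the two least probable states into a tree, then read each code word off the path from the root. A minimal Python sketch of that construction is below (the exact 0/1 labelling can vary between implementations depending on tie-breaking, so the individual code words may differ from the table, but the code lengths, and hence the average length, come out the same).

```python
import heapq
from itertools import count

# Probabilities of the four visibility states, as in the table above.
probs = {"very poor": 0.1, "poor": 0.3, "moderate": 0.5, "good": 0.1}

# Each heap entry: (probability, tiebreaker, tree). A tree is either a
# symbol string (leaf) or a (left, right) pair of subtrees.
tiebreak = count()
heap = [(p, next(tiebreak), sym) for sym, p in probs.items()]
heapq.heapify(heap)

# Repeatedly merge the two least probable subtrees into one node.
while len(heap) > 1:
    p1, _, t1 = heapq.heappop(heap)
    p2, _, t2 = heapq.heappop(heap)
    heapq.heappush(heap, (p1 + p2, next(tiebreak), (t1, t2)))

def assign(tree, prefix=""):
    """Walk the tree, appending '0' for left and '1' for right."""
    if isinstance(tree, str):          # leaf: a symbol
        return {tree: prefix or "0"}
    left, right = tree
    return {**assign(left, prefix + "0"), **assign(right, prefix + "1")}

codes = assign(heap[0][2])
print(codes)   # moderate gets 1 bit, poor 2 bits, the two rare states 3 bits
```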
Using the block code it takes 2 bits to transmit the information regardless of the visibility condition. With the Huffman code the average code length is
(3 bits x 0.1) + (2 bits x 0.3) + (1 bit x 0.5) + (3 bits x 0.1) = 1.7 bits per symbol,
an average saving of 15%.
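That arithmetic can be checked in a couple of lines, using the code lengths and probabilities from the tables above:

```python
# Code word lengths and probabilities from the tables above.
lengths = {"very poor": 3, "poor": 2, "moderate": 1, "good": 3}
probs   = {"very poor": 0.1, "poor": 0.3, "moderate": 0.5, "good": 0.1}

# Average bits per symbol, and the saving relative to the 2-bit block code.
avg = sum(probs[s] * lengths[s] for s in probs)
saving = 1 - avg / 2
print(avg, saving)   # approximately 1.7 and 0.15
```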
Care is needed in choosing the Huffman codes so that a string of codes is uniquely decodable: because the codes are no longer of fixed length, we need to be able to tell where one code ends and the next begins. Huffman codes achieve this by being prefix-free, meaning no code word is the prefix of another.
For example:
110011110 = 110 | 0 | 111 | 10 = good, moderate, very poor, poor
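The decoding step can be sketched in a few lines of Python: scan the bitstream left to right, and because the code is prefix-free, the moment the accumulated bits match a code word the match is unambiguous.

```python
# The code words from the table above, inverted for decoding.
codes = {"very poor": "111", "poor": "10", "moderate": "0", "good": "110"}
decode_table = {v: k for k, v in codes.items()}

def decode(bits):
    """Decode a prefix-free Huffman bitstream into a list of symbols."""
    symbols, current = [], ""
    for b in bits:
        current += b
        if current in decode_table:      # complete code word found
            symbols.append(decode_table[current])
            current = ""
    return symbols

print(decode("110011110"))
# ['good', 'moderate', 'very poor', 'poor']
```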
In some situations the probabilities of events are known in advance, but in the case of image data they have to be calculated for each image before coding can commence. In most cases a coding tree or dictionary has to be constructed and included in the file so that the compressed Huffman data can be decoded when the image file is opened on the computer. An example coding tree is attached. Because of the overhead of including the coding tree or dictionary, large images benefit more than small ones.
By now I imagine you know more about JPEG Baseline compression than you ever wanted to know. In future posts I will therefore focus on the practical aspects.