The final stage of the JPEG process is to apply lossless Huffman coding to compress the run-length-encoded (another lossless compression technique) DCT terms. As mentioned in a previous post, Huffman is a variable-length code based on probabilities, unlike most other codes, which are fixed-length block codes. For Huffman coding to work well, some values in the data must occur more frequently than others. If the values are completely random then Huffman coding is pointless.
To illustrate the benefits of Huffman compression, let me run you through a typical training example.
Imagine that digital codes are used to transmit various regional weather conditions to shipping.
Let's say there are 4 states for visibility, as outlined in the table below.
HTML Code:
Visibility   Code word (2 bit block code)
-----------------------------------
Very poor    00
Poor         01
Moderate     10
Good         11
There are 4 states, which require a 2 bit block code.
To work out the Huffman code you need to know the probability of each state. In our case the probabilities are as follows.
HTML Code:
Visibility   Probability
-----------------------------------
Very poor    0.1
Poor         0.3
Moderate     0.5
Good         0.1
The most probable event gets the shortest code and the least probable events get the longest codes.
HTML Code:
Visibility   Code word
------------------------------------
Very poor    111
Poor         10
Moderate     0
Good         110
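The code assignment above comes from the standard Huffman construction: repeatedly merge the two least probable states into a tree, then read each code word off the path from the root. A minimal Python sketch of that construction is below (the exact 0/1 labelling can vary between implementations depending on tie-breaking, so the individual code words may differ from the table, but the code lengths, and hence the average length, come out the same).

```python
import heapq
from itertools import count

# Probabilities of the four visibility states, as in the table above.
probs = {"very poor": 0.1, "poor": 0.3, "moderate": 0.5, "good": 0.1}

# Each heap entry: (probability, tiebreaker, tree). A tree is either a
# symbol string (leaf) or a (left, right) pair of subtrees.
tiebreak = count()
heap = [(p, next(tiebreak), sym) for sym, p in probs.items()]
heapq.heapify(heap)

# Repeatedly merge the two least probable subtrees into one node.
while len(heap) > 1:
    p1, _, t1 = heapq.heappop(heap)
    p2, _, t2 = heapq.heappop(heap)
    heapq.heappush(heap, (p1 + p2, next(tiebreak), (t1, t2)))

def assign(tree, prefix=""):
    """Walk the tree, appending '0' for left and '1' for right."""
    if isinstance(tree, str):          # leaf: a symbol
        return {tree: prefix or "0"}
    left, right = tree
    return {**assign(left, prefix + "0"), **assign(right, prefix + "1")}

codes = assign(heap[0][2])
print(codes)   # moderate gets 1 bit, poor 2 bits, the two rare states 3 bits
```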
Using the block code it takes 2 bits to transmit the information regardless of the visibility condition. With the Huffman code the average code length is
(3 bits x 0.1) + (2 bits x 0.3) + (1 bit x 0.5) + (3 bits x 0.1) = 1.7 bits per symbol,
an average saving of 15%.
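That arithmetic can be checked in a couple of lines, using the code lengths and probabilities from the tables above:

```python
# Code word lengths and probabilities from the tables above.
lengths = {"very poor": 3, "poor": 2, "moderate": 1, "good": 3}
probs   = {"very poor": 0.1, "poor": 0.3, "moderate": 0.5, "good": 0.1}

# Average bits per symbol, and the saving relative to the 2-bit block code.
avg = sum(probs[s] * lengths[s] for s in probs)
saving = 1 - avg / 2
print(avg, saving)   # approximately 1.7 and 0.15
```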
Care is needed in choosing the Huffman codes so that a string of codes is uniquely decodable: because the codes are no longer of fixed length, we need to be able to tell where one code ends and the next begins. Huffman codes achieve this by being prefix-free, meaning no code word is the prefix of another.
For example:
110011110 = 110 | 0 | 111 | 10 = good, moderate, very poor, poor
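The decoding step can be sketched in a few lines of Python: scan the bitstream left to right, and because the code is prefix-free, the moment the accumulated bits match a code word the match is unambiguous.

```python
# The code words from the table above, inverted for decoding.
codes = {"very poor": "111", "poor": "10", "moderate": "0", "good": "110"}
decode_table = {v: k for k, v in codes.items()}

def decode(bits):
    """Decode a prefix-free Huffman bitstream into a list of symbols."""
    symbols, current = [], ""
    for b in bits:
        current += b
        if current in decode_table:      # complete code word found
            symbols.append(decode_table[current])
            current = ""
    return symbols

print(decode("110011110"))
# ['good', 'moderate', 'very poor', 'poor']
```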
In some situations the probabilities of events are known in advance, but in the case of image data they have to be calculated for each image before coding can commence. In most cases a coding tree or dictionary has to be constructed and included in the file so that the compressed Huffman data can be decoded when the image file is opened on the computer. An example coding tree is attached. Because of the overhead of including the coding tree or dictionary, large images benefit more than small ones.
By now I imagine you know more about JPEG Baseline compression than you ever wanted to know. In future posts I will therefore focus on the practical aspects.