View Full Version : Data Compression
I'd been promising to write a piece on the topic of data compression for a while. Hopefully I can shine some light on the subject to give you a better understanding of what is going on. It is quite simple once you have got your head around 3 basic concepts.
1) Think of a computer file as a container, cardboard box or an envelope. It can hold something of a certain size.
2) Think of a computer image as an egg tray or a piece of graph paper. It has a set number of positions depending on how many pixels are in the image. Therefore the image in the video display memory of the computer is of a fixed size.
3) The amount of information an object contains. This concept is more abstract and not so easy to grasp.
Let's take an object like a sheet of A4 paper with printed text.
In our first example it just has the words "Hello World"
The second example is, say, an electricity bill. It will contain the following sorts of information: addresses, telephone numbers, prices, units used, dates, taxes and so on.
So our second example contains much more information than the first one. Therefore the first example only needs a small sheet of paper compared to the larger sheet required for the bill.
It then follows that a smaller envelope or container can be used to hold the first example.
The key point being there was a lot of wasted (redundant, unused) space in the "Hello World" example.
I will leave you with these concepts to let them soak in before looking at what methods are used.
Data Compression is about finding a more efficient code to hold the information.
If the information is already stored efficiently with no waste then it will not be easy to compress it further.
Are you all with me so far ?
With you all the way. I used to write random access databases, and the need for data compression was intense in those early days, with limited storage space. I wrote a number of algorithms depending on the type of data to be stored. The need for set standard storage packets was essential in order to store a full random access database, which had to be read, unpacked, viewed, altered (perhaps), re-coded, re-written and re-stored in the same space. All done from machine code.
Fun days ...
Best Wishes, Raymond
Let us first consider how much memory an uncompressed bitmap raster image uses.
Using a small 1280 x 1024 pixel desktop image as an example.
Each pixel requires 3 memory locations to store the RGB colours.
So 3 blocks (arrays/grids) of 1280 x 1024 memory are used = 3932160 memory locations.
Therefore any 1280 x 1024 image will always use 3932160 memory locations.
The Windows Paint BMP file stores data in this uncompressed format plus some header info.
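As a quick sanity check, the arithmetic above is easy to reproduce (a throwaway sketch, not part of any real codec):

```python
# Memory needed for an uncompressed 24-bit RGB image, as described above.
width, height = 1280, 1024
bytes_per_pixel = 3                 # one memory location each for R, G and B
total = width * height * bytes_per_pixel
print(total)                        # 3932160 memory locations
```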
With larger images the huge file sizes create a problem for data storage and transmission over networks.
To overcome these problems data compression is used to reduce the file size.
The first data compression methods devised were lossless. That is, after compression and decompression you get back the original data, exactly.
I will give you a basic idea of some of the lossless methods used, but they relied on the data being inefficiently coded in the first place to get good compression ratios. With a few exceptions, as a general rule of thumb, if the compression ratio is better than 10:1 then a lossless method was not used.
To improve compression ratios further, lossy methods were developed. That is, after compression and decompression you get back something that looks like the original data.
An excellent thread Rob that I am sure will be of great value to many.
Let's look at some of the simpler lossless data compression ideas for background information. Unless you're going to design a program there is no point in understanding it to the nth degree.
They basically take advantage of waste, unused values or inefficiently encoded raw data.
A simple example is some English plain text with each character stored in an 8-bit byte. A byte can hold 1 of 256 values. A typical text message is made up of 52 upper and lower case characters, 10 numbers, a dozen punctuation marks and a few control codes. Over half of the possible positions (values) are never used. In this case a 7-bit code (128 values) would be more efficient. The 7-bit codes would be packed together to fit into the 8-bit bytes, reducing the file size by ~12.5%. Not a very useful method for graphic images, but it illustrates a point.
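If you fancy seeing the 7-bit packing in code, here is a minimal sketch (the `pack7` helper is purely illustrative, not a real file format):

```python
def pack7(codes):
    """Pack a sequence of 7-bit values into bytes (illustrative only)."""
    bits = ''.join(format(c & 0x7F, '07b') for c in codes)
    bits += '0' * (-len(bits) % 8)        # pad the tail to a whole byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

text = b"Hello World" * 8                 # 88 bytes of plain ASCII
packed = pack7(text)
print(len(text), len(packed))             # 88 77 -> the ~12.5% saving
```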
Another idea is Run Length Encoding. This takes advantage of long streams of identical data.
For example AAAAAAAHHCCCCCYYYYYYRRRRRRRRRR
could be re-encoded as 7A2H5C6Y10R The number indicates the run length followed by the value of the run.
or, in the case of binary data in a 1-bit monochrome image (as in a fax machine), the stream
could be re-encoded as 634811 to indicate how many zeros and ones follow each other.
If you think about it, a long stream of the same value is telling us something we already know and is therefore redundant. This information can be discarded.
There is a big disadvantage with the run length method when the data changes very rapidly
For example AHCYR
would be re-encoded as 1A1H1C1Y1R which is in fact larger than the original.
Rapidly changing data contains more information which needs to be retained by using more code.
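Both behaviours are easy to demonstrate. This is a minimal sketch of the count-then-value scheme just described, not any standard RLE format:

```python
def rle_encode(s):
    """Encode runs as <count><value>, e.g. AAA -> 3A (a sketch)."""
    out, i = [], 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1                        # extend the current run
        out.append(f"{j - i}{s[i]}")
        i = j
    return ''.join(out)

print(rle_encode("AAAAAAAHHCCCCCYYYYYYRRRRRRRRRR"))  # 7A2H5C6Y10R
print(rle_encode("AHCYR"))   # 1A1H1C1Y1R - longer than the original!
```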
Two common 1-bit fax compressions are Group 3 and Group 4. Group 3 works in one direction only, along the row of pixels. Group 4 works in both directions and takes multiple rows into account. Although Group 4 gives higher compression ratios it has more work to do, which makes it slower than Group 3 when compressing or decompressing.
Another idea is to look for common Patterns in the raw data and replace them with shorter special codes.
An example in text data is to replace common words or parts of words ( the, and, or, in, to, un, ed, for) with a shorter unique code.
The last idea involves more processing by counting the number of times a value occurs in the data. The most frequent values are assigned short codes which get longer and longer as the frequency of the value reduces.
A perfect example of this is Morse Code - remember that thing with dots and dashes on the radio. The letter E (the most common character in English) is a single dot and less common characters like J & Q may have 4 or more dots and dashes. The overall effect is to shorten the coding on a typical message.
In cases such as English text the typical frequency of characters is well known, and a standard coding table or dictionary can be used instead of calculating a new table from scratch each time. In essence we are calculating or predicting the probability of values in the raw data. Huffman coding is based on this idea.
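For the curious, the frequency-counting idea can be sketched in a few lines. This builds a Huffman tree and reports only the resulting code lengths; a real coder would also emit the actual bit patterns:

```python
import heapq
from collections import Counter

def huffman_lengths(text):
    """Return the Huffman code length for each symbol in text (a sketch)."""
    heap = [(n, i, {c: 0}) for i, (c, n) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        n1, _, t1 = heapq.heappop(heap)   # merge the two rarest subtrees;
        n2, _, t2 = heapq.heappop(heap)   # every symbol in them gets 1 bit longer
        heapq.heappush(heap, (n1 + n2, i, {c: d + 1 for c, d in {**t1, **t2}.items()}))
        i += 1
    return heap[0][2]

lengths = huffman_lengths("this is an example of huffman coding")
# the frequent space never gets a longer code than the rare letter x
print(lengths[' '], lengths['x'])
```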
What is important is to match the compression method to the type of information in the data. Choosing the wrong method will either make the file bigger or give poor compression. Advanced compression programs first analyse the data to decide which method is the best to use.
The LZW compression is very effective and uses the more complex ideas, but unfortunately its use required a license. So it is not to be found in your cheap and cheerful photo editing software.
Image pixel data is stored in one of two formats. The first is contone (CONtinuous TONE), where data is stored in bytes to represent the different levels of brightness and colour. This is the type of data produced by a scanner or camera. For the purposes of printing newspapers and magazines on a printing press these are converted and stored in 1-bit image format, basically a stream of zeros and ones (on/off signals if you like), often referred to as 1-bit TIFFs or bitmaps, line art and halftone images. The lossless Group 3 and 4 methods compress 1-bit data very well. It is the contone data that presents the headache for compression.
Graphic image data with lots of fine detail when compressed using a lossless method entail lots of processing for little compression effect. New ideas were needed to overcome the problem which led to a detailed examination of the information stored in an image. An image is shades of light and dark of different hues. The viewer is the human eye and brain. The new ideas were centered around exploiting the strengths and weakness of the human system. What information does the eye and brain need to see the image correctly? What detail can we afford to lose without it being noticed and what detail lost will be noticed ?
The brain is a powerful tool and can workout a lot of things for itself.
"Digital imagi_g is rapidly replacing film phot_graphy in consumer _nd professional markets"
If you can read and understand English the message in the sentence above is not lost despite the fact 3 characters are missing.
One approach to removing information from graphic image data is to apply a process that will remove noise (seen as a fine grain effect) caused by the input sensor of a camera or scanner. By definition noise is unwanted and random. The denoising process is designed to smooth out random pixel hot spots and leave behind the wanted picture information. A simple approach is to zero the least significant bit of a byte, or in plain English to round down all odd values held in a byte to the next even number. Sophisticated programs such as Ninja, Picture Cooler and Neat Image analyse a patch of bland picture area to build a profile of the noise characteristics and apply it to the whole picture area. The overall effect is much the same in each case: it smooths out many of the little pits and bumps in the range of data values, effectively reducing the number of changes (transitions) in value of the data. Changing a pixel's colour value by 1/256th is not going to be noticed.
Example original raw data
Compressed original ( in this case the code is longer )
Smoothed data ( noise removed )
Compressed Smoothed data
The smoothing has increased the length of runs of data which will improve the compression ratio.
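A tiny sketch of why this helps: zeroing the least significant bit turns nearly-equal neighbours into exact runs, which is just what run-length methods need (the pixel values here are invented):

```python
def runs(data):
    """Count the number of runs (value changes + 1) in a sequence."""
    return 1 + sum(1 for a, b in zip(data, data[1:]) if a != b)

noisy = [126, 127, 126, 127, 200, 201, 200, 201]   # 1-level sensor noise
smoothed = [v & ~1 for v in noisy]                  # zero the LSB: 126,126,...
print(runs(noisy), runs(smoothed))                  # 8 2
```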
Of course this approach of throwing information away can only be applied to audio/visual data. It would be dangerous and senseless to throw away information in a spreadsheet or an accounts system.
Next we will turn our attention to JPEG. As yet nobody has managed to slap a license on its use, so camera manufacturers and photo editing programs can freely enjoy the benefits of this picture image format. In 1982 the ISO formed the Photographic Experts Group to research methods of transmitting video and still images over data links. Its goal was to produce a set of industry standards. In 1986 the CCITT set up a group to work on compression methods required for colour facsimile machine transmission. In 1987 the two groups were combined to form a joint committee (the Joint Photographic Experts Group (JPEG)) that would research and produce a single standard.
JPEG, unlike other compression methods, is not a single algorithm but may be thought of as a toolkit of image compression methods to suit the user's needs.
Fascinating stuff Rob ... Thanks
JPEG uses a lossy compression method that throws useless data away during encoding. This is why lossy schemes manage to obtain superior compression ratios over most lossless schemes. JPEG is designed to discard information the human eye cannot easily see. The eye barely notices slight changes in colour but will pick out slight changes in brightness or contrast. Therefore JPEG's lossy encoding tends to be frugal with the gray-scale part of a picture and frivolous with the colour component. A typical photographic quality image may be compressed by 20:1 without experiencing any noticeable degradation in quality.
If we look at the original RGB image we will find 3 high definition images one for each colour.
The first task of the JPEG encoder is to convert the RGB image components into a monochrome component (luminance), a blue chroma component and a red chroma component. For the more technically minded, it has converted from an RGB colour space to the YCbCr colour space used in video systems. Those who use Photoshop may have bumped into changing the mode of an RGB image to Lab.
Lab is a similar idea to YCbCr
Where L is the brightness component, a is the red/green component, b is the yellow/blue component
The purpose of separating RGB into mono and colour is to allow different processing of the luminance and chroma channels.
Below we can see the 3 high definition RGB channels.
I have also added an illustration that shows the luminance (gray scale (L channel)) and combined chroma (a + b channels ) components separated from a RGB image using Photoshop. From this we can see that the gray scale is high definition and the chroma is very low definition. In fact if you blurred the a and b channels and converted back to RGB you would barely notice any difference. A completely different story if you sharpened or blurred the L channel.
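For the technically minded, here is a sketch of the usual JFIF-style RGB to YCbCr conversion. These particular coefficients are the common full-range ones; individual encoders may differ slightly:

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range RGB -> YCbCr, JFIF-style coefficients (a sketch)."""
    y  =       0.299    * r + 0.587    * g + 0.114    * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5      * b
    cr = 128 + 0.5      * r - 0.418688 * g - 0.081312 * b
    return tuple(max(0, min(255, round(v))) for v in (y, cb, cr))

print(rgb_to_ycbcr(255, 255, 255))   # (255, 128, 128): white is all luminance
print(rgb_to_ycbcr(0, 0, 0))         # (0, 128, 128): so is black
```

Note how neutral tones carry no chroma information at all, which is the whole point of the separation.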
The next stage is to apply Chroma sub sampling (down sampling - reduce the resolution).
The simplest way of exploiting the eye's lesser sensitivity to colour information is to use fewer pixels for the two chrominance channels.
The luminance channel is retained at full resolution.
Both chrominance channels are typically down sampled by 2:1 horizontally and either 1:1 or 2:1 vertically
For example, for a colour image of 1000 x 1000 pixels, the luminance (grey/mono) channel will remain at 1000 x 1000 pixels but the chroma channels would be either 500 x 1000 pixels or 500 x 500 pixels. So a single chrominance pixel would cover either 2 x 1 or 2 x 2 luminance pixels, dependent on a JPEG encoder setting. This may be under user control as part of the quality setting or just hard coded by the programmer.
So in terms of compression the higher down sampling will give us 50% compression with very little perceived loss of quality with photographic images.
i.e. we store 6 pixels values for each 2x2 block ( 4 luminance values and one for each of the 2 chrominance channels) instead of the 12 needed to store at full resolution in each channel.
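The arithmetic for that saving, spelled out:

```python
# Values stored per 2x2 pixel block with 2:1 x 2:1 chroma down sampling.
full_res   = 4 * 3        # 4 pixels x 3 channels = 12 values
subsampled = 4 + 1 + 1    # 4 luma + 1 Cb + 1 Cr  =  6 values
print(subsampled / full_res)   # 0.5 -> the 50% saving mentioned above
```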
The existence of chroma sub sampling in JPEG compression explains why better compression ratios are achieved (ratio between original file size & compressed file size) with colour photos than with greyscale (mono) photos.
Most DSLR cameras use 2x1 chroma down sampling while some graphic editing programs and point and shoot digicams use 2x2.
Ah ha - so that's the answer. I've always wondered why, when preparing my images for the forum to achieve a ~200k file size from a 1000 x 667 image, with colour a JPEG compression quality setting of 10 gives the required file size, whereas with mono I have to reduce the quality setting to 6 or 7.
Yes Clive, a number of things will click into place once you get an understanding of what is going on inside the black box. Of course in some applications such as medical images chroma down sampling is undesirable because the wanted information is in the colour, so such images should not be stored in the JPEG format. Just out of interest, JPEG does have a lossless method but the compression ratios are nothing to write home about compared to the Baseline (standard) method I am describing.
I am glad a number of you are interested and managing to follow what can be a complex subject. All we need to do is grasp the concepts applied to get an idea why some images compress better than others.
Part of my degree profile was to spend a year on telecoms. The data encoding and decoding aspect I found fascinating.
The boffins who worked on complex ciphers used in secret messages we have to thank for some of the concepts developed. Other concepts are very mathematical and abstract.
Keep them coming Rob, this is a fascinating read.
Most interesting thread I have seen for quite a while, thanks Rob.
Thanks Mick & Dave
I am trying to provide the info in bite size chunks as and when I get a few free moments.
A point to mention about the chroma down sampling is that the colour information is lost forever. Furthermore, repeatedly opening and resaving the JPEG file will cause further loss of colour information. How important colour information is in our lives is open to debate, when you consider that 1 in 20 men is genetically disadvantaged and suffers from some form of colour blindness. The best way of explaining colour blindness is as a restriction (compression) of the range of colours in a rainbow. Women (only 1 in 200 suffer from colour blindness) will see all the hues of reds and greens, whereas men may just see a red and a green, and in severe cases not be able to tell the difference at all.
Moving on to the next process step.
The luminance and chrominance components of the image are divided up into an array of 8x8 pixel blocks. Padding is provided if required to ensure blocks on the right and bottom of the image are full. This segmentation into blocks is in preparation for the next stage which is the most processor intensive and difficult to comprehend. But don't let that put you off :rolleyes:
The fans of Monty Python will probably recall the line "What have the Romans ever done for us?" and after some thought a long list of innovations was revealed. Well, we can twist this and ask "What did the French do for us?" and I am sure an equally long list could be produced. As far as we are concerned they produced two interesting branches of mathematics. We are going beyond the simple idea of adding up a shopping list; we are into the realms of Roger Penrose helping Stephen Hawking produce a mathematical proof that black holes exist in space.
Sounds very complex, but the concepts stem from a very simple idea. These ideas get extended, become very abstract and can be applied to all sorts of problems. Take the French gamblers who were trying to work out the odds or devise a system to win at the casino: they were responsible for statistics. In fact if you touched on the subject at school the typical example was what is the chance of throwing a dice and getting a 6, and then what is the chance of throwing two 6s in a row. This leads us to work out the probabilities of events and, as we saw earlier, it is used in some compression schemes.
French engineers came up with a concept of using a mathematical model to solve problems. Any object or structure can be made up from smaller components.
For example a house is made up from bricks. If they need to calculate the strength of a house, bridge or dam they would test the strength of a single component, that is, test a brick to destruction, then work out the relationship of a few bricks joined together with mortar. This information can then be extended to larger buildings. In some real world situations the mathematics is too complex or long winded to process. In these cases a simpler approximation is used to replace some of the components, a bit like using a clear piece of plastic instead of glass because the required properties are similar. Another extension of this is to transform something into something else because it is easier to work with. An example could be to turn solid ice into liquid water or steam.
I am hoping the above has extended your mind into a way of lateral or abstract thinking as this is required to follow the next stages of the jpeg processing.
Let's consider Joseph Fourier's concept that an object can be broken down into a number of components and reconstructed from them. To be correct his concept was an infinite series of components, but in practice a limit has to be imposed. If we take a note from a musical instrument and feed the sound into a frequency/spectrum analyser it will show the fundamental frequency of the note and a number of harmonics, each frequency at a different amplitude. Attempts to create electronic music used these ideas; remember the Moog Synthesizer and electronic organs of the 1970s.
If you did your own research into jpeg compression at this stage of the processing you would be told the 8 x 8 block is fed into a forward Discrete Cosine Transformation (DCT) process so that the image is transformed from a spatial domain representation to a frequency domain representation.
Most of you would just say "what the Hell!!!" This is perhaps the most confusing of all the steps in the process and the hardest to explain.
For a spatial domain think of an object in space with 3 dimensional coordinates
For frequency/time domain think of a time line with the slowest objects at one end and the fastest the other end. In fact what you would see on the display of a frequency/spectrum analyser.
So in effect we have taken a 8 x 8 picture element and fed it into a frequency/spectrum analyser to measure the strength of certain frequency components.
The output of the forward Discrete Cosine Transformation (DCT) process is a set of 64 values. Each value is the strength of each frequency component which start from a steady level (DC - average of all pixels) and each step is an increase in frequency in the x and y direction in the 8 x 8 block.
Still not easy to visualise.
So I managed to find a diagram showing the 64 frequencies. In my mind I find it easier to think of these frequencies as a set of patterns, and the DCT process is finding the strength of each pattern in the 8 x 8 block.
Therefore when it comes to decode the jpeg and reconstruct the 8 x 8 block it is easier to think of it as we need strength 4 for pattern 1, strength 8 for pattern 2, strength 0 for pattern 3, strength 12 for pattern 4 and so on. Almost paint by numbers ;)
Just out of interest the DCT process is related to the fast Fourier transform.
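If code helps, here is a naive (and deliberately slow) sketch of the forward 2-D DCT on an 8 x 8 block; real encoders use fast factorised versions, but the output is the same 64 pattern strengths described above:

```python
import math

def dct2_8x8(block):
    """Naive 2-D DCT-II on an 8x8 block (a sketch, not an optimised codec)."""
    def c(k):                          # normalisation factor
        return math.sqrt(0.5) if k == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = sum(block[y][x]
                    * math.cos((2 * x + 1) * u * math.pi / 16)
                    * math.cos((2 * y + 1) * v * math.pi / 16)
                    for x in range(8) for y in range(8))
            out[v][u] = 0.25 * c(u) * c(v) * s
    return out

flat = [[100] * 8 for _ in range(8)]      # a perfectly flat 8x8 block
coeffs = dct2_8x8(flat)
print(round(coeffs[0][0]))                # 800: all the energy is in the DC term
print(round(coeffs[3][4]))                # 0: a flat block has no AC patterns
```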
The next step is quantization, which is the procedure of constraining the 64 outputs of the DCT process to a discrete set of values, such as integers (whole numbers), rather than a continuous set of values, such as real numbers (numbers that can have a fractional part). This is done by dividing all 64 strengths of frequency (pattern) by a corresponding value in a quantization table and then rounding the result to the nearest integer. The values in the quantization table are chosen to preserve low-frequency information, as humans are more sensitive to loss of information in this area, and to discard high-frequency (noise-like) detail.
An example quantization table - the top left hand corner represent the low frequency and bottom right hand represent the high frequency.
A value in the table is used to divide the corresponding output of the DCT in the 8 x 8 block.
16 11 10 16 24 40 51 61
12 12 14 19 26 58 60 55
14 13 16 24 40 57 69 56
14 17 22 29 51 87 80 62
18 22 37 56 68 109 103 77
24 35 55 64 81 104 113 92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103 99
This is the point at which the user (you) can control the quality and amount of compression of the JPEG. Most applications such as Photoshop have a slider or drop list of quality settings. The quality setting (quality factor) is used to scale the values in the quantization table.
Outputs that either had a small frequency (pattern) strength or a large divisor in the quantization table will likely round to zero.
The lower the quality setting, the greater the divisor, increasing the chance of a zero result. Conversely, the highest quality setting would have quantization table values of all 1's, meaning all of the original DCT data is preserved.
An important point to realize here is that the quantization table used for this step differs between nearly all digital cameras and software packages. Camera manufacturers independently choose an arbitrary "image quality" name (or level) to assign to the 64-value quantization matrix that they devise, and so the names cannot be compared between makes or even models by the same manufacturer.
The Quantization table used is stored as part of the Jpeg.
The Quantization process is the main source of the Lossy Compression.
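A sketch of the divide-and-round step, using the first row of the example table above (the DCT strengths here are made-up numbers for illustration):

```python
Q = [16, 11, 10, 16, 24, 40, 51, 61]      # first row of the example table

# A hypothetical row of DCT outputs: strong DC, weak high frequencies.
dct_row = [620.0, -30.2, 14.8, 6.1, 2.4, 1.1, 0.6, 0.2]
quantized = [round(d / q) for d, q in zip(dct_row, Q)]
print(quantized)            # [39, -3, 1, 0, 0, 0, 0, 0] - trailing zeros!

# Decoding multiplies back, so the small values are lost for good.
restored = [v * q for v, q in zip(quantized, Q)]
print(restored)             # [624, -33, 10, 0, 0, 0, 0, 0]
```

The run of zeros at the high-frequency end is exactly what the later run-length stage feeds on, and the mismatch between `dct_row` and `restored` is the lossy part.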
We are on the home straight now; the rest is simple compared to the DCT and quantization processes.
Fascinating stuff Rob, going to take some reading and rereading to get to grips with it though.
Thanks Mick - Is there any part which is proving troublesome which maybe I can expand upon or provide diagrams ?
Below is an example set of Quantization tables used in a Nikon Camera. The upper 8 x 8 matrix is used for the luminance channel and the lower two are used for the two chrominance channels. From this we can see that the tables used for the chrominance channels discard more of the high frequencies than the luminance table.
The whole point of the DCT is to separate the image data into bands of frequencies. Then through the Quality Control setting (compression level slider) the high frequencies components can be progressively discarded as the compression level (ratio) is increased.
The 64 quantized frequency bands are then sorted so that the order runs from the lowest frequency component to the highest, which results in an ever longer stream of zeros at the end as the compression is increased. It then becomes ideal to apply the lossless run-length compression method to that long stream of zeros in the next stage.
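That low-to-high ordering is the classic zig-zag scan. One way to sketch it is to sort the 64 positions by anti-diagonal, alternating direction on each diagonal:

```python
def zigzag_order(n=8):
    """(row, col) pairs visiting an n x n block from lowest to highest frequency."""
    return sorted(((y, x) for y in range(n) for x in range(n)),
                  key=lambda p: (p[0] + p[1],                       # which diagonal
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))  # direction

order = zigzag_order()
print(order[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```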
Many thanks for this, spent the last 20 minutes reading through and no doubt will return to some points. You have a knack for giving an explanation that flows along.
Lee :D :eek:
Good stuff Rob, got to be honest though I get a bit lost at Fouriers concept, I used to be a radio ham so I understand the frequency spectrum bit but am having trouble with what the 8X8 grids actually mean with relation to the image being processed, I am determined to get it sussed though.
Let us take a step back for a moment and reconsider the 64 outputs for the DCT process from the 8 x 8 pixel block of the image data. The first output is a steady (DC) level of all the 64 image pixels averaged together and the other 63 outputs represent different frequency (AC) levels found in the 8 x 8 pixel image block.
If, through the user compression level (quality factor) slider, the quantization stage discarded all of the 63 AC outputs, the resultant image would show 8 x 8 pixel areas of the same tone. The image will get maximum compression, typically something in excess of 120:1, but you may have dumped a lot of image information to get it. This is only important if the image information was there in the first place. In practice an application's maximum compression setting is not that brutal, but it does introduce a big reduction in the strength of the high frequency components.
Attached is a Screen grab of a before and after scaled by 300%. The upper part is the original tiff image and the lower is the file saved as jpeg in Photoshop with a Quality setting of zero ( maximum compression). Compare the 2 halves.
In some areas of the lower half you can clearly see the 8 x 8 block structure where the weak high frequency detailed information has been completely averaged (smoothed). You should also notice a slight colour shift.
Just to complete the JPEG compression story we will briefly look at the remaining processes. In a previous post I had already mentioned that the 63 AC components from the DCT process are run-length encoded (a lossless compression method). However the DC component is treated differently. It is assumed that neighbouring 8 x 8 blocks will have a similar average value, so instead of dealing with a large number format it uses a small number format which defines the difference in level from the previous block, thereby requiring less code to store the information. The last stage is to use Huffman encoding (another lossless compression method) to compress all of the above compressed data again.
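The DC difference idea in miniature (the values are invented for illustration):

```python
# DC values from a run of neighbouring 8x8 blocks: big but similar numbers.
dc = [805, 810, 808, 812, 812, 809]

# Store the first value, then just the small difference from the previous one.
deltas = [dc[0]] + [b - a for a, b in zip(dc, dc[1:])]
print(deltas)        # [805, 5, -2, 4, 0, -3] - mostly tiny numbers

# Decoding is a running sum, recovering the original values exactly.
restored, total = [], 0
for d in deltas:
    total += d
    restored.append(total)
print(restored == dc)   # True - this part is lossless
```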
There is a lot of work involved to produce the humble JPEG image file.
Another thread touched on digital radio, which makes use of MPEG. Many of the concepts covered above apply to the human audio system: they know which bits they have to preserve, emphasise and de-emphasise. A few people tried to compare it with CD music quality and vinyl. CD technology is not in the same ball park. A CD ROM holds about 2 Gbytes of raw data, and two thirds of it is required to make the technology work reliably. A special 8-to-14 bit encoding to limit the bandwidth of the laser pickup circuits. Checksums so that the electronics can auto-correct corrupted data. Interleaving of data so that small scratches on the disc don't destroy huge chunks of data, but instead give smaller chunks of lost data, which then gives the auto-correction half a chance of working. The data is blocked up into frames, which requires control codes to keep the frames in sequence and say when user data starts and finishes. If I recall correctly, over 100 frames have to be read in before decoding can start. So only one third of the disc contains music or computer files.
Oops got carried away there - My mind slipped back to a previous life time in electronics.
OK I think we have done the JPEG theory to death now. For my sins I had to research the subject for problems at work and ended up knowing more than I ever wanted to know. :D
In future posts I want to focus on the practical aspects of using JPEG
Mick does the photo of the 64 frequency elements in post 16 help ?
The 8 x 8 is just a manageable block size. It could have been 16 x 16, but then it would have contained a higher number of frequency components and would have made the DCT design very complex. For the same reason the JPEG spec imposes a 14 bit limit on the input channels (if I recall correctly; I don't think it is 12 bit). Hence you can't save as a 16 bit JPEG.
You can get too hung up on the DCT processing. Most of us with an electronics background can understand filters (bandpass, highpass and lowpass) made from coils, resistors and capacitors. Digital filters are something else, extremely mathematical in their design. In practice their circuits involve shift registers and addition and subtraction units. For example, if you shift a set of data 3 places to the right and then add it to the same set of data shifted one place to the left, you have performed a mathematical function upon the data. The same principle applies when you sharpen or blur an image in Photoshop: you apply a filter with a mathematical function.
Going off on a similar tangent, think about noise reduction. These often use a process called auto-correlation, which entails phase shifting the data to smooth out the high frequency pits and bumps. A thing which helped a few things click into place for me in this area was to think about a calculation used in accounting: the one used to average the last 3 years' expenditure as time rolls forward, to see the general trends rather than focus on a single bad month. If you think about it, this is exactly what a lowpass filter is doing. We can change the design by averaging 5 or 7 years instead to get a smoother output.
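That accounting-style rolling average is a one-liner to sketch, and it is exactly a crude low-pass filter (the data here is invented):

```python
def moving_average(data, window=3):
    """Simple low-pass filter: average each sample with its neighbours."""
    half = window // 2
    return [round(sum(data[max(0, i - half):i + half + 1])
                  / len(data[max(0, i - half):i + half + 1]), 1)
            for i in range(len(data))]

spiky = [10, 10, 30, 10, 10, 10]       # one noisy "hot spot"
print(moving_average(spiky))           # [10.0, 16.7, 16.7, 16.7, 10.0, 10.0]
```

The lone spike has been spread out and flattened, just like the single bad month in the accounts.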
Rob they have all helped, I think I have the gist of it but am lacking a lot of the detail but that will come when I have read it through a few more times and had the time to think about it.
Noise the Compression Killer.
In an earlier post we talked about the effect noise has on compression ratios. For this post I have conducted some simple experiments in Photoshop to provide some test results for you to compare and see what effect the noise has had on the resultant file size.
For all the tests I have used an image size of 800 x 640 RGB pixels which works out to be 1.46 M bytes of uncompressed data. Each image was saved using Photoshop CS2 JPEG compression level 12 ( max image quality - minimum compression.)
The first test was to create a 50% grey image using the colour picker and fill commands. The image saved down to 53,114 bytes.
A compression ratio of 29:1. It would have been better if the image area was bigger, but we are suffering the overheads of the JPEG format.
This image contains virtually no information other than that there are 512000 RGB pixels of a value of 127 in an 800 x 640 matrix. A more intelligent compressor would describe all that is needed to recreate the image in a dozen bytes or so.
I repeated the exercise but this time created a bright red image, which saved down to 53,386 bytes. This was simply to compare an RGB image with no colour to one with a bright colour.
I next created another 50% grey and added 0.5% Gaussian noise, which saved down to 185,054 bytes.
The noise level was increased to 1% for the next test, which saved down to 524,146 bytes.
The grain effect (noise) was barely noticeable in the two images above and typical of a low noise image straight from the camera.
The last noise introduction test was with a noise level of 5%, which saved down to 933,222 bytes. In this case the grain was really noticeable and typical of low light photography using a high ISO setting.
The last test was to see how effective a noise reduction program such as Neat Image was at reducing the file size. A crop of 800 x 640 pixels was taken from a straight out of camera file shot at ISO 400. This saved down to 374,052 bytes. The same crop was passed to Neat Image for some very basic processing. This time it saved down to 293,126 bytes. A reduction of 20% in file size. My general experience has been that you get typically a 20% to 30% reduction in file size using a noise reduction program.
From the results above you can see it is very important to keep noise under control when trying to keep file sizes small.
Tips to avoid unwanted noise (grain effect)
- Don't shoot at a higher ISO(film sensitivity) setting than necessary.
- When choosing a new digital camera avoid cameras that have a very small sensor.
- Take care not to vastly underexpose shots. The camera sensor, like all electronic devices, will produce a certain amount of electrical noise, so it is important that the signal (light in this case) is much stronger than the noise.
- Take care in Post processing not to unduly increase noise levels when increasing the contrast or sharpening.
Noise levels can be reduced by using a noise reduction program or plug-in. However, care is needed as over processing can produce unrealistic results by removing image detail along with the noise.
In the last Post we saw how unwanted detail (noise/grain effect) reduced the compression ratio. Unfortunately wanted fine detail has the same effect. :(
In 1948 Claude Shannon introduced a concept called Entropy, which gives a fundamental mathematical limit on the best possible lossless compression of any communication. This is all part of information theory: working out the shortest code to send a message by measuring the information content in a piece of data. The key is to keep the essential information and discard the redundant data (code).
Using a lossless compression scheme on fine detailed image data you'll be lucky if you manage to halve the file size (a 2:1 compression ratio). So according to Shannon's information theory some information has to be discarded if you want to reduce the file size further (or get a symbol to carry more information, but that's another story). It was the JPEG committee's job to find methods to discard information that the human eye and brain would not notice, and, if further information had to be discarded, that the brain could still recognise the image.
The lossy compression scheme will continue to thrive all the time we have limited data storage and limits imposed on network bandwidth. The day we all have fibre optic cable to the doorstep, terabyte memory cards and petabyte hard disks, the nasty JPEG baseline compressed images will be as useless as wax cylinder records. Remember JPEG was developed back in the days when the 56k modem was king.
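Shannon's entropy measure can be sketched in a few lines. The formula H = -sum(p * log2(p)) gives the minimum average bits per symbol any lossless code can achieve; the two examples (a single-valued image versus uniformly random bytes) are my own illustration of why flat areas compress and noise does not.

```python
# Shannon entropy: the lower bound on bits per symbol for lossless coding.
import math

def entropy(probabilities):
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A flat grey image: one pixel value with probability 1 -> zero bits needed.
print(entropy([1.0]))            # 0.0

# Noise: 256 equally likely byte values -> the full 8 bits per byte,
# which is why noisy data is essentially incompressible.
print(entropy([1 / 256] * 256))  # 8.0
```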
Apart from the general interest in the topic the main aim is to help people limbo under various file size limits imposed by forums .
Some more experiments in Photoshop to illustrate that increased image detail (information) reduces the compression ratio.
I have used the same image size of 800 x 640 pixels, which is typical of those posted in the Gallery. For each test I created a coloured checkerboard pattern with an increasing number of squares. The JPEG quality setting was 12 (max quality, minimum compression). Note none of these images will have camera noise.
53,290 bytes 1 square
53,965 bytes 4 squares
62,548 bytes 16 squares
109,098 bytes 64 squares
209,375 bytes 256 squares
334,432 bytes 1024 squares
654,468 bytes 4096 squares; with a Quality Factor setting of 6 it just managed to get the file size (275,236 bytes) below the forum's 300K limit
Somewhere between 256 and 1024 squares we blew this forum's 300K limit. To meet this limit extra compression is required, with some loss in image quality.
The case of 4096 squares (each square ~ 12 x 10 pixels), which is not especially detailed, required a heavy compression setting of 6.
Assuming you have already taken some action to reduce the noise levels, what is the best course of action for images that have large areas which are highly detailed? For example, shots showing rough tree bark texture or fine grains of sand. Even shots showing lots of grass blades present a problem. Well, to be honest, you are going to struggle to get these under the 300K barrier. In a few cases you will fail to compress the image with sufficient quality to warrant posting.
Using a quality setting less than 5 will degrade the image so much that it will no longer show the fine detail you were trying to express. The next option is to reduce the image pixel count, either by cropping or re-sampling. If you crop tighter, try to exclude some of the detailed areas that are less important to the composition. If you downsize by re-sampling, use the bilinear method, which tries to preserve image detail. You may end up with a postage stamp sized image that is still not worth posting. As far as baseline JPEG is concerned it's a no-win situation. It might be worth exploring fractal based compression.
In general, for the image to hit the optimum 200K with good quality it must have sizable areas with little or no detail: for example large expanses of blue sky, or out of focus or plain backgrounds. Even if you're borderline with these types of shot, you can often gain a bit of extra compression by lightly running the blur tool over the low detailed areas of the image.
Some readers are gluttons for punishment and keen for more detail on the Baseline JPEG compression method. I'll see if I can come up with a few more illustrations. In the meantime I am experimenting with a computer program written in C to further illustrate the Discrete Cosine Transform process.
For those who are interested more information on the RGB colour space to YCbCr colour space conversion during encoding.
The conversion separates the RGB image into a monochrome (luminance Y) channel and colour (chrominance Cb and Cr) channels so that different levels of compression can be applied to the monochrome and colour components.
RGB to YCbCr Conversion is done by taking different proportions of the RGB channels as shown below
YCbCr (256 levels) can be computed directly from 8-bit RGB as follows:
Y = 0.299 R + 0.587 G + 0.114 B
Cb = - 0.1687 R - 0.3313 G + 0.5 B + 128
Cr = 0.5 R - 0.4187 G - 0.0813 B + 128
Down sampling is often applied to the Cb and Cr chroma channels after conversion.
The attached image shows the original image and its conversion to luminance and chrominance channels.
When the JPEG file is decoded the YCbCr colour space needs to be converted back to RGB as outlined below.
Before this happens the chrominance channels will have to be up sampled to match the original image size.
RGB can be computed directly from YCbCr (256 levels) as follows:
R = Y + 1.402 (Cr-128)
G = Y - 0.34414 (Cb-128) - 0.71414 (Cr-128)
B = Y + 1.772 (Cb-128)
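The two sets of equations above can be checked with a quick round-trip sketch (my own illustration; a real codec works on whole channels, and results are clamped to the 0-255 range as an actual decoder would).

```python
# RGB -> YCbCr -> RGB round trip using the conversion equations above.
# Small rounding errors are expected because the maths is done in floats.

def rgb_to_ycbcr(r, g, b):
    y  =  0.299  * r + 0.587  * g + 0.114  * b
    cb = -0.1687 * r - 0.3313 * g + 0.5    * b + 128
    cr =  0.5    * r - 0.4187 * g - 0.0813 * b + 128
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    r = y + 1.402   * (cr - 128)
    g = y - 0.34414 * (cb - 128) - 0.71414 * (cr - 128)
    b = y + 1.772   * (cb - 128)
    return tuple(max(0, min(255, round(v))) for v in (r, g, b))

y, cb, cr = rgb_to_ycbcr(200, 120, 50)   # an orange-ish pixel
print(ycbcr_to_rgb(y, cb, cr))           # (200, 120, 50)
```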
Further illustration of the theory behind the Discrete Cosine Transform (DCT) process. Basically the DCT process is measuring the strength of predefined cosine wave patterns in the 8 x 8 pixel block. It is looking for a number of patterns at different frequencies.
Through Fourier's work came the concept that any shape can be broken down into a series of sine wave components of different frequencies and amplitudes. The mathematics is very complex, but a visual demonstration of the Fourier principle is much easier to understand. Therefore I have managed to find a couple of links to Java applets which demonstrate the construction of complex shapes from a series of sine waves of different frequency and strength.
You may need to download a java plug-in to get them to work if your web browser is not already java enabled.
The first one is very basic. It has 8 sliders which control the amplitude of each frequency. You can see the effect of adjusting the amplitude of each component.
Two buttons are also provided which are programmed to preset the sliders to provide a Sawtooth and Square waveform.
The second link provides some dry theory on the left. On the right is the applet. Press the Clear button to start, then select one of the eight wave shapes to be demonstrated. The first time you click Add the first harmonic frequency is displayed. Each time you click the Add button the next harmonic component is added (shown in the lower window). With each click the composite slowly builds up to the required waveform.
Let's set off by saying the Discrete Cosine Transform is a tedious sequence of additions, subtractions and multiplication on all the rows and columns. It is not my intention to drill down that deep. Besides it would take an age to sit there with a calculator pressing all the buttons to step through the sequence of processing an 8 x 8 array of image data. Best left to your PC CPU or a dedicated chip found in your camera.
The following information I pulled from a DCT microchip data sheet. My intention is just to give you an overview of the complex processing involved. Part of the process is to use values of different frequency cosine waves (a cosine is the same as a sine; it just starts from a different point). It is senseless to keep calculating these values every time you want to use them, so they are stored in a table. The table is an 8 x 8 matrix storing a steady value (DC) term and 7 frequency (AC) terms. The first frequency (the fundamental) is chosen so that the first half cycle covers the 8 image pixels. The other frequencies are harmonics of the fundamental. I have attached the table used by this microchip and plotted its values to show the DC term (0th) and the 7 other frequencies (1st to 7th terms).
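A table like the one described above can be generated rather than stored. This sketch uses the standard DCT-II basis; the microchip's actual fixed-point table values will differ from these floating-point ones.

```python
# Generate the 8 x 8 cosine look-up table: entry [k][n] is the value of the
# k-th cosine pattern at pixel position n.
import math

N = 8
cos_table = [[math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
             for k in range(N)]

# Row 0 is the steady DC term; rows 1 to 7 are the fundamental and harmonics.
print([round(v, 3) for v in cos_table[0]])  # all 1.0
print([round(v, 3) for v in cos_table[1]])  # the fundamental, a half cycle
```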
The JPEG processing of the 8 x 8 image data uses a Two Dimensional DCT. In practice this is executed by using a One Dimensional DCT on each row (left to right) of the 8 x 8 matrix, followed by another One Dimensional DCT on each column (top to bottom).
The other attachment is a block diagram showing the processing of a One Dimensional DCT. The inputs, XK0 through to XK7, are taken from the 8 image pixels in a row. The unit will add the image pixels together for even numbered rows or subtract one from the other on odd numbered rows. These results are then multiplied by one of the cosine values from the table; the value used will also depend on which part of the 8 x 8 image it is processing. Is your head hurting yet?
The results are stored in memory. Another One Dimensional DCT takes the values from the memory but works on them from top to bottom to give the final Two Dimensional DCT terms.
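The row-then-column process can be sketched as follows. I have assumed the common orthonormal DCT-II scaling, which may differ from the microchip's; a flat mid-grey block is used to show that only the DC term survives and that the inverse restores the pixels exactly.

```python
# 2-D DCT built from two 1-D passes: rows first, then columns.
import math

N = 8

def dct_1d(v):
    """Forward 1-D DCT-II with orthonormal scaling."""
    out = []
    for k in range(N):
        s = sum(v[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                for n in range(N))
        out.append((math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)) * s)
    return out

def idct_1d(c):
    """Inverse 1-D DCT (DCT-III) with matching scaling."""
    out = []
    for n in range(N):
        s = 0.0
        for k in range(N):
            scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
            s += scale * c[k] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
        out.append(s)
    return out

def apply_2d(block, f):
    rows = [f(row) for row in block]          # 1-D pass along each row
    cols = [f(col) for col in zip(*rows)]     # then along each column
    return [list(r) for r in zip(*cols)]

block = [[128] * N for _ in range(N)]   # a flat mid-grey 8 x 8 block
coeffs = apply_2d(block, dct_1d)
restored = apply_2d(coeffs, idct_1d)
print(round(coeffs[0][0]))              # 1024: only the DC term is non-zero
print(round(restored[0][0]))            # 128: the inverse restores the pixels
```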
In the next post I will show you the results from a C program I've been dabbling with that converts the 8 x 8 image values to DCT values and back to image values.
The first 3 attachments show the effect of the Discrete Cosine Transform (DCT) on 3 different samples of 8 x 8 pixel image data. Each attachment shows 3 tables. The upper table is the original 8 x 8 image data. The middle table shows the DCT terms after the transform (note quantization has not been applied). The lower table shows the results of the reverse DCT, which is used to restore the image data. For the purposes of producing a compact table I have only displayed the results to one decimal place; very small errors of around 0.001 between the original and restored values are seen if displayed to 3 decimal places. Which goes to show that the forward and reverse DCT produce faithful results if quantization is not applied. The positive terms in the DCT table are the addition of a frequency component and the negative terms the subtraction of one. The first term is the DC component.
The fourth attachment shows the zig-zag collection of the AC terms after quantization. The smaller the magnitudes at the higher frequencies, the greater the chance of getting long runs of zeros with zig-zag scanning for the run length compression.
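The zig-zag collection can be sketched as a scan order (my own reconstruction of the standard JPEG pattern): coefficients are read diagonal by diagonal so the high-frequency terms, which mostly quantize to zero, end up grouped at the tail ready for run length coding.

```python
# Build the JPEG zig-zag scan order for an n x n coefficient block.

def zigzag_order(n=8):
    """Return (row, col) pairs in zig-zag order, low to high frequency."""
    order = []
    for d in range(2 * n - 1):              # one pass per anti-diagonal
        cells = [(r, d - r) for r in range(n) if 0 <= d - r < n]
        # even diagonals are read bottom-left to top-right, odd ones downward
        order.extend(reversed(cells) if d % 2 == 0 else cells)
    return order

path = zigzag_order()
print(path[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```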
The first attachment uses an image with a vertical grey bar. Looking at the DCT terms we can see that only horizontal frequency terms (components) are produced. No terms are seen down the table. Which follows as the vertical brightness levels in the image are constant.
The second attachment uses an image with a horizontal grey bar. Looking at the DCT terms we can see that only vertical frequency terms (components) are produced. No terms across the table. Which follows as the horizontal brightness levels in the image are constant.
It is worth noting that after the zig-zag collection of terms there is a long sequence of zeros, which will be ideal for good run length compression.
The third attachment uses an image with a diagonal grey bar. The DCT table shows both vertical and horizontal frequency terms. Many of the terms are small and after quantization will reduce to zero. It is also worth noting that after the zig-zag collection there are fewer zero terms.
So the DCT process favours images with strong vertical and horizontal elements over diagonal elements when producing figures for compression. The DCT process does not perform any compression in itself.
Good stuff Rob, I think I am beginning to understand it, at least it isn't the complete mystery it was before.
This post is to further dot the i's and cross the t's on the Quantisation stage. It is responsible for the majority of the lossy compression.
Each DCT term is divided by the corresponding position in the Quantisation table and then rounded to the nearest integer as illustrated in the first attachment. In the Quantization table the low frequency terms are in the top left hand corner and the high frequency terms are in the bottom right hand corner.
The values in the Quantisation table are also scaled (multiplied) by the Quality Factor setting (the compression level slider in Photoshop), which has the overall effect of making the quantization finer or coarser. From the table we can see that the size of the quantization steps varies with frequency. High frequencies are less important to human vision and therefore receive a coarser quantization. For example, the low frequencies may end up with 50 different levels and the highest frequency with only 3 or 4 levels.
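The quantisation step can be sketched as below. I have used the example luminance table from Annex K of the JPEG standard for illustration; Photoshop's own tables and quality scaling will differ.

```python
# Quantise DCT terms: divide by the table entry, then round to an integer.
# Low frequencies (top left) get fine steps, high frequencies coarse steps.

QUANT = [
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
]

def quantise(dct_block, table, scale=1.0):
    return [[round(dct_block[r][c] / (table[r][c] * scale)) for c in range(8)]
            for r in range(8)]

def dequantise(q_block, table, scale=1.0):
    return [[q_block[r][c] * table[r][c] * scale for c in range(8)]
            for r in range(8)]

# A block with a big DC term and a small high-frequency term: the small
# term rounds to zero, which is where the lossy compression comes from.
block = [[0.0] * 8 for _ in range(8)]
block[0][0], block[7][7] = 1024.0, 20.0
q = quantise(block, QUANT)
print(q[0][0], q[7][7])   # 64 0
```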
The effect of quantization is a bit like the Posterise tool in Photoshop. In fact for the next attachment I have taken a typical photographic scene and applied different levels of posterisation. The upper image has 256 levels and will store in 8 bits. The middle image illustrates the effect if we used 7 bits, and the lower image the effect if we reduced it further to 6 bits. Our vision system is very tolerant of a reduction in the number of levels for this type of image. However, in the next attachment I have used the gradient tool to simulate a very low frequency image. Stepping is clearly seen as the number of levels is reduced. So for low frequencies the number of levels must be retained.
The final stage of the JPEG process is to use lossless Huffman coding to compress all of the run length compressed (another lossless compression technique) DCT terms. As mentioned in a previous post, Huffman is a variable length code based on probabilities, unlike most other codes, which are fixed length block codes. For Huffman to work well there must be values in the data that occur more frequently than others. If the values are completely random then Huffman coding is pointless.
Now to run you through a typical training example that illustrates the benefits of Huffman compression.
Imagine that digital codes are used to transmit various regional weather conditions to shipping.
Let's consider there are 4 states for visibility, as outlined in the table below.
Visibility - Code word using a 2 bit block code
Very Poor 00
Poor 01
Moderate 10
Good 11
There are 4 states, which require a 2 bit block code.
To work out the Huffman code you need to know the probabilities of each state. In our case the probabilities are as follows.
Very Poor 0.1
Poor 0.1
Moderate 0.3
Good 0.5
The most probable event gets the shortest code and the least probable events get the longest codes.
Visibility - Code word
Very Poor 111
Poor 110
Moderate 10
Good 0
Using a block code it will take 2 bits to transmit the information regardless of the visibility condition.
In the case of Huffman the average code length will be
(3 bits x 0.1) + (2 bits x 0.3) + (1 bit x 0.5) + (3 bits x 0.1) = 1.7 binary digits per code.
An average saving of 15%
Care is needed in choosing the Huffman codes so that the string of codes are uniquely decodable. We need to know when a code starts and finishes as they are no longer of a fixed length.
010111110 = good, moderate, very poor, poor
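The whole visibility example can be checked in a few lines (a sketch using the code words tabulated above): the prefix-free property is what makes the bit stream uniquely decodable, and the average length works out to the 1.7 binary digits per code calculated earlier.

```python
# Encode and decode shipping visibility states with a variable-length code.

CODES = {"very poor": "111", "poor": "110", "moderate": "10", "good": "0"}
PROBS = {"very poor": 0.1, "poor": 0.1, "moderate": 0.3, "good": 0.5}

def encode(states):
    return "".join(CODES[s] for s in states)

def decode(bits):
    lookup = {v: k for k, v in CODES.items()}
    states, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:      # prefix-free, so a match is unambiguous
            states.append(lookup[current])
            current = ""
    return states

average = sum(len(CODES[s]) * PROBS[s] for s in CODES)
print(round(average, 2))           # 1.7 bits per code, vs 2.0 for the block code
print(decode(encode(["good", "moderate", "very poor", "poor"])))
```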
In some situations the probabilities of events are known in advance, but in the case of image data they have to be calculated for each image before coding can commence. In most cases a coding tree or dictionary has to be constructed and included in the file so that the compressed Huffman data can be decoded when the image file is opened on the computer. An example coding tree is attached. Because of the overhead of including the coding tree (dictionary), large images benefit more than small ones.
By now I should imagine that you know more about JPEG Baseline compression than you ever wanted to. Therefore in future posts I will focus on the practical aspects.
Thanks Rob, fascinating stuff.
I am sure many of you have wondered how Tiff compares to the Best Quality JPEG setting. Well, let's say from the outset that the Best Quality setting is not normally the best possible setting that could be used. The application programmer has restricted you to a setting that optimizes compression ratio against little or no perceived loss in image quality. Accordingly, applications of different versions and from different vendors will show slight variations in compression ratio and image quality. It's really down to what values they decide to use in their quantization tables.
In this post I am attempting to illustrate the difference between tiff and PhotoShop setting of 12, 9 and 5 within the limitations of using Jpeg attachments.
The first attachment compares the histograms of a tiff image and JPEG versions saved at settings of 12 (max quality), 9 (high quality) and 5 (medium quality). The tiff histogram shows some very distinct and well defined areas of brightness levels in the image. The JPEG max quality version shows some deviation from the original tiff. The differences are there but not huge; basically a number of pixels have slightly changed their value. As the compression is increased through settings 9 and 5 a greater variance can be seen.
The second attachment is the image in question.
Next I took the original tiff file and subtracted the Jpeg version from it to produce a difference image. As the differences were lost in the darkness I then increased the brightness by 50%. I also show the histogram for each case.
With the setting of 12 the differences are barely noticeable and you have to amplify the image to make out which pixels differ.
With the setting of 9 and 5 the differences become more noticeable.
Blimey Rob. :eek:
This has turned into a major article. I will have to read this through several times to absorb all the info. All very interesting. :) :)
Further to Post 35 and the question of how the JPEG max quality setting compares with the original Tiff, I have written a little utility program to compare each pixel in the tiff image against the jpeg version. The test image in Post 35 is 150 x 150 pixels = 22,500 pixels.
The attached compares the 3 versions saved as jpeg, with settings of Max Quality, 9 and 5, against the original tiff. In the case of the Max Quality version some 90% of the pixels are the same as the tiff and the other 10% are within 1%.
In my last posting the utility program compared the original tiff against various saved JPEG versions. After a small modification to the program I was able to flag the offending pixels: a blue flag for pixels that erred by less than 2%, a green flag for those between 2% and 10%, and a red flag for those with an error of more than 10%. Error pixels are often referred to as JPEG artifacts. See attached.
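The utility described above was a C program; here is a hedged reconstruction of the idea in Python. It is my own sketch: I have assumed errors are taken as a percentage of the full 0-255 range, and the sample pixel values are invented.

```python
# Bucket each pixel by its percentage error against the reference image,
# mirroring the blue / green / red flag bands described in the post.

def bucket_errors(reference, candidate):
    """Count pixels by error band; images are flat lists of 0-255 values."""
    buckets = {"identical": 0, "under 2%": 0, "2% to 10%": 0, "over 10%": 0}
    for ref, got in zip(reference, candidate):
        error = abs(ref - got) / 255 * 100
        if error == 0:
            buckets["identical"] += 1
        elif error < 2:
            buckets["under 2%"] += 1     # the blue flags
        elif error <= 10:
            buckets["2% to 10%"] += 1    # the green flags
        else:
            buckets["over 10%"] += 1     # the red flags
    return buckets

ref = [100, 100, 100, 100]
jpg = [100, 103, 120, 200]   # invented values for illustration
print(bucket_errors(ref, jpg))
```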
When the JPEG Committee were considering various lossy compression schemes, they took into account which scheme gave the best results for the type of images they wanted to handle. You may have been wondering why I designed such an odd looking test image. I wanted to contrast gentle undulations with steep steps in levels running vertically, horizontally and diagonally. If you examine the JPEG versions you will notice that the errors congregate around the steep steps, especially those that run diagonally. This falls in line with the decisions made by the committee.
The point I am trying to make is that gently undulating data is more suitable for JPEG compression than black and white line art or black text on a white background. This then leads into the question of whether we should shoot in raw (Tiff) or JPEG, along with what Post Processing image storage formats we should use. It is not my intention to go down this avenue as it is a topic more than adequately covered elsewhere.
Whilst I was in the mood for pixel peeping I thought I would investigate the other favourite question about Jpeg.
If I re-save a JPEG, will it degrade the image? The question stems from those who only work in the JPEG format when Post Processing.
If you scan forums where this question has been asked, the reply will be yes, it will discard information each time because of the lossy compression. But nobody has a feel for the extent to which the image will degrade.
So I've taken a 760 x 760 pixel (577,600 pixels) Tiff and saved it as JPEG. I ran my program to compare the two images in the same manner as in previous postings. I then opened and re-saved the file, repeating this operation 10 times, on each occasion making no adjustment to the image. I ran the program again to compare the Tiff with the JPEG file that had been processed 10 times. I carried out this exercise at Max Quality and at a quality setting of 10. The results are below.
Compare Tiff with Jpeg 12 Max quality setting saved once
450319 Pixels are identical to the original Tiff
127267 Pixels error by less than 1 percent
14 Pixels with 1 to 2 percent error
0 Pixels with 2 to 5 percent error
0 Pixels with 5 to 10 percent error
0 Pixels with 10 to 15 percent error
0 Pixels error by more than 15 percent
Compare original Tiff with Jpeg 12 Max quality setting saved 10 times
362346 Pixels are identical to the original Tiff
209950 Pixels error by less than 1 percent
5303 Pixels with 1 to 2 percent error
1 Pixels with 2 to 5 percent error
0 Pixels with 5 to 10 percent error
0 Pixels with 10 to 15 percent error
0 Pixels error by more than 15 percent
Compare original Tiff with Jpeg 10 quality setting saved once
211658 Pixels are identical to the original Tiff
294235 Pixels error by less than 1 percent
69615 Pixels with 1 to 2 percent error
2092 Pixels with 2 to 5 percent error
0 Pixels with 5 to 10 percent error
0 Pixels with 10 to 15 percent error
0 Pixels error by more than 15 percent
Compare original Tiff with Jpeg 10 quality setting saved 10 times
209957 Pixels are identical to the original Tiff
295319 Pixels error by less than 1 percent
70189 Pixels with 1 to 2 percent error
2135 Pixels with 2 to 5 percent error
0 Pixels with 5 to 10 percent error
0 Pixels with 10 to 15 percent error
0 Pixels error by more than 15 percent
Thanks Rob, I was going to ask you that very thing about multiple compressions; nice to see some numbers attached to the process. I did a little experiment the other day and re-compressed an image 10 times and could see no difference. Your numbers tell me why.
for me and any others who are more interested in the answer to a problem than capable of following the maths in the middle, are you able to improve on my empirical blundering (obviously taking an image where there is some detail worth keeping) -
Taking an image that has to be scaled down to about 33% for forum display, I find Lanczos 3 resampling has a marginal edge over Bicubic or Mitchell when the image is reduced to 768 pixels high. Comparing the original at 67% and the reductions at 200%, Lanczos still has an edge. Comparing the original at its full size and the reductions at 300%, there is virtually nothing to choose between the resampled images.
But there is still a little to be gained on going back over the Lanczos reduced image using 'Local contrast enhancement'. LCE, being done with controlled values, seems much more successful than 'bicubic with sharpen'.
I suspect that there is then no universal answer to the last bit, i.e. whether in general it is best to use the more subtle sharpening techniques before or after downsizing (getting a better camera, better lens and always using a tripod not being realistic options).
Re-sampling is more than slightly off topic for this thread.
The main problem with up or down sampling is deciding which bits to throw away or expand. The problem is made worse if the re-sizing is not a multiple of the original, i.e. you're scaling by an awkward factor such as 1.4567. Beat patterns can occur in the image due to pixels within the image information changing position or being added or dropped. After rescaling, some techniques apply dithering to hide regular beat patterns.
I am afraid I don't have a great deal of experience of re-sampling images. A lot of my postings are 100% crops or down-sampled using bi-linear.
Maybe Nigel Blake can offer some input on this topic.
Just in case a few of you were wondering what beat patterns are about I thought I would illustrate the effect.
I made an image with alternate black and white 10 pixel wide bars. I then scaled the image by 5.2% using the nearest neighbour method (upper half of attachment) and a method that looks at the surrounding pixels and dithers (lower half of attachment). If you look in the upper half you will notice that every so often one of the white or black bars is slightly wider. From a distance the eye will pick this up as an interference pattern. In the lower half the dithering has reduced this effect.
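The experiment above can be reproduced numerically. This is my own sketch using a one-dimensional row of bars and a scale factor of 1.052: after nearest-neighbour resampling some bars come out a pixel wider than others, which is the beat pattern the eye picks up.

```python
# Scale a bar pattern by a non-integer factor with nearest-neighbour
# sampling and measure the resulting bar widths.

def nearest_neighbour_resample(row, factor):
    new_len = round(len(row) * factor)
    return [row[min(len(row) - 1, int(i / factor))] for i in range(new_len)]

bars = ([0] * 10 + [255] * 10) * 8          # 160 pixels of 10-wide bars
scaled = nearest_neighbour_resample(bars, 1.052)

# Measure the run lengths of the scaled bars: they are no longer all equal.
runs, count = [], 1
for a, b in zip(scaled, scaled[1:]):
    if a == b:
        count += 1
    else:
        runs.append(count)
        count = 1
runs.append(count)
print(sorted(set(runs)))   # a mix of 10 and 11 pixel wide bars
```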
I have taken the trouble to tidy most of this information up and posted here (http://robertstocker.co.uk/jpeg/jpeg_new_1.htm)
Any feedback, comments or corrections gratefully received.