Huffman Coding for Lossless Compression
I really need help with Huffman coding for lossless compression. I have an exam coming up and need to understand this. Does anyone know of any easy tutorials, or could someone explain it?
The questions in the exam are likely to be:
Suppose the alphabet is [A, B, C], and the known probability distribution is P(A)=0.6, P(B)=0.2 and P(C)=0.2. For simplicity, let’s also assume that both encoder and decoder know that the length of the messages is always 3, so there is no need for a terminator.
How many bits are needed to encode the message ACB by Huffman Coding? You need to provide the Huffman tree and Huffman code for each symbol. (3 marks)
How many bits are needed to encode the message ACB by Arithmetic Coding? You need to provide details of the encoding process. (3 marks)
Using the above results, discuss the advantage of Arithmetic Coding over Huffman coding. (1 mark)
Answers:
Huffman tree: merge the two least probable symbols, B (0.2) and C (0.2), into a node with weight 0.4, then merge that node with A (0.6) to form the root (weight 1.0). Labelling A's branch 1 and the merged node's branch 0 gives the Huffman code: A - 1, B - 01, C - 00. The message ACB encodes as 1 00 01 = 10001, so 5 bits are needed. (3 marks)
The encoding process of Arithmetic Coding:

Symbol    Low      High     Range
(start)   0.0      1.0      1.0
A         0.0      0.6      0.6
C         0.48     0.6      0.12
B         0.552    0.576    0.024

The final binary codeword is 0.1001, which is 0.5625 and lies inside the final interval [0.552, 0.576). Therefore 4 bits are needed. (3 marks)
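For intuition, here is a minimal Python sketch of the same interval narrowing; the probabilities and the symbol order A, B, C come from the question, while the greedy codeword search at the end is just my own illustrative check:

```python
# Interval narrowing for the message "ACB", using the probabilities from the
# question and the symbol order A, B, C, so the initial sub-intervals are
# A:[0, 0.6), B:[0.6, 0.8), C:[0.8, 1.0).
probs = {"A": 0.6, "B": 0.2, "C": 0.2}

cum, running = {}, 0.0
for sym in ["A", "B", "C"]:
    cum[sym] = running          # lower edge of this symbol's sub-interval
    running += probs[sym]

low, high = 0.0, 1.0
for sym in "ACB":
    rng = high - low
    high = low + rng * (cum[sym] + probs[sym])
    low = low + rng * cum[sym]
    print(f"{sym}: low={low:.3f}  high={high:.3f}  range={high - low:.3f}")

# Greedily pick the shortest binary fraction 0.b1b2... that lands in [low, high).
bits, value, step = "", 0.0, 0.5
while not (low <= value < high):
    if value + step < high:
        value += step
        bits += "1"
    else:
        bits += "0"
    step /= 2
print(f"codeword 0.{bits} = {value} -> {len(bits)} bits")
```

Running this reproduces the table above and confirms that 0.1001 (0.5625) is the shortest binary fraction inside the final interval [0.552, 0.576), i.e. 4 bits.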
In Huffman Coding, the length of the codeword for each symbol has to be an integer number of bits, but in Arithmetic Coding the effective number of bits per symbol can be fractional. Therefore Arithmetic Coding is often more efficient than Huffman Coding, as the results above show. (1 mark)
http://en.wikipedia.org/wiki/Huffman_coding
If you look at the tree (top right) you'll see that each parent node is the sum of the two below it. The values at the nodes are the frequencies of the letters. Each bit in the binary sequence is a right/left branch in the tree.
Does that help?
I don't really have a clue about Arithmetic coding, but it looks quite clever.
A Huffman tree is a binary tree in which the values that occur most often in the stream being compressed sit near the root, and values with lower frequency sit further and further from it. This lets more common values be encoded as shorter bit strings, while less common values are encoded as longer ones.
A Huffman tree is constructed as follows (a short code sketch follows the steps):
1. Build a table of the entities in the source stream, with their distributions.
2. Pick the two entries in the table that have the lowest distribution.
3. Make a tree node out of these two entries.
4. Remove the entries just used from the table.
5. Add a new entry to the table for the node just created, with the combined distribution of the two entries removed.
6. If there is more than one entry left in the table, go to step 2.
7. The single entry left in the table is your root.
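A minimal Python sketch of these steps, using heapq for the "pick the two lowest entries" part and the probabilities from the exam question as a demo (the function and variable names are my own):

```python
import heapq
import itertools

def build_huffman_codes(dist):
    """Build a Huffman code table from a {symbol: probability} dict."""
    counter = itertools.count()  # tie-breaker so heapq never compares trees
    # Each heap entry is (distribution, tie_breaker, tree), where a tree is
    # either a symbol (leaf) or a (left, right) pair (internal node).
    heap = [(p, next(counter), sym) for sym, p in dist.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Pick the two entries with the lowest distribution ...
        p1, _, t1 = heapq.heappop(heap)
        p2, _, t2 = heapq.heappop(heap)
        # ... make a node out of them and re-insert it with the combined weight.
        heapq.heappush(heap, (p1 + p2, next(counter), (t1, t2)))
    _, _, root = heap[0]

    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):      # internal node: recurse into children
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                            # leaf: record the accumulated bits
            codes[tree] = prefix or "0"  # single-symbol alphabet edge case
        return codes
    return walk(root, "")

codes = build_huffman_codes({"A": 0.6, "B": 0.2, "C": 0.2})
print(codes)                             # e.g. {'B': '00', 'C': '01', 'A': '1'}
print("".join(codes[s] for s in "ACB"))  # 5 bits in total
```

Which of B and C ends up with 00 versus 01 depends on arbitrary tie-breaking and branch labelling; only the code lengths (1, 2, 2) are determined, so ACB costs 5 bits either way.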
A basic Huffman implementation can be quite okay. But if you are building from scratch, you may want a couple of extra data structures in your toolbox to make things easier, such as a min-heap and a bit vector. The basic algorithms for encoding and decoding are pretty simple. I have no info on the comparison with arithmetic coding.
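To illustrate how simple the encode/decode loops are, here is a small self-contained sketch that works straight from a code table; the table literal uses the codes from the answer above, and the function names are mine:

```python
# Encoding/decoding with a fixed prefix-code table
# (the codes from the answer above: A - 1, B - 01, C - 00).
CODES = {"A": "1", "B": "01", "C": "00"}

def encode(message, codes=CODES):
    return "".join(codes[sym] for sym in message)

def decode(bits, codes=CODES):
    inverse = {code: sym for sym, code in codes.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:   # prefix property: the first match is the symbol
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

bits = encode("ACB")
print(bits, len(bits))   # 10001 5
print(decode(bits))      # ACB
```

In a real implementation the bit string would be packed into a bit vector rather than kept as text, but the control flow is the same.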
An implementation example