Real Computer Science begins where we almost stop reading ...: Variable length encoding & compression

Tuesday, 11 March 2014

Variable length encoding & compression

Problem : Variable length encoding & compression

A. Huffman and other coding schemes tend to devote more bits to the coding of

(A) symbols carrying the most information
(B) symbols carrying the least information
(C) symbols that are likely to be repeated consecutively
(D) symbols containing redundant information

Answer:

(A) symbols carrying the most information, i.e., the symbols that are less likely to occur. This makes sense: to keep messages as short as possible, frequently occuring symbols should be encoded with fewer bits and infrequent symbols with more bits.

B. Consider the following two Huffman decoding tress for a variable-length code involving 5 symbols: A, B, C, D and E.

Using Tree #1, decode the following encoded message: "01000111101".

Answer:

To decode the message, start at the root of the tree and consume digits as you traverse down the tree, stopping when you reach a leaf node. Repeat until all the digits have been processed. Processing the encoded message from left-to-right:

"0" => A
"100" => B
"0" => A
"111" => E
"101" => C

C. Suppose we were encoding messages that the following probabilities for each of the 5 symbols:

p(A) = 0.5
p(B) = p(C) = p(D) = p(E) = 0.125

Which of the two encodings above (Tree #1 or Tree #2) would yield the shortest encoded messages averaged over many messages?

Answer:

Using Tree #1, the expected length of the encoding for one symbol is:

1*p(A) + 3*p(B) + 3*p(C) + 3*p(D) + 3*p(E) = 2.0

Using Tree #2, the expected length of the encoding for one symbol is:

2*p(A) + 2*p(B) + 2*p(C) + 3*p(D) + 3*p(E) = 2.25

So using the encoding represented by Tree #1 would yield shorter messages on the average.

D. Using the probabilities for A, B, C, D and E given above, construct a variable-length binary decoding tree using a simple greedy algorithm as follows:

1. Begin with the set S of symbols to be encoded as binary strings, together with the probability P(x) for each symbol x. The probabilities sum to 1, and measure the frequencies with which each symbol appears in the input stream. In the example from lecture, the initial set S contains the four symbols and associated probabilities in the above table.

2. Repeat the following steps until there is only 1 symbol left in S:

A. Choose the two members of S having lowest probabilities. Choose arbitrarily to resolve ties. In the example aove, D and E might be the first nodes chosen.

B. Remove the selected symbols from S, and create a new node of the decoding tree whose children (sub-nodes) are the symbols you've removed. Label the left branch with a "0", and the right branch with a "1". In the first iteration of the example above, the bottom-most internal node (leading to D and E) would be created.

C. Add to S a new symbol (e.g., "DE" in our example) that represents this new node. Assign this new symbol a probability equal to the sum of the probabilities of the two nodes it replaces.

Answer:

S = {A/0.5 B/0.125 C/0.125 D/0.125 E/0.125}

arbitrarily choose D & E

encoding: "0" => D, "1" => E

S = {A/0.5 B/0.125 C/0.125 DE/0.25}

choose B & C

encoding: "0" => B, "1" => C

S = {A/0.5 BC/0.25 DE/0.25}

choose BC & DE

encoding: "00" => B, "01" => C, "10" => D, "11" => E

S = {A/0.5 BCDE/0.5}

choose A & BCDE

encoding: "0" => A, "100" => B, "101" => C, "110" => D, "111" => E

S = {ABCDE/1.0}

This Tree #1 shown in the diagram above. The choice of D & E as the first symbols to combine was arbitrary -- we could have chosen any two symbols from B, C, D and E. So there are many equally plausible encodings that might emerge from this algorithm, corresponding the interchanging B, C, D and E at the leaves of the tree.

E. Huffman coding is used to compactly encode the species of fish tagged by a game warden. If 50% of the fish are bass and the rest are evenly distributed among 15 other species, how many bits would be used to encode the species of a bass?

Answer:

1 bit, using the algorithm described above.

F. Consider the sum of two six-sided dice. Even when the dice are "fair" the amount information conveyed by a single sum depends on what the sum is since some sums are more likely than others, as shown in the following figure:

What is the average number of bits of information provided by the sum of 2 dice? Suppose we want to transmit the sums resulting from rolling the dice 1000 times. How many bits should we expect that transmission to take?

Answer:

Average number of bits = sum p_ilog₂(1/p_i) for i = 2 through 12. Using the probabilities given in the figure above the average number of bits of information provided by the sum of two dice is 3.2744.

So if we had the perfect encoding, the expected length of the transmission would be 3274.4 bits. If we encode each sum separately we can't quite achieve this lower bound -- see the next question for details.

G. Suppose we want to transmit the sums resulting from rolling the dice 1000 times. If we use 4 bits to encode each sum, we'll need 4000 bits to transmit the result of 1000 rolls. If we use a variable-length binary code which uses shorter sequences to encode more likely sums then the expected number of bits need to encode 1000 sums should be less than 4000. Construct a variable-length encoding for the sum of two dice whose expected number of bits per sum is less than 3.5. (Hint: It's possible to find an encoding for the sum of two dice with an expected number of bits = 3.306.)

Answer:

Using the greedy algorithm given above, we arrive at the following encoding which has 3.3056 as the expected number of bits for each sum.

H. Okay, so can we make an encoding for transmitting 1000 sums that has an expected length smaller than 3306 bits?

Answer:

Yes, but we have to look at encoding more than one sum at a time, e.g., by applying the construction algorithm to pairs of sums, or ultimately to all 1000 sums at once. Many of the more sophisticated compression algorithms consider sequences of symbols when constructing the appropriate encoding scheme.

No comments:

Inspiring Quotes

An inspiring quote may be just what you need to turn your day around. Here are some of the most inspiring quotes ever spoken or written.

I hated every minute of training, but I said, “Don’t quit. Suffer now and live the rest of your life as a champion.”

–Muhammad Ali

“You can have anything you want if you are willing to give up the belief that you can’t have it.”
–Robert Anthony

“There is no man living that can not do more than he thinks he can.”

–Henry Ford

“The best way to predict the future is to create it.”

–Dr. Forrest C. Shaklee

“It’s not about time, it’s about choices. How are you spending your choices?”

–Beverly Adamo

“Success…seems to be connected with action. Successful people keep moving. They make mistakes, but they don’t quit.”
–Conrad Hilton

“Destiny is not a matter of chance; it’s a matter of choice.”

–Anonymous

“The future belongs to those who believe in the beauty of their dreams.”
–Eleanor Roosevelt

“The quality of a person’s life is in direct proportion to their commitment to excellence, regardless of their chosen field of endeavor.”
–Vince Lombardi

“It is never too late to be what you might have been.”
–George Eliot

“Do not let what you can not do; interfere with what you can do.”
–John Wooden

“One man with courage makes a majority.”
–Andrew Jackson

“Failure is the opportunity to begin again more intelligently.”
–Henry Ford

“Try not to become a man of success but rather try to become a man of value.”
–Albert Einstein

“The mind is its own place, and in itself can make a heaven of Hell, a hell of Heaven.”

–John Milton

"If u are student, working and preparing give a little extra effort after regular work. A small sacrifice of TV time, fun time, or facebook time can bring a lot of better things to life than you ever imagined."

-- Naam likhna jaroori nai samajhta.

Thank you for reading, be sure to pass this along!

Tuesday, 11 March 2014

Variable length encoding & compression

No comments:

Post a Comment