## Tuesday, 11 March 2014

### Variable-length encoding

Problem : Variable-length encoding
After spending the afternoon in the dentist's chair, Ben Bitdiddle has invented a new language called DDS made up entirely of vowels (the only sounds he could make with someone's hand in his mouth). The DDS alphabet consists of the five letters "A", "E", "I", "O", and "U" which occur in messages with the following probabilities:
 Letter Probability of occurrence A p(A) = 0.15 E p(E) = 0.4 I p(I) = 0.15 O p(O) = 0.15 U p(U) = 0.15
A.   If you are told that the first letter of a message is "A", give an expression for the number of bits of information have you received.
Using the formula given in lecture, the number of bits of information is log2(1/p(A)) = log2(1/0.15)
B.    Ben is trying to invent a fixed-length binary encoding for DDS that permits detection and correction of single bit errors. Briefly describe the constraints on Ben's choice of encodings for each letter that will ensure that single-bit error detection and correction is possible. (Hint: think about Hamming distance.)
Each encoding must differ from other encodings in at least 3 bit positions, i.e., encodings must have a Hamming distance >= 3. This ensures that each received codeword (even those with single-bit errors) can be associated with a particular source encoding.
C.    Giving up on error detection and correction, Ben turns his attention to transmitting DDS messages using as few bits as possible. Assume that each letter will be separately encoded for transmission. Help him out by creating a variable-length encoding that minimizes the average number of bits transmitted for each letter of the message.
Using the simple "greedy" algorithm described above:
S = {A/0.15 E/0.4 I/0.15 O/0.15 U/0.15}
arbitrarily choose O & U
encoding: "0" => O, "1" => U
S = {A/0.15 E/0.4 I/0.15 OU/0.3}
choose A & I
encoding: "0" => A, "1" => I
S = {AI/0.3 E/0.4 OU/0.3}
choose AI & OU
encoding: "00" => A, "01" => I, "10" => O, "11" => U
S = {AIOU/0.6 E/0.4}
choose E & AIOU
encoding: "0" => E, "100" => A, "101" => I, "110" => O, "111" => U
S = {AEIOU/1.0}
Note that the assignments of symbols to encodings "0" and "1" were arbitrary and could have been swapped at each level. So, for example, swapping the encoding at the last step would have resulted in
encoding: "1" => E, "000" => A, "001" => I, "010" => O, "011" => U
which achieves the same average bits/symbol as the previous encoding.