Problem : Variablelength encoding
After spending the afternoon in the dentist's chair, Ben Bitdiddle
has invented a new language called DDS made up entirely of vowels (the only
sounds he could make with someone's hand in his mouth). The DDS alphabet
consists of the five letters "A", "E", "I",
"O", and "U" which occur in messages with the following
probabilities:
Letter

Probability
of occurrence

A

p(A) = 0.15

E

p(E) = 0.4

I

p(I) = 0.15

O

p(O) = 0.15

U

p(U) = 0.15

A.
If you are told that the first letter of a message is
"A", give an expression for the number of bits of information have
you received.
Using the formula given in lecture, the number of
bits of information is log_{2}(1/p(A)) = log_{2}(1/0.15)
B.
Ben is trying to invent a fixedlength binary encoding for DDS
that permits detection and correction of single bit errors. Briefly describe
the constraints on Ben's choice of encodings for each letter that will ensure
that singlebit error detection and correction is possible. (Hint: think about
Hamming distance.)
Each encoding must differ from other encodings in
at least 3 bit positions, i.e., encodings must have a Hamming distance >= 3.
This ensures that each received codeword (even those with singlebit errors)
can be associated with a particular source encoding.
C.
Giving up on error detection and correction, Ben turns his
attention to transmitting DDS messages using as few bits as possible. Assume
that each letter will be separately encoded for transmission. Help him out by
creating a variablelength encoding that minimizes the average number of bits
transmitted for each letter of the message.
Using the simple "greedy" algorithm
described above:
S = {A/0.15 E/0.4 I/0.15 O/0.15 U/0.15}
arbitrarily choose O &
U
encoding: "0"
=> O, "1" => U
S = {A/0.15 E/0.4 I/0.15 OU/0.3}
choose A & I
encoding: "0"
=> A, "1" => I
S = {AI/0.3 E/0.4 OU/0.3}
choose AI & OU
encoding: "00"
=> A, "01" => I, "10" => O, "11" => U
S = {AIOU/0.6 E/0.4}
choose E & AIOU
encoding: "0"
=> E, "100" => A, "101" => I, "110"
=> O, "111" => U
S = {AEIOU/1.0}
Note that the assignments of symbols to encodings
"0" and "1" were arbitrary and could have been swapped at
each level. So, for example, swapping the encoding at the last step would have
resulted in
encoding: "1"
=> E, "000" => A, "001" => I, "010"
=> O, "011" => U
which achieves the same average bits/symbol as
the previous encoding.