Files for the Huffman Project
This page was completed
Sun Oct 28 19:36:30 PDT 2018
We could use the alphabet of all symbols that might appear in a text file.
How many could there be? I really don't know.
Instead, I decided that we would use only 27 symbols, the capital Roman letters
A through Z and space. When you read a file, you should convert all lower case
letters to upper case, and convert every other symbol to a space. The end of line
symbol should also be converted to a space. We will assume that the expected
frequency of each of those symbols in English text is
proportional to the integers given in the file
english.
My program can be used to either encode or decode a file. Each time it runs,
it recomputes the optimal prefix code. I am convinced that it is not worthwhile
to store the code in a file.
While debugging my program, I made extensive use of the text file
simple. The encoded version is
simplecode, while the decoded
version of that file is simpledecode.
Here is a longer message: early..
The coded version is earlycode
and the decoded version is
earlydecode
Here is a longer example, consisting of two sentences that we endlessly typed
during typing class many years ago, possibly before your parents were born.
typingclass,
typingclasscode,
typingclassdecode.
Note that, in all these example, the code file consists of lines of length exactly
80, except for the last line. On the other hand, the text file created by
the decoder consists of lines of length no more than 80, and has the additional
feature that a word is never broken.
Your assignment is to
-
write the code for Huffman's algorithm,
-
decode the file Walter Scott,
-
encode the file First Amendment,
-
decode the file Hamlet.
The decoded file is rather ugly, since there are many special marks
that are replaced by spaces.