Files for the Huffman Project

This page was completed `Sun Oct 28 19:36:30 PDT 2018`

We could use the alphabet of all symbols that might appear in a text file. How many could there be? I really don't know.

Instead, I decided that we would use only 27 symbols, the capital Roman letters A through Z and space. When you read a file, you should convert all lower case letters to upper case, and convert every other symbol to a space. The end of line symbol should also be converted to a space. We will assume that the expected frequency of each of those symbols in English text is proportional to the integers given in the file english.

My program can be used to either encode or decode a file. Each time it runs, it recomputes the optimal prefix code. I am convinced that it is not worthwhile to store the code in a file.

While debugging my program, I made extensive use of the text file simple. The encoded version is simplecode, while the decoded version of that file is simpledecode.

Here is a longer message: early.. The coded version is earlycode and the decoded version is earlydecode

Here is a longer example, consisting of two sentences that we endlessly typed during typing class many years ago, possibly before your parents were born. typingclass, typingclasscode, typingclassdecode. Note that, in all these example, the code file consists of lines of length exactly 80, except for the last line. On the other hand, the text file created by the decoder consists of lines of length no more than 80, and has the additional feature that a word is never broken. Your assignment is to

write the code for Huffman's algorithm,
decode the file Walter Scott,
encode the file First Amendment,
decode the file Hamlet. The decoded file is rather ugly, since there are many special marks that are replaced by spaces.

Files for the Huffman Project

This page was completed Sun Oct 28 19:36:30 PDT 2018

This page was completed `Sun Oct 28 19:36:30 PDT 2018`