ECS 60 Homework 4: Operation Miraq Victory

Announcement. May 19, 00:35. I have fixed a subtle bug in my reference program. Please download the latest version.

Due: Friday, May 29 at 2200 hours.

Hand out: my reference program

Hand in: Makefile and all the necessary program files. When I type "make" with no command line argument, your Makefile should create an executable named huffman. Your program should compile on CSIF Linux machines and have the same input/output behavior as my reference program.

Note: This is a group homework. You should work in groups of two. Only one person from each group shall hand in all the files.
On the first two lines of Makefile, write each team member's email username at ucdavis.edu, last name, and first name (one person per line) as comments. For example:
# bsimpson; Simpson, Bart
# mburns; Burns, Montgomery

If you fail to follow this specification, you will lose points.

Description

In the last episode of the Miraq trilogy, the Mamerican forces liberated Miraq in pursuit of freedom. However, terrorist attacks spread across Miraq like wildfire in Santa Barbara. You, as the Commanding General of the Mamerican forces and their Miraqi allies, have just discovered a book describing efficient algorithms for terminating insurgents, one that would ensure victory in Miraq. You wish to send this book to all the troops via the Internet ASAP. However, due to the astronomical deficit of the Mamerican government, you wish to compress the book to reduce your ISP charges.

Use Huffman coding for compression/decompression.
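As a rough illustration of the tree-building step of Huffman coding without STL containers, here is a minimal sketch that uses plain arrays. The names HuffNode, buildHuffmanTree, and codeLength are hypothetical, and the tie-breaking rule (the lowest index wins among equal weights) is an assumption. To match the reference program bit for bit, your tie-breaking and your handling of the dummy character must agree with it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical node layout; indices into one array replace pointers.
struct HuffNode {
    std::uint64_t weight;  // sum of the frequencies below this node
    int parent;            // -1 while the node has no parent (root or unmerged)
    int left;              // -1 for leaves
    int right;             // -1 for leaves
};

// Builds a Huffman tree over n leaves with the given weights.
// `nodes` must have room for 2*n - 1 entries. Returns the root index.
// ASSUMPTION: among equal weights, the lowest index is picked first.
int buildHuffmanTree(const std::uint64_t weights[], int n, HuffNode nodes[]) {
    assert(n >= 1);
    for (int i = 0; i < n; ++i) {
        nodes[i] = {weights[i], -1, -1, -1};
    }
    int count = n;
    for (int merges = 0; merges < n - 1; ++merges) {
        int a = -1;  // index of the smallest parentless node
        int b = -1;  // index of the second smallest parentless node
        for (int i = 0; i < count; ++i) {
            if (nodes[i].parent != -1) continue;
            if (a == -1 || nodes[i].weight < nodes[a].weight) {
                b = a;
                a = i;
            } else if (b == -1 || nodes[i].weight < nodes[b].weight) {
                b = i;
            }
        }
        // Merge the two lightest subtrees under a new internal node.
        nodes[count] = {nodes[a].weight + nodes[b].weight, -1, a, b};
        nodes[a].parent = count;
        nodes[b].parent = count;
        ++count;
    }
    return count - 1;
}

// Code length of a leaf = number of edges from the leaf up to the root.
int codeLength(const HuffNode nodes[], int leaf) {
    int len = 0;
    for (int i = leaf; nodes[i].parent != -1; i = nodes[i].parent) {
        ++len;
    }
    return len;
}
```

This linear scan costs O(n^2) overall, which is negligible for a few hundred symbols. A hand-written binary heap would be the usual replacement if the alphabet were larger.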

You may NOT use STL classes except the string class.

Command line: Your program accepts an optional command line argument "-d":

Read input from cin and write output to cout.
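One possible way to detect the optional argument is sketched below. The helper name decompressRequested is hypothetical, and the sketch assumes that "-d" selects decompression and that no argument means compression. Check that assumption against the reference program.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: returns true when the only argument is "-d".
// ASSUMPTION: "-d" means "decompress"; no argument means "compress".
bool decompressRequested(int argc, const char* const argv[]) {
    assert(argv != nullptr);
    return argc == 2 && std::strcmp(argv[1], "-d") == 0;
}
```

In main, the result would pick between a compress routine and a decompress routine, both of which read from cin and write to cout.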

Uncompressed data: The uncompressed data contains a sequence of 8-bit characters. The input contains at most 2^32 - 1 characters.
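Because the input is bounded by the limit above, no single character count can overflow a 32-bit unsigned integer. The sketch below tallies one buffer of input at a time. The name tallyBytes is hypothetical, and the caller is assumed to zero the table before the first call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Adds the bytes in buf[0..n) to a 256-entry frequency table.
// The cast to unsigned char matters: plain char may be signed, and a
// negative value would index outside the table.
void tallyBytes(const char* buf, std::size_t n, std::uint32_t freq[256]) {
    assert(buf != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        ++freq[static_cast<unsigned char>(buf[i])];
    }
}
```

A caller could fill a fixed-size buffer with std::cin.read, pass std::cin.gcount() as n, and repeat until gcount() returns 0.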

Compressed data: The compressed data contains three sections:

  1. Magic cookie. This section contains 8 characters: the string "HUFFMAN" followed by the ASCII 0 character (\0).
  2. Frequencies. This section contains the frequencies of all the characters from ASCII 0 to ASCII 255, even if a character is absent from the uncompressed data. The frequency of a character is its count in the uncompressed data. Order the frequencies by the ASCII values of their corresponding characters. Each frequency is represented by a 4-byte unsigned integer in the compressed data. Do NOT print the frequency of the dummy character (since it is always 0).
  3. Compressed data. This section contains the codes of all the characters in the same order as they appear in the uncompressed data. Additionally, append the code of the dummy character to the end of the uncompressed data. Since this section contains a sequence of bits but the smallest unit of data is a byte in files, you need to convert bits into bytes by the following rules:

We will test the decompression function of your program with only valid compressed data, so your program need not handle errors in the compressed data.
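The first two sections of the compressed format (magic cookie and frequencies) add up to 8 + 256 * 4 = 1032 bytes. The sketch below lays them out in a byte buffer. The names HEADER_SIZE and encodeHeader are hypothetical, and the little-endian byte order of each 4-byte frequency is an assumption, since the handout does not state one here. Compare against the output of the reference program before relying on it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// 8-byte magic cookie followed by 256 four-byte frequencies.
const std::size_t HEADER_SIZE = 8 + 256 * 4;

// Fills `out` with the magic cookie and the frequency table.
// ASSUMPTION: each frequency is stored least significant byte first.
void encodeHeader(const std::uint32_t freq[256], unsigned char out[HEADER_SIZE]) {
    assert(freq != nullptr && out != nullptr);
    // The literal "HUFFMAN" occupies 7 characters plus the terminating '\0'.
    std::memcpy(out, "HUFFMAN", 8);
    for (int c = 0; c < 256; ++c) {
        const std::uint32_t f = freq[c];
        unsigned char* p = out + 8 + 4 * c;
        p[0] = static_cast<unsigned char>(f & 0xFF);
        p[1] = static_cast<unsigned char>((f >> 8) & 0xFF);
        p[2] = static_cast<unsigned char>((f >> 16) & 0xFF);
        p[3] = static_cast<unsigned char>((f >> 24) & 0xFF);
    }
}
```

The buffer can then be written in one call, for example with std::cout.write(reinterpret_cast<const char*>(out), HEADER_SIZE). Building the bytes explicitly keeps the output independent of the byte order of the machine that runs the program.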

Extras