------------------------------------------------------------------------------
COMP 731 - Data Struc & Algorithms - Lecture 31 - Tuesday, September 19, 2000
------------------------------------------------------------------------------

Today:
  o Huffman code
  o LZ77 algorithm
  o How ZIP and gzip work

Explain Huffman's algorithm and how to implement it with a heap.
Give examples showing the amount of compression you get with
different encodings of a long file.

Explain the LZ77 algorithm and give examples.  Make sure that some of
these examples show the case in which the string being copied creates
the characters it will subsequently be using (that is, the copy
overlaps its own output).

Talk about how to implement LZ efficiently, using hashing.

Explain the basics of the ZIP algorithm:

1. Break the string into 32 KByte chunks.  Compress each chunk
   separately.

2. Use LZ77 on each chunk, representing each item as

     literal    0******* --------      * = unused bit position
     position   1------- --------      - = information bit
     length     --------

   The leading bit tells a literal (0) from a position (1); the 15
   information bits of a position are enough to address any byte in a
   32 KByte chunk.

   Runs longer than 256 bytes are not used; stop at 256.  (Probably
   258, actually: lengths of 0, 1, and 2 are not used, so the 256
   values of the length byte can be shifted to stand for 3 through
   258.)

3. Huffman encode each chunk, using one tree (one code) for the
   literals and positions, and one tree for the lengths.  There will
   never be ambiguity in decoding because a length always follows a
   position.
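
Sketch for the Huffman part: one way to build the code with a heap, using Python's heapq module.  The names huffman_code and encoded_bits are made up for this illustration, and the tie-breaking counter is just one convenient way to keep the heap from ever comparing two subtrees.  Treat it as a minimal sketch, not the only way to do it.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Return a dict mapping each symbol of text to its bit string."""
    freq = Counter(text)
    if not freq:
        return {}
    # Heap entries are (weight, tie_breaker, tree).  A tree is either a
    # single symbol (a leaf) or a (left, right) tuple (an internal node).
    heap = [(w, i, sym) for i, (sym, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Only one distinct symbol: give it a 1-bit code.
        return {heap[0][2]: "0"}
    next_id = len(heap)
    while len(heap) > 1:
        # Repeatedly merge the two lightest trees.
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_id, (t1, t2)))
        next_id += 1
    # Walk the final tree: left edge = "0", right edge = "1".
    codes = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:
            codes[node] = prefix
    return codes


def encoded_bits(text, codes):
    """Total number of bits needed to encode text with the given code."""
    return sum(len(codes[ch]) for ch in text)
```

For a 100-character file with symbol counts a=45, b=13, c=12, d=16, e=9, f=5, an 8-bit encoding takes 800 bits, a fixed 3-bit encoding takes 300 bits, and encoded_bits with the Huffman code gives 224 bits.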
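
Sketch for the LZ77 part: a greedy compressor that finds earlier occurrences by hashing every 3-byte string (a Python dict serves as the hash table), and a decompressor that copies byte by byte so that a copy may read characters it has itself just produced.  The token format (an int for a literal, a (distance, length) tuple for a copy), the function names, and the constants are assumptions made for this illustration; no bit packing or Huffman coding is done here.

```python
MIN_MATCH = 3        # assumed: shorter matches are emitted as literals
MAX_MATCH = 258      # assumed cap on the length of one copy
WINDOW = 32 * 1024   # assumed: how far back a copy may reach


def lz77_compress(data):
    """Return a list of tokens: int literals and (distance, length) pairs."""
    tokens = []
    table = {}  # 3-byte string -> positions where it starts, oldest first
    i, n = 0, len(data)
    while i < n:
        best_len, best_dist = 0, 0
        key = data[i:i + MIN_MATCH]
        if len(key) == MIN_MATCH:
            # Try the most recent candidates first.
            for j in reversed(table.get(key, [])):
                if i - j > WINDOW:
                    break
                length = 0
                # j + length may run past i: the match overlaps itself.
                while (length < MAX_MATCH and i + length < n
                       and data[j + length] == data[i + length]):
                    length += 1
                if length > best_len:
                    best_len, best_dist = length, i - j
        if best_len >= MIN_MATCH:
            tokens.append((best_dist, best_len))
            step = best_len
        else:
            tokens.append(data[i])
            step = 1
        # Enter every position we pass over into the hash table.
        for k in range(i, i + step):
            k_key = data[k:k + MIN_MATCH]
            if len(k_key) == MIN_MATCH:
                table.setdefault(k_key, []).append(k)
        i += step
    return tokens


def lz77_decompress(tokens):
    """Rebuild the original bytes from a token list."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            dist, length = tok
            start = len(out) - dist
            # Copy one byte at a time: when length > dist, later bytes of
            # the copy come from earlier bytes of the same copy.
            for k in range(length):
                out.append(out[start + k])
        else:
            out.append(tok)
    return bytes(out)
```

On the input b"aaaaaaaaaa" the compressor emits the literal 97 followed by the single copy (1, 9): distance 1, length 9.  The copy starts one byte back and runs for nine bytes, so all but its first byte are characters the copy itself created.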