                 1 - O
                /     \
          3A - O       2 - O
          /    \       /    \
     000010  001100  5 - O   3B - O
                     /    \    /    \
                100001 100011 110111 111000
Example find on 100001:
- At 1 - O, we look at 1st bit. It is a 1, so go right to 2 - O.
- At 2 - O, we look at 2nd bit. It is a 0, so go left to 5 - O.
- At 5 - O, we look at 5th bit. It is a 0, so go left to 100001 and
compare. They are equal so 100001 is in our trie.
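The walk above can be sketched in code. The node classes and names here are my own illustration, not from the notes: internal nodes store which bit to test, leaves store a key.

```python
# Minimal binary-trie search sketch (node layout is illustrative,
# not from the notes). Keys are fixed-width bit strings; internal
# nodes test one bit position, leaves hold a key.

class Leaf:
    def __init__(self, key):
        self.key = key

class Internal:
    def __init__(self, bit, left, right):
        self.bit = bit      # 1-based bit position to test
        self.left = left    # follow when the bit is 0
        self.right = right  # follow when the bit is 1

def trie_find(root, key):
    node = root
    while isinstance(node, Internal):
        # Test the node's bit: 0 goes left, 1 goes right.
        node = node.right if key[node.bit - 1] == "1" else node.left
    return node.key == key

# The example trie from the diagram above.
root = Internal(1,
    Internal(3, Leaf("000010"), Leaf("001100")),
    Internal(2,
        Internal(5, Leaf("100001"), Leaf("100011")),
        Internal(3, Leaf("110111"), Leaf("111000"))))
```

With this trie, `trie_find(root, "100001")` follows exactly the three steps in the example and returns True.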
Patricia
What if we took our binary trie and used only one type of node? This
node would contain the two pointers to its children and the data. This
is the Patricia (no, I'm not kidding: this specialized trie is really
called a Patricia).
Because the number of leaf nodes is at most one more than the number of
internal nodes, we need to create an extra node at the zero level to
guarantee that all our data can be put in the Patricia.
The convention we use here is that if a node does not have a left
child, its left pointer points back to the node itself. If it does not
have a right child, its right pointer points to its parent or to the
zero level node.
We search through a Patricia in the same manner as before, where the
bit determines the direction taken. The catch is that when we see the
bit level stay the same or decrease, we stop and do our comparison.
![Patricia](pat.gif)
Example find for 100001:
- We skip the zero level node and start at the 1st level node that
contains 001100.
- We are at a 1st level node. We check the 1st bit of 100001 which is
a 1, and follow the right pointer to 2 - 100011. The bit level has
increased so we do not compare.
- We are at a 2nd level node. We check the 2nd bit of 100001 which is
a 0, and follow the left pointer to 5 - 100001. The bit level has
increased so we do not compare.
- We are at a 5th level node. We check the 5th bit of 100001 which is
a 0, and follow the left pointer to 5 - 100001. The bit level is the
same as before, so we compare 100001 and 100001. They are the same so
100001 is in our Patricia.
Example find for 111000:
- We skip the zero level node and start at the 1st level node that
contains 001100.
- We are at a 1st level node. We check the 1st bit of 111000 which is
a 1, and follow the right pointer to 2 - 100011. The bit level has
increased so we do not compare.
- We are at a 2nd level node. We check the 2nd bit of 111000 which is
a 1, and follow the right pointer to 3 - 110111. The bit level has
increased so we do not compare.
- We are at a 3rd level node. We check the 3rd bit of 111000 which is
a 1, and follow the right pointer to 0 - 111000. The bit level
decreased, so we compare 111000 and 111000. They are the same so 111000
is in our Patricia.
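Putting the convention and the search rule together gives a short search routine. The node layout, names, and the exact back pointers below are my reconstruction from the two walkthroughs above, not something given explicitly in the notes.

```python
# Patricia search sketch (node layout is illustrative). Every node
# stores a key, the bit level it tests, and two child pointers; a
# pointer to an equal-or-shallower bit level signals "compare now".

class PatNode:
    def __init__(self, key, bit):
        self.key = key
        self.bit = bit          # bit level tested at this node
        self.left = self        # missing children self-loop by default
        self.right = self

def patricia_find(start, key):
    node = start
    while True:
        # 0 goes left, 1 goes right, as in the plain binary trie.
        nxt = node.right if key[node.bit - 1] == "1" else node.left
        if nxt.bit <= node.bit:          # level same or decreased,
            return nxt.key == key        # so we compare
        node = nxt

# Reconstruction of the Patricia in the examples above: we skip the
# zero level node n0 and start searching at the 1st level node n1.
n0 = PatNode("111000", 0)   # the extra zero level node
n1 = PatNode("001100", 1)
n2 = PatNode("100011", 2)
n3a = PatNode("000010", 3)
n3b = PatNode("110111", 3)
n5 = PatNode("100001", 5)
n1.left, n1.right = n3a, n2
n3a.right = n1              # n3a's left self-loops on 000010
n2.left, n2.right = n5, n3b
n5.right = n2               # n5's left self-loops on 100001
n3b.right = n0              # n3b's left self-loops on 110111
```

Both walkthroughs check out: `patricia_find(n1, "100001")` and `patricia_find(n1, "111000")` each return True after the same sequence of moves described above.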
Variant on DSTs
We could have more branches per node so we end up with a fatter tree of
very shallow depth. Professor Rogaway mentions possible DSTs with 1000
branches. No, I am not giving an example of this. Sorry ;->
Application of DST: File compression using Huffman encoding
Suppose we have a file with lots of redundancy and want to compress
it.
We have a 1MB file whose entries are the ASCII letters a, b, c, d, e,
and f with the following frequencies:
a | 450000
b | 130000
c | 120000
d | 160000
e | 90000
f | 50000
If each of these letters is represented with 8 bits, then the
1,000,000 letters add up to a 1MB file.
Now because we only have 6 entries, we could more efficiently represent
each entry with a 3 bit code.
entry 3-bit code
a 000
b 001
c 010
d 011
e 100
f 101
Because we are using only 3 bits per entry, we end up with a file that
is 3/8 the size of the original 8-bit file: about 375KB.
Now what if we try something else and represent the entries in the
following manner.
entry   code
a       0         Notice that each code is not a prefix of another.
b       101       For example, no other code begins with a's code
c       100       of 0, nor does any other code begin with c's
d       111       code of 100.
e       1101
f       1100
This is Huffman encoding.
Now to decode 0101111110100, we just parse the string into the
different codes and output the entry. Because the codes are not
prefixes of other codes, when we make a match, we are guaranteed that
the matched code is the correct one.
Here is how the above bit string is decoded.
code     0   101   111   1101   0   0
output   a   b     d     e      a   a
This results in a file size of 280KB.
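The decoding step can be sketched as a table lookup: because the code is prefix-free, the first codeword the accumulated bits match is the right one. The function and variable names below are my own; the codes and frequencies are the ones from the tables above.

```python
# Decoding with a prefix-free code: read bits until they match a
# codeword, emit the letter, and start over. Codes from the table.
codes = {"a": "0", "b": "101", "c": "100",
         "d": "111", "e": "1101", "f": "1100"}
decode_table = {v: k for k, v in codes.items()}

def huffman_decode(bits):
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in decode_table:      # prefix-free: first match is correct
            out.append(decode_table[buf])
            buf = ""
    return "".join(out)

# Size check: sum of (frequency * code length), in bits.
freq = {"a": 450000, "b": 130000, "c": 120000,
        "d": 160000, "e": 90000, "f": 50000}
total_bits = sum(freq[ch] * len(codes[ch]) for ch in freq)
```

Decoding `"0101111110100"` gives `"abdeaa"` as in the table, and `total_bits` comes out to 2,240,000 bits, which is the 280KB quoted above.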
Here is the corresponding Huffman DST. The 0's and 1's next to the
branches indicate the path to follow depending upon the bit: 0, go
left; 1, go right.
           /\
        0 /  \ 1
         /    \
        a     /\
           0 /  \ 1
            /    \
          /\      /\
        0/  \1  0/  \1
        c    b  /    d
               /\
             0/  \1
             f    e
How to build a Huffman tree
We build a Huffman tree by being greedy. Here are the frequencies again
in ten thousands.
a: 45 b: 13 c: 12 d: 16 e: 9 f: 5
We start building the tree by grouping the two least likely items to
occur. We then add the frequencies to come up with a combined frequency
for the combined entries.
We first see that e (9) and f (5) are the two lowest frequencies. We
combine them and come up with a combined frequency of 9+5=14.
a: 45   b: 13   c: 12   d: 16     14
                                 /  \
                                f    e
Now b (13) and c (12) have the two lowest frequencies. We combine them
and come up with a combined frequency of 13+12=25.
a: 45      25      d: 16      14
          /  \               /  \
         c    b             f    e
Now d (16) and the f-e group (14) have the two lowest frequencies. We
combine them and come up with a combined frequency of 16+14=30.
a: 45      25          30
          /  \        /  \
         c    b     /\    d
                   /  \
                  f    e
Now the c-b group (25) and the d-f-e group (30) have the two lowest
frequencies. We combine them and come up with a combined frequency of
25+30=55.
a: 45           55
               /  \
              /    \
            /\      /\
           /  \    /  \
          c    b /\    d
                /  \
               f    e
Now we combine the remaining groups and end up with our final Huffman
tree.
      /\
     /  \
    a   /\
        /  \
       /    \
     /\      /\
    /  \    /  \
   c    b /\    d
         /  \
        f    e
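The greedy merging described above is exactly what a heap-based Huffman builder does: pop the two lowest-frequency groups, push their combined frequency, repeat. This sketch (my own code, not from the notes) uses a counter to break ties between equal frequencies; with these frequencies there are no ties, so it reproduces the code table from earlier.

```python
import heapq

# Greedy Huffman construction: repeatedly merge the two groups with
# the lowest frequencies, exactly as in the steps above. The counter
# in each tuple breaks ties so heapq never compares the trees.
def huffman_codes(freq):
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (left, right)))
        count += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")   # left branch appends a 0
            walk(node[1], prefix + "1")   # right branch appends a 1
        else:
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
codes = huffman_codes(freq)
```

Running this on the frequencies above yields the same five merges shown in the diagrams (f+e=14, c+b=25, 14+d=30, 25+30=55, a+55=100) and the same codes as the table: a=0, b=101, c=100, d=111, e=1101, f=1100. Note that other tie-breaking rules can produce different, equally optimal codes.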
Conclusion
Lessons we learned in this course.
1. Programming is a thoughtful endeavor.
"If you're used to programming by reading the assignment sheet and
going to the computer and typing it in, then you probably haven't
gotten most of the programs. Maybe it worked on the first one, but my
guess is that it didn't work on 2, 3, and 4. And if you only solved
program 1 and managed to get working solutions for none of 2, 3, and 4,
a root cause might be that you never came around to the opinion that in
order to get real programs to work, you can't just sit down and write
them. You really have to think about how you're going to attack this
problem. Give it some thought."
2. Data Structures and Algorithms are intertwined.
"This is a course in data structures, right? But I feel when I'm
teaching it that I'm spending a third of my time discussing algorithms.
I don't know how to avoid that. I think that data structures and
algorithms go hand in hand. The interesting thing about data structures
is the algorithms you run on them. The interesting thing about
algorithms is that you can in fact make them run by intertwining them
with data structures."
3. Think Abstractly.
"Another possible reason you've not succeeded in programming
assignments is that somehow you never really got the point of building
these abstraction boundaries and really believing in them. Some people
make the abstraction boundaries. They implement the priority queue as a
binary heap, but they never really put it out of their head that the
way to then think of that data structure is via the operations which
are acting on it. Insert. Delete. Instead, every time they see that
insert, in their head, somehow they're translating it into what
operations are taking place in the underlying data structure. You
probably know if you do this. Every time you do this, you have to sort
of hit yourself and say "You're not abstracting properly." The failure
to make an abstraction boundary at this data structure level means
ultimately that you can't see the problem clearly enough to solve it
when the problems become abstract. Ultimately you have to cut away the
details, like how your priority queue is implemented and see it as this
abstract thing on which you can use this set of operations. So here
we've seen abstract data types used throughout, so really you need to
think of your abstract data types as abstract data types and treat this
encapsulation seriously."
If you find any errors, or have a question about the posted notes
included on this page please feel free to contact me via email. I can
be reached at