ECS110 Lecture Notes for Monday, December 4, 1995
Professor Rogaway conducting
Lecture 26
Scribe of the day: Chao Heng Liu
Today: Binary search tree
Wednesday: Splay trees
Friday: Digital search tree
Binary trees:
            X
           / \
          /   \
         /     \
        /       \
       /\       /\
      /  \     /  \
     / L  \   / R  \
    /______\ /______\
L: left subtree. R: right subtree.
Binary search tree:
Each node has a key;
For all nodes x,
1. for all nodes l in left(x), key(l) <= key(x).
2. for all nodes r in right(x), key(r) >= key(x).
To search for an element with key x:
1. if x = key(root), then the search terminates.
2. if x < key(root), then the left subtree is searched.
3. if x > key(root), then the right subtree is searched.
If the subtree to be searched is empty, x is not in the tree.
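The three search rules can be sketched in C as below. This is only a minimal sketch: the node layout (struct Node) and the name bst_search are assumptions made for illustration, not something fixed in lecture.

```c
#include <stddef.h>

/* Hypothetical node layout; the notes do not prescribe one. */
struct Node {
    double key;
    struct Node *left, *right;
};

/* Return the node holding key x, or NULL if x is not in the tree. */
struct Node *bst_search(struct Node *root, double x)
{
    while (root != NULL && x != root->key) {
        if (x < root->key)
            root = root->left;     /* rule 2: continue in the left subtree */
        else
            root = root->right;    /* rule 3: continue in the right subtree */
    }
    return root;                   /* rule 1 (found), or NULL (empty subtree) */
}
```

The loop form is a design choice; a recursive version that calls itself on the chosen subtree is equivalent.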
level:
 1.                20
                  /  \
 2.             15    40
               /  \     \
 3.           8    18    50
             / \        /
 4.         5   11    45
               /
 5.           9
             / \
 6.        8.5  10
This binary search tree has 6 levels; 8 is at level 3, 10 is at level 6, etc.
The in-order traversal of this tree is: 5, 8, 8.5, 9, 10, 11, 15, 18, 20, 40, 45, 50.
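An in-order traversal visits the left subtree, then the node, then the right subtree, which is why the keys come out sorted. A minimal sketch, assuming a hypothetical struct Node and a caller-supplied output array that is large enough:

```c
#include <stddef.h>

/* Hypothetical node layout, for illustration only. */
struct Node {
    double key;
    struct Node *left, *right;
};

/* Append the keys of the subtree at t to out[] in sorted order.
   n is the number of keys already stored; the new count is returned. */
size_t inorder(const struct Node *t, double *out, size_t n)
{
    if (t == NULL)
        return n;
    n = inorder(t->left, out, n);      /* all smaller keys first */
    out[n++] = t->key;                 /* then the node itself */
    return inorder(t->right, out, n);  /* then all larger keys */
}
```

Each node is visited exactly once, so the traversal takes time proportional to the number of nodes.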
How many comparisons does it take to find an element k?
If k is in the tree, finding k takes depth(k) = level(k) comparisons;
for example, finding 9 takes 5 comparisons.
If k is not in the tree and its "would-be parent" is at level d, the number
of comparisons is also d; searching for 12 takes 4 comparisons, since 11
(at level 4) has no right child.
* n keys partition the space of possible keys into 2n+1 equivalence classes
(the n keys themselves and the n+1 gaps around them); the amount of time a
search takes depends only on the target's equivalence class.
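To check the comparison counts on the example tree, here is a small sketch. The helpers bst_insert, comparisons, and example_tree are hypothetical names; insertion is the standard BST insertion, and inserting the keys level by level is assumed to reproduce the tree drawn above.

```c
#include <stdlib.h>

struct Node {
    double key;
    struct Node *left, *right;
};

/* Standard BST insertion; returns the (possibly new) subtree root. */
struct Node *bst_insert(struct Node *t, double k)
{
    if (t == NULL) {
        t = malloc(sizeof *t);
        if (t == NULL)
            abort();                       /* out of memory */
        t->key = k;
        t->left = t->right = NULL;
    } else if (k < t->key) {
        t->left = bst_insert(t->left, k);
    } else {
        t->right = bst_insert(t->right, k);
    }
    return t;
}

/* Number of nodes whose key is compared with k during a search. */
int comparisons(const struct Node *t, double k)
{
    int count = 0;
    while (t != NULL) {
        count++;                           /* one comparison per node visited */
        if (k == t->key)
            break;
        t = (k < t->key) ? t->left : t->right;
    }
    return count;
}

/* Build the 12-key example tree by inserting its keys level by level. */
struct Node *example_tree(void)
{
    static const double keys[] = {20, 15, 40, 8, 18, 50, 5, 11, 45, 9, 8.5, 10};
    struct Node *root = NULL;
    for (size_t i = 0; i < sizeof keys / sizeof keys[0]; i++)
        root = bst_insert(root, keys[i]);
    return root;
}
```

With this tree, comparisons(t, 9) gives 5 and comparisons(t, 12) gives 4, matching the counts in the notes.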
Predecessor:
To find the predecessor of a node:
-if that node has a left child: go left, then go right as far as possible.
-if that node doesn't have a left child:
march up until you find a node which is the right child of its parent;
that parent is our predecessor.
Example:
8 and 10 are the predecessors of 8.5 and 11, respectively.
Successor:
To find the successor of a node:
-if the node has a right child: go right, then go left as far as possible.
-if the node doesn't have a right child: march up until you find a node
which is the left child of its parent; that parent is our successor.
Example:
40's successor is 45, 10's successor is 11, and 8.5 is the successor of 8.
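The successor rule needs a way to "march up", so the sketch below assumes each node stores a parent pointer (the struct PNode layout is hypothetical). The predecessor is the mirror image, with left and right swapped.

```c
#include <stddef.h>

/* Hypothetical node layout with a parent pointer. */
struct PNode {
    double key;
    struct PNode *left, *right, *parent;
};

/* Return the in-order successor of x, or NULL if x holds the maximum. */
struct PNode *successor(struct PNode *x)
{
    if (x->right != NULL) {
        x = x->right;                  /* go right once ... */
        while (x->left != NULL)
            x = x->left;               /* ... then left as far as possible */
        return x;
    }
    /* March up while x is a right child; stop at the first left child. */
    while (x->parent != NULL && x == x->parent->right)
        x = x->parent;
    return x->parent;                  /* parent of that left child, or NULL */
}
```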
Deletion of a node x:
1. x has 0 children: delete x.
2. x has 1 child: delete x and move the child's subtree up into x's place.
3. x has 2 children:
-save the in-order successor of x;
-delete the in-order successor (it has no left child, so rule 1 or 2 applies);
-put the saved node in place of x.
a) 1-child node deletion:
If we want to delete 11: since 11 has one child, we simply delete 11 and
move its subtree up. After the deletion the tree looks like this:
 1.                20
                  /  \
 2.             15    40
               /  \     \
 3.           8    18    50
             / \        /
 4.         5   9     45
               / \
 5.         8.5   10
b) 2-child node deletion:
Let's delete 15 from the original tree. First we find 15's successor with
the method above; it is 18. Then we apply rule 3 of Deletion to complete
the task. After the deletion, the tree looks like this:
 1.                20
                  /  \
 2.             18    40
               /        \
 3.           8          50
             / \        /
 4.         5   11    45
               /
 5.           9
             / \
 6.        8.5  10
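The three deletion rules can be sketched recursively as follows. This is one possible formulation, not necessarily the one intended in lecture: it assumes distinct keys and heap-allocated nodes, and for the 2-child case it copies the successor's key into x rather than relinking the successor node. The names struct Node, bst_insert, and bst_delete are hypothetical.

```c
#include <stdlib.h>

struct Node {
    double key;
    struct Node *left, *right;
};

/* Standard BST insertion, included only so that a tree can be built. */
struct Node *bst_insert(struct Node *t, double k)
{
    if (t == NULL) {
        t = malloc(sizeof *t);
        if (t == NULL)
            abort();                              /* out of memory */
        t->key = k;
        t->left = t->right = NULL;
    } else if (k < t->key) {
        t->left = bst_insert(t->left, k);
    } else {
        t->right = bst_insert(t->right, k);
    }
    return t;
}

/* Delete key k from the subtree at t; return the new subtree root. */
struct Node *bst_delete(struct Node *t, double k)
{
    if (t == NULL)
        return NULL;                              /* k is not in the tree */
    if (k < t->key) {
        t->left = bst_delete(t->left, k);
    } else if (k > t->key) {
        t->right = bst_delete(t->right, k);
    } else if (t->left != NULL && t->right != NULL) {
        /* Rule 3: find the in-order successor, copy its key into this
           node, then delete the successor from the right subtree. */
        struct Node *s = t->right;
        while (s->left != NULL)
            s = s->left;
        t->key = s->key;
        t->right = bst_delete(t->right, s->key);
    } else {
        /* Rules 1 and 2: replace the node by its only child (or NULL). */
        struct Node *child = (t->left != NULL) ? t->left : t->right;
        free(t);
        return child;
    }
    return t;
}
```

On the example tree, bst_delete(root, 11) moves 9 up under 8, and bst_delete(root, 15) puts 18 where 15 was, as in the two pictures above.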
Run time (d = depth of the tree, n = number of nodes):
-insert O(d)
-delete O(d)
-find O(d)
-enumerate O(n) //in-order traversal
-succ O(d)
-pred O(d)
-min O(d)
-max O(d)
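Strategies for keeping the depth small:
1. Optimal BST:
Assume the elements have keys k1 < k2 < ... < kn:
         k1      k2      k3    ..............    kn
    q0   ^   q1  ^   q2  ^     ..............    ^   qn
         |       |       |     ..............    |
         p1      p2      p3    ..............    pn
p1 to pn are the probabilities of successfully finding the elements
k1 to kn. q0 to qn are the probabilities of unsuccessful searches that
fall in the gaps before, between, and after the keys; the example below
has only successful searches (all q are 0).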
Suppose we have a set of keys:
|-------|---------------|
| keys  | probabilities |
|-------|---------------|
| a     |     0.22      |
| am    |     0.18      |
| and   |     0.20      |
| egg   |     0.05      |
| if    |     0.25      |
| the   |     0.02      |
| two   |     0.08      |
|-------|---------------|
| Total |     1.00      |
|-------|---------------|
The cost of a tree is the sum of the subtotals of its levels; a subtotal is
the level number multiplied by the sum of the probabilities of the elements
at that level.
The cost of the tree below = 1(0.05) + 2(0.18+0.02) + 3(0.22+0.20+0.25+0.08)
                           = 0.05 + 2(0.20) + 3(0.75)
                           = 2.70
            egg
           /   \
          /     \
        am       the
       /  \     /   \
      a   and  if   two
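The cost definition above is just a weighted sum, so it is easy to check by machine. A minimal sketch, assuming the tree is described by one level number per key (the function name tree_cost and this array representation are hypothetical):

```c
#include <stddef.h>

/* Cost of a tree = sum over all keys of level * probability,
   where the root is at level 1. */
double tree_cost(const int *level, const double *prob, size_t n)
{
    double cost = 0.0;
    for (size_t i = 0; i < n; i++)
        cost += level[i] * prob[i];
    return cost;
}
```

For the balanced tree above, with keys in the order a, am, and, egg, if, the, two, the levels are {3, 2, 3, 1, 3, 2, 3} and the probabilities are {0.22, 0.18, 0.20, 0.05, 0.25, 0.02, 0.08}, which gives 2.70 up to floating-point rounding.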
If we use a greedy strategy, the element 'if' will be the root of the tree,
since it has the highest probability.
            if
           /  \
          /    \
         a      two
          \     /
          and  the
          / \
        am   egg
The cost of the above tree = 1(0.25) + 2(0.22+0.08) + 3(0.20+0.02) + 4(0.18+0.05)
                           = 0.25 + 0.60 + 0.66 + 0.92
                           = 2.43
This time the cost is lower, even though the tree is not balanced.
However, the greedy strategy does not solve this problem (getting the
least-cost tree).
The least-cost tree is this:
            and
           /   \
          /     \
         a       if
          \     /  \
          am  egg   two
                    /
                  the
cost = 0.20 + 2(0.22+0.25) + 3(0.18+0.05+0.08) + 4(0.02)
     = 0.20 + 0.94 + 0.93 + 0.08
     = 2.15
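For comparison, the greedy strategy described earlier (put the most probable key at the root, then do the same on each side) can be sketched as below. The array P and the function greedy_cost are hypothetical names; keys are indexed 0 to 6 in sorted order.

```c
/* Probabilities from the table, in sorted key order:
   a, am, and, egg, if, the, two. */
static const double P[] = {0.22, 0.18, 0.20, 0.05, 0.25, 0.02, 0.08};

/* Cost of the greedy tree on keys left..right (0-based, inclusive)
   when its root is placed at the given level. */
double greedy_cost(int left, int right, int level)
{
    if (left > right)
        return 0.0;                         /* empty subtree */
    int root = left;
    for (int i = left + 1; i <= right; i++)
        if (P[i] > P[root])
            root = i;                       /* most probable key in range */
    return level * P[root]
         + greedy_cost(left, root - 1, level + 1)
         + greedy_cost(root + 1, right, level + 1);
}
```

Calling greedy_cost(0, 6, 1) reproduces the 2.43 computed by hand, which is higher than the 2.15 of the least-cost tree.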
Use dynamic programming to get the least-cost tree:
Cost of a tree:
Let C(Left,Right) be the least cost of a tree on the keys Left..Right.
If Left > Right, then the cost of the tree is 0; this is the NULL case,
which we always have for binary search trees. Otherwise, choose a root i;
the root costs Pi.
The left subtree has a cost of C(Left,i-1) relative to its root, and the
right subtree has a cost of C(i+1,Right) relative to its root. Each node
in these subtrees is one level deeper measured from the root i than from
its own subtree root, so we must add Sum(j=Left, i-1)Pj and
Sum(j=i+1, Right)Pj.
This gives the formula:
C(Left,Right) = min(Left<=i<=Right) { Pi + C(Left,i-1) + C(i+1,Right) +
                                      Sum(j=Left, i-1)Pj + Sum(j=i+1, Right)Pj }
              = min(Left<=i<=Right) { C(Left,i-1) + C(i+1,Right) +
                                      Sum(j=Left, Right)Pj }
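Computation of the optimal binary search tree for the sample input (each
cell gives a key range, then its least cost and the root achieving it;
iteration m covers the ranges of m consecutive keys):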
Iteration: Left=1 Left=2 Left=3 Left=4 Left=5 Left=6 Left=7
|---------|---------|---------|---------|---------|---------|--------|
1.|a..a | am..am |and..and |egg..egg | if..if |the..the |two..two|
|---------|---------|---------|---------|---------|---------|--------|
|.22 a | .18 am | .20 and |.05 egg |.25 if |.02 the |.08 two |
|---------|---------|---------|---------|---------|---------|--------|
|---------|---------|---------|---------|---------|---------|--------|
2.|a..am | am..and |and..egg | egg..if | if..the | the..two|
|---------|---------|---------|---------|---------|---------|
|.58 a | .56 and | .30 and | .35 if | .29 if | .12 two |
|---------|---------|---------|---------|---------|---------|
|---------|---------|---------|---------|---------|---------|
3.|a..and | am..egg | and..if | egg..the| if..two |
|---------|---------|---------|---------|---------|
|1.02 am | .66 and | .80 if | .39 if | .47 if |
|---------|---------|---------|---------|---------|
|---------|---------|---------|---------|---------|
4.|a..egg | am..if | and..the| egg..two|
|---------|---------|---------|---------|
|1.17 am |1.21 and | .84 if | .57 if |
|---------|---------|---------|---------|
|---------|---------|---------|---------|
5.|a..if | am..the | and..two|
|---------|---------|---------|
|1.83 and |1.27 and | 1.02 if |
|---------|---------|---------|
|---------|---------|---------|
6.|a..the | am..two |
|---------|---------|
|1.89 and | 1.53 and|
|---------|---------|
|---------|---------|
7.|a..two |
|---------|
|2.15 and |
|---------|
Each element can be the root of the whole tree; the pictures and costs of
the seven candidates follow.
The root level contributes 1 (the sum of all the probabilities), and the
costs of the two subtrees can be looked up in the table above.
Example:
the subtrees of tree 2 are 'a' and 'and..two'; the cost of 'a' is found at
Iteration 1, Left=1, and the cost of 'and..two' at Iteration 5, Left=3;
their costs are 0.22 and 1.02.
(A bracketed range [x..y] stands for the optimal subtree on those keys.)

1.  a                          2.       am
     \                                 /  \
      [am..two]                     [a]    [and..two]
    cost = 1 + 1.53                cost = 1 + 0.22 + 1.02
         = 2.53                         = 2.24

3.       and                   4.        egg
        /   \                           /   \
  [a..am]   [egg..two]          [a..and]     [if..two]
    cost = 1 + 0.58 + 0.57         cost = 1 + 1.02 + 0.47
         = 2.15                         = 2.49

5.        if                   6.       the
         /  \                          /   \
  [a..egg]   [the..two]         [a..if]     [two..two]
    cost = 1 + 1.17 + 0.12         cost = 1 + 1.83 + 0.08
         = 2.29                         = 2.91

7.        two
         /
  [a..the]
    cost = 1 + 1.89
         = 2.89

The minimum, 2.15, is achieved with 'and' as the root.
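Code for Cost function:
#include <float.h>                 /* DBL_MAX */

#define N 7
double P[N + 1] = {0, .22, .18, .20, .05, .25, .02, .08};  /* P[1..N] */

double Cost(int left, int right)
{
    if (left > right)
        return 0;                  /* empty (NULL) tree */
    double min = DBL_MAX, p = 0;   /* add memoization here */
    for (int i = left; i <= right; i++)
        p += P[i];                 /* p = Sum(j=left, right)Pj */
    for (int i = left; i <= right; i++) {   /* try key i as the root */
        double m = Cost(left, i - 1) + Cost(i + 1, right);
        if (m < min)
            min = m;
    }
    return min + p;
}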
With memoization:
Cost[l, r] is computed for only O(n^2) pairs (l, r), and each takes O(n) time.
Therefore the algorithm is O(n^3); this can be improved to O(n^2).
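A memoized version might look like the sketch below. The table names (memo, known, best_root) and the function name cost_memo are assumptions for illustration; the probabilities are the ones from the sample table, stored in P[1..N].

```c
#define N 7

/* P[1..N]: probabilities of a, am, and, egg, if, the, two (index 0 unused). */
static const double P[N + 1] = {0, 0.22, 0.18, 0.20, 0.05, 0.25, 0.02, 0.08};

static double memo[N + 2][N + 2];       /* memo[l][r] = C(l, r) once known */
static int    known[N + 2][N + 2];      /* 1 if memo[l][r] has been filled */
static int    best_root[N + 2][N + 2];  /* root achieving C(l, r) */

/* Least cost of a search tree on keys left..right (1-based, inclusive). */
double cost_memo(int left, int right)
{
    if (left > right)
        return 0.0;                      /* empty (NULL) tree */
    if (known[left][right])
        return memo[left][right];        /* reuse an earlier answer */

    double sum = 0.0, best = -1.0;       /* -1 means "no candidate yet" */
    for (int j = left; j <= right; j++)
        sum += P[j];                     /* Sum(j=left, right)Pj */
    for (int i = left; i <= right; i++) {
        double c = cost_memo(left, i - 1) + cost_memo(i + 1, right);
        if (best < 0.0 || c < best) {
            best = c;
            best_root[left][right] = i;  /* remember which root won */
        }
    }
    known[left][right] = 1;
    return memo[left][right] = best + sum;
}
```

Each (left, right) pair is solved once and then looked up, which is where the O(n^2) subproblems times O(n) work per subproblem comes from. For the sample input, cost_memo(1, N) gives 2.15 (up to rounding) with best_root[1][N] equal to 3, the key 'and'.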