ECS110 Lecture Notes for Monday, December 4, 1995

Professor Rogaway conducting

Lecture 26

Scribe of the day: Chao Heng Liu




Today: Binary search trees
Wednesday: Splay trees
Friday: Digital search trees


Binary trees:

            X
           / \
          /   \
         /     \
        /       \
       /\       /\
      /  \     /  \
     / L  \   / R  \
    /______\ /______\

L: left subtree.   R: right subtree.

Binary search tree: Each node has a key. For all nodes x,
  1. for all nodes l in left(x), key(l) <= key(x).
  2. for all nodes r in right(x), key(r) >= key(x).

To search for an element with key x:
  1. if x = key(root), then the search terminates.
  2. if x < key(root), then the left subtree is searched.
  3. if x > key(root), then the right subtree is searched.

level:
  1.           20
              /  \
  2.        15    40
           /  \     \
  3.      8    18    50
         / \        /
  4.    5   11    45
           /
  5.      9
         / \
  6.   8.5  10

This binary search tree has 6 levels; 8 is at level 3, 10 is at level 6, etc.
The in-order traversal of this tree is:
5, 8, 8.5, 9, 10, 11, 15, 18, 20, 40, 45, 50.

How many comparisons does it take to find an element k?
If k is in the tree, finding k takes depth(k) (that is, level(k))
comparisons. For example, finding 9 takes 5 comparisons.
If k is not in the tree and its "would-be parent" is at level d, the number
of comparisons is also d. Searching for 12 takes 4 comparisons, since 11
doesn't have a right child.

* n keys partition the space of possible keys into 2n+1 equivalence classes
  (the n keys themselves and the n+1 gaps between and around them). The time
  a search takes depends only on the target's equivalence class.

Predecessor:
To find the predecessor of a node:
  - if the node has a left child: go left, then go right as far as possible.
  - if the node doesn't have a left child: march up until you find a node
    which is the right child of its parent; that parent is the predecessor.
Example: 8 and 10 are the predecessors of 8.5 and 11, respectively.

Successor:
To find the successor of a node:
  - if the node has a right child: go right, then go left as far as possible.
  - if the node doesn't have a right child: march up until you find a node
    which is the left child of its parent; that parent is the successor.
Example: 40's successor is 45, 10's successor is 11, and 8.5 is the
successor of 8.

Deletion:
  1. a node with 0 children: delete that node.
  2. a node with 1 child: delete that node and move the child's subtree up.
  3. a node x with 2 children:
       - save the in-order successor of x;
       - delete the in-order successor from its old position;
       - put the saved node in place of x.

a) 1-child node deletion:
   Suppose we want to delete 11. Since 11 has one child, we simply delete 11
   and move its subtree up. After the deletion the tree looks like this:

  1.           20
              /  \
  2.        15    40
           /  \     \
  3.      8    18    50
         / \        /
  4.    5   9     45
           / \
  5.     8.5  10

b) 2-child node deletion:
   Let's delete 15 from the original tree. First we find 15's successor with
   the method above, which gives 18. Then we apply rule 3 of Deletion.
   After the deletion the tree looks like this:

  1.           20
              /  \
  2.        18    40
           /        \
  3.      8          50
         / \        /
  4.    5   11    45
           /
  5.      9
         / \
  6.   8.5  10

Run time (d = depth of the tree, n = number of keys):
  - insert     O(d)
  - delete     O(d)
  - find       O(d)
  - enumerate  O(n)    // in-order traversal
  - succ       O(d)
  - pred       O(d)
  - min        O(d)
  - max        O(d)

Strategies for keeping the depth small:

1. Optimal BST:
   Assume all elements have keys

          k1      k2      k3   ..............   kn
      q0   ^  q1   ^  q2   ^   ..............       qn
           |       |       |   ..............
           p1      p2      p3  ..............   pn

   p1 to pn are the probabilities of successfully finding the elements k1 to
   kn. q0 to qn mark the gaps between the keys (in the standard formulation,
   the probabilities of unsuccessful searches); the example below uses only
   the p's.

   Suppose we have a set of keys:

   |-------|---------------|
   | keys  | probabilities |
   |-------|---------------|
   | a     |     0.22      |
   | am    |     0.18      |
   | and   |     0.20      |
   | egg   |     0.05      |
   | if    |     0.25      |
   | the   |     0.02      |
   | two   |     0.08      |
   |-------|---------------|
   | Total |     1.00      |
   |-------|---------------|

   The cost of a tree is the sum of the subtotals of its levels; a subtotal
   is the level number multiplied by the sum of the probabilities of the
   elements at that level.

   The cost of the tree below
      = 1(0.05) + 2(0.18+0.02) + 3(0.22+0.20+0.25+0.08)
      = 0.05 + 2(0.20) + 3(0.75)
      = 0.05 + 0.40 + 2.25
      = 2.70

               egg
              /   \
             /     \
           am       the
          /  \     /   \
         a   and  if   two

   If we use a greedy strategy, the element 'if' becomes the root of the
   tree, since it has the highest probability.

             if
            /  \
           /    \
          a      two
           \     /
           and  the
           / \
          am  egg

   The cost of the tree above
      = 1(0.25) + 2(0.22+0.08) + 3(0.20+0.02) + 4(0.18+0.05)
      = 0.25 + 0.60 + 0.66 + 0.92
      = 2.43

   This tree is cheaper, but it is not balanced.
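   The cost rule above (level times probability, summed over all nodes) can
   be sketched in C as follows. This is only an illustration: the node type
   and the name tree_cost are assumptions made for this sketch, not code
   from the lecture.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type for illustration; not from the lecture. */
struct node {
    const char *key;
    double prob;                 /* probability of searching for this key */
    struct node *left, *right;
};

/* Sum of level(x) * prob(x) over all nodes x; the root is at level 1. */
double tree_cost(const struct node *t, int level)
{
    assert(level >= 1);
    if (t == NULL)
        return 0.0;              /* an empty subtree costs nothing */
    return level * t->prob
         + tree_cost(t->left, level + 1)
         + tree_cost(t->right, level + 1);
}
```

   Building the greedy tree rooted at 'if' and calling tree_cost(&root, 1)
   should give 2.43, matching the hand calculation.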
   However, the greedy strategy does not solve this problem (finding the
   least-cost tree). The cheapest tree is this:

              and
             /   \
            /     \
           a       if
            \     /  \
            am  egg  two
                     /
                   the

   cost = 0.20 + 2(0.22+0.25) + 3(0.18+0.05+0.08) + 4(0.02)
        = 0.20 + 0.94 + 0.93 + 0.08
        = 2.15

Use dynamic programming to get the least-cost tree:

Cost of a tree:
If Left > Right, then the cost of the tree is 0; this is the NULL case, which
we always have for binary search trees. Otherwise, suppose key i is the root.
The root costs Pi. The left subtree has a cost of C(Left,i-1) relative to its
own root, and the right subtree has a cost of C(i+1,Right) relative to its
own root. Each node in these subtrees is one level deeper measured from root
i than from its own subtree's root, so we must add Sum(j=Left,i-1)Pj and
Sum(j=i+1,Right)Pj. This gives the formula:

    C(Left,Right)
      = min(Left<=i<=Right) { Pi + C(Left,i-1) + C(i+1,Right)
                              + Sum(j=Left,i-1)Pj + Sum(j=i+1,Right)Pj }
      = min(Left<=i<=Right) { C(Left,i-1) + C(i+1,Right) + Sum(j=Left,Right)Pj }

Computation of the optimal binary search tree for the sample input.
Each cell shows the range of keys, then the minimum cost and the best root
for that range:

Iteration   Left=1     Left=2     Left=3     Left=4     Left=5     Left=6     Left=7
          |----------|----------|----------|----------|----------|----------|----------|
    1.    | a..a     | am..am   | and..and | egg..egg | if..if   | the..the | two..two |
          | .22 a    | .18 am   | .20 and  | .05 egg  | .25 if   | .02 the  | .08 two  |
          |----------|----------|----------|----------|----------|----------|----------|
    2.    | a..am    | am..and  | and..egg | egg..if  | if..the  | the..two |
          | .58 a    | .56 and  | .30 and  | .35 if   | .29 if   | .12 two  |
          |----------|----------|----------|----------|----------|----------|
    3.    | a..and   | am..egg  | and..if  | egg..the | if..two  |
          | 1.02 am  | .66 and  | .80 if   | .39 if   | .47 if   |
          |----------|----------|----------|----------|----------|
    4.    | a..egg   | am..if   | and..the | egg..two |
          | 1.17 am  | 1.21 and | .84 if   | .57 if   |
          |----------|----------|----------|----------|
    5.    | a..if    | am..the  | and..two |
          | 1.83 and | 1.27 and | 1.02 if  |
          |----------|----------|----------|
    6.    | a..the   | am..two  |
          | 1.89 and | 1.53 and |
          |----------|----------|
    7.    | a..two   |
          | 2.15 and |
          |----------|

Each element can be the root of the whole tree; the seven candidate trees and
their costs are shown below. The term contributed at the root is 1 (the sum
of all the probabilities), and the costs of the subtrees can be read directly
from the table above.
Example: the subtrees of tree 2 are 'a' and 'and..two'. The cost of 'a' is
found at Iteration 1, Left=1; the cost of 'and..two' is found at Iteration 5,
Left=3. Their costs are 0.22 and 1.02.

1.
      a
       \
        [am..two]
   cost = 1 + 1.53 = 2.53

2.
        am
       /  \
     [a]  [and..two]
   cost = 1 + 0.22 + 1.02 = 2.24

3.
            and
           /   \
    [a..am]     [egg..two]
   cost = 1 + 0.58 + 0.57 = 2.15

4.
             egg
            /   \
    [a..and]     [if..two]
   cost = 1 + 1.02 + 0.47 = 2.49

5.
             if
            /  \
    [a..egg]    [the..two]
   cost = 1 + 1.17 + 0.12 = 2.29

6.
            the
           /   \
    [a..if]     [two..two]
   cost = 1 + 1.83 + 0.08 = 2.91

7.
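             two
            /
    [a..the]
   cost = 1 + 1.89 = 2.89

Code for Cost function:

    #include <math.h>     /* INFINITY */

    extern double P[];    /* P[i] = probability of key i, 1 <= i <= n */

    double Cost(int left, int right)
    {
        double min = INFINITY;  /* cheapest pair of subtrees found so far */
        double p = 0;           /* P[left] + ... + P[right] */
        int i;

        if (left > right)
            return 0;           /* empty tree */
        /* add memoization here */
        for (i = left; i <= right; i++)
            p += P[i];
        for (i = left; i <= right; i++) {
            double m = Cost(left, i - 1) + Cost(i + 1, right);
            if (m < min)
                min = m;
        }
        return min + p;
    }

With memoization, Cost(l, r) is computed for only O(n^2) distinct (l, r)
pairs, and each computation takes O(n) time. Therefore the algorithm runs in
O(n^3) time; this can be improved to O(n^2).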
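The memoization step mentioned above can be sketched as follows. This is one
possible way to do it, not necessarily the version intended in lecture: the
arrays memo, done and best_root, the constant N, and the lower-case name cost
are assumptions made for this sketch. The probabilities are the ones from the
sample table.

```c
#include <assert.h>

#define N 7   /* number of keys in the sample input */

/* P[1..N]: probabilities for a, am, and, egg, if, the, two; P[0] unused. */
static const double P[N + 1] = {0, 0.22, 0.18, 0.20, 0.05, 0.25, 0.02, 0.08};

static double memo[N + 2][N + 2];      /* memo[l][r] caches cost(l, r) */
static int    done[N + 2][N + 2];      /* nonzero once memo[l][r] is valid */
static int    best_root[N + 2][N + 2]; /* index of the cheapest root of l..r */

/* Least cost of a binary search tree holding keys left..right. */
double cost(int left, int right)
{
    double p = 0.0, min = -1.0;        /* min < 0 means "no candidate yet" */
    int i;

    assert(left >= 1 && right <= N);
    if (left > right)
        return 0.0;                    /* empty tree */
    if (done[left][right])
        return memo[left][right];      /* already computed: reuse it */

    for (i = left; i <= right; i++)
        p += P[i];                     /* Sum(j=left,right)Pj */
    for (i = left; i <= right; i++) {  /* try every key i as the root */
        double m = cost(left, i - 1) + cost(i + 1, right);
        if (min < 0.0 || m < min) {
            min = m;
            best_root[left][right] = i;
        }
    }
    done[left][right] = 1;
    memo[left][right] = min + p;
    return memo[left][right];
}
```

On the sample input, cost(1, N) should reproduce the 2.15 from Iteration 7 of
the table, with best_root[1][N] equal to 3 (the key 'and'). Recording
best_root lets the optimal tree itself be rebuilt by following the roots of
the two sub-ranges recursively.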