---------------------------------------------------------------------------
COMP 731 - Data Struc & Algorithms - Lecture 19 - Friday, August 11, 2000
---------------------------------------------------------------------------

Today:
   o Hashing applications
   o Binary search trees

Hashing applications

1. Symbol table in a compiler. Mentioned yesterday in connection with
   perfect hashing.

2. Given sets S and T, does S intersect T? Suppose |S|=|T|=n.
   The naive algorithm takes Theta(n^2) time. Here is an expected
   O(n) time algorithm (under the uniform hashing assumption):

      for each s in S do
          Insert(dict, s)
      for each t in T do
          if Find(dict, t) then return "S and T intersect"
      return "S and T do not intersect"

3. Associative array

      A["hello"] = 7
      A["there"] = 21
      A["algorithm"] = 103

   To compute A[str], do a Find(str).
   To set A[str]=x, replace the item returned by Find(str) by an item
   with the new information.

......

Binary Search Trees

Also support
      Insert   Find   Delete
In addition,
      FindMin   FindMax   Succ   Pred
are well supported.

BST property: for every node x,

      key of every node in LeftTree(x) < key(x) < key of every node in RightTree(x)

Show how to insert -- easy.

Show how to delete --

   case 1: the node being deleted has no children. Remove it.

   case 2: the node being deleted has a single child. Remove the node
           and move the child up into its place.

   case 3: the node being deleted, x, has two children. Find succ(x),
           which is the node you get by moving right once and then left
           as far as possible. Necessarily that node has no left child.
           So remove it using case 1 or case 2, and have it replace x.

Minor problem: case 3 will tend to "skew" the tree: since we are always
replacing a node x with a node from the RIGHT subtree of x, the tree will
start to get left-heavy. One possibility is to ignore this. Another
possibility is to choose at random between replacing x by the successor
(right and then left as far as possible) and replacing x by the
predecessor (left and then right as far as possible). A simpler
possibility (no pseudorandom number generation needed) is to ALTERNATE
between these two choices.
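
The insert and delete cases above can be sketched in Python as follows.
This is only an illustration, not the code used in lecture: the names
Node, BST and _use_succ are made up here, duplicate keys are silently
ignored (an assumption), and case 3 copies the replacement key into x
rather than relinking the replacement node. The ALTERNATE idea is the
_use_succ flag, flipped after every two-child deletion.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


class BST:
    def __init__(self):
        self.root = None
        self._use_succ = True  # hypothetical flag: alternate succ / pred in case 3

    def insert(self, key):
        self.root = self._insert(self.root, key)

    def _insert(self, node, key):
        if node is None:
            return Node(key)
        if key < node.key:
            node.left = self._insert(node.left, key)
        elif key > node.key:
            node.right = self._insert(node.right, key)
        return node  # equal key: nothing to do (assumption: no duplicates)

    def find(self, key):
        node = self.root
        while node is not None and node.key != key:
            node = node.left if key < node.key else node.right
        return node is not None

    def delete(self, key):
        self.root = self._delete(self.root, key)

    def _delete(self, node, key):
        if node is None:
            return None  # key not present
        if key < node.key:
            node.left = self._delete(node.left, key)
        elif key > node.key:
            node.right = self._delete(node.right, key)
        elif node.left is None:
            return node.right  # case 1 (no children) or case 2 (right child only)
        elif node.right is None:
            return node.left   # case 2 (left child only)
        else:
            # case 3: two children
            if self._use_succ:
                r = node.right          # successor: right once, then left
                while r.left is not None:
                    r = r.left
                node.key = r.key
                node.right = self._delete(node.right, r.key)
            else:
                r = node.left           # predecessor: left once, then right
                while r.right is not None:
                    r = r.right
                node.key = r.key
                node.left = self._delete(node.left, r.key)
            self._use_succ = not self._use_succ
        return node

    def inorder(self):
        out = []

        def walk(n):
            if n is not None:
                walk(n.left)
                out.append(n.key)
                walk(n.right)

        walk(self.root)
        return out
```

The recursive call that removes the replacement node always lands in
case 1 or case 2, because the successor has no left child and the
predecessor has no right child, so the flag is flipped only once per
two-child deletion.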

Tree traversals:

Described breadth-first traversal (uses a QUEUE), and
depth-first traversals:

      preorder(x):   visit(x)
                     preorder(left(x))
                     preorder(right(x))

      inorder(x):    inorder(left(x))
                     visit(x)
                     inorder(right(x))

      postorder(x):  postorder(left(x))
                     postorder(right(x))
                     visit(x)

Draw a tree and illustrate....

The operations on a tree take time proportional to the height of the tree.
(The height of the tree is the length of the longest path from the root
to a leaf. The height of a node is the length of a longest path from that
node to a leaf, where edges are directed from the root towards the leaves.)

So good trees are "balanced" -- bushy. The height of an n-node tree is
Theta(n) in the worst case and Theta(lg n) in the best case.

Are most nodes in most trees closer to being lg n in depth, or closer to
n in depth? The former.

Let's think about the average distance of a node to the root in a
"random" BST. We have to think about what we mean by a "random" BST.
Here I will mean that the number of nodes in the left subtree of the
root is equally likely to be any of 0, 1, ..., n-1, and likewise,
recursively, within each subtree. (This is the distribution of shapes
produced by inserting random numbers from the interval [0,1] into an
empty tree. There are other interpretations. Choosing uniformly among
all the possible shapes of binary trees with n nodes is a different
distribution, and this analysis does not apply to it.)

One node is selected for the root. There are then n-1 remaining nodes
to distribute. We could put

      LEFT      RIGHT
      ---------------
      0         n-1
      1         n-2
      2         n-3
      ...       ...
      n-3       2
      n-2       1
      n-1       0

We are saying that all of these possibilities are equi-probable.

The TOTAL path length of a tree with root r is

      Sum_x (distance between x and r)  =  Sum_x depth(x)

Let F(n) = the expected total path length of a random tree with n nodes.
Each of the n-1 non-root nodes is one level deeper in the whole tree
than it is in its own subtree, so, with F(0) = 0,

      F(n) = (1/n) * Sum_{i=0}^{n-1} ( F(i) + F(n-1-i) + (n-1) )

           = (2/n) * Sum_{i=0}^{n-1} F(i)  +  (n-1)

Look familiar? This is the same recurrence as for the expected running
time of Quicksort! So its solution, you will recall, is Theta(n lg n).
So the expected total path length is about c n lg n. There are n nodes,
so the expected depth of a node in a random tree is about c lg n.

But this is of little help if our keys are, for example, in order!
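
As a numerical sanity check, the recurrence for the expected total path
length can be evaluated directly and set beside the sorted-input case.
The sketch below is illustrative only; the function names are made up
here. It takes F(0) = 0, and it uses the fact that inserting n keys in
increasing order produces a chain whose depths are 0, 1, ..., n-1.

```python
import math


def expected_total_path_length(n_max):
    """Return a list F with F[n] = (2/n) * sum(F[0..n-1]) + (n-1), F[0] = 0."""
    F = [0.0] * (n_max + 1)
    prefix = 0.0  # running value of F[0] + ... + F[n-1]
    for n in range(1, n_max + 1):
        F[n] = 2.0 * prefix / n + (n - 1)
        prefix += F[n]
    return F


def sorted_total_path_length(n):
    """Total path length of the chain built by inserting n keys in order."""
    return n * (n - 1) // 2  # 0 + 1 + ... + (n-1)


if __name__ == "__main__":
    F = expected_total_path_length(1024)
    print("n, average depth (random), lg n, average depth (sorted input)")
    for n in (16, 128, 1024):
        print(n, F[n] / n, math.log2(n), sorted_total_path_length(n) / n)
```

The second column grows like a constant times lg n, while the last
column grows linearly in n, which is the gap the balancing schemes are
meant to close.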

I want to consider two ways to deal with this, ways to try to force balance.

1. AVL trees - a "classical" method, Adelson-Velskii and Landis (1962)

2. Splay trees - an elegant, "modern" method of Sleator and Tarjan (1985);
   balance will be obtained in an amortized sense