ECS110 Lecture Notes for Wednesday, November 15, 1995

Professor Rogaway conducting

Lecture 20

Scribe of the day: Paula Barter




REMINDER!  Remember to delete the large files from program 3!!

Today's Lecture
-Hashing


Review: ADT Dictionary

  // Data: a set of elements S, each element having an associated
  // key, drawn from some universe U of keys.
  // No two elements have the same key.
  // Operations:
  void insert(Element i);   // adds i to S
  int in(Element i);        // returns 1 if i is in S, else returns 0
  void delete(Element i);   // removes i from S

Now incorporate the idea of hashing into this ADT. S is a set of keys
drawn from the universe U. h is the hash function mapping keys in S to
the hash table; each section of the table is called a slot. A collision
occurs when two different keys map to the same slot; collisions are
resolved by chaining (each slot holds a linked list of the keys that
hash there).

Dynamic Hashing: used when you don't know n (the number of items) in
advance. Whenever alpha (the load factor, n/m) exceeds 1, double m (the
size of the hash table) and reinsert everything. This is called
rehashing. Note: with each increase of m, a new hash function is needed.

Claim: Under the uniform hashing assumption, a sequence of n operations
takes expected Theta(n) time.

Example: Assume m = 4 and the following 4 inserts are called:
  insert("Fred")
  insert("Sam")
  insert("Alice")
  insert("Ali")
Now insert("Bob") is called. This would make the load factor greater
than 1, so the old hash table is rehashed (with m doubled to 8) under a
new hash function.

Example: initialize m = 100 and insert the first 400 items. Counting
the rehashes, the cost per insert averages out:
  inserts 1 - 101:   average cost 2   (insert 101 triggers a rehash, m -> 200)
  inserts 102 - 201: average cost 3   (insert 201 triggers a rehash, m -> 400)
  inserts 202 - 401: average cost 3   (insert 401 triggers a rehash, m -> 800)
  inserts 402 - 801: average cost 3

An accounting trick shows Theta(1) expected (amortized) time per
operation over a sequence of operations:
  Suppose the "real" cost of a "simple" insert is $1.00, and the cost
  of rehashing n items is $n.00. Charge $3.00 for every insert:
    - for a "simple" insert, use $1.00 to cover the cost and put the
      other $2.00 in savings;
    - for a rehash, pay out of savings.

Claim: There will always be enough money in savings to pay for the
rehash cost.
Proof: Preceding a rehash of n items there were n/2 "simple" inserts
(since the last rehash), each of which deposited $2.00. So there is $n
in savings, which is enough to cover the cost.
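The doubling scheme above can be sketched as a small C++ class. This is a minimal illustration only (the class and member names are my own, not from the lecture), assuming integer keys, chaining with std::list, and a simple modular hash:

```cpp
#include <list>
#include <vector>

// Minimal dictionary of integer keys with chaining and doubling.
// Illustrative sketch; names and details are not from the lecture.
class HashDict {
    std::vector<std::list<int>> table;
    std::size_t n = 0;  // number of stored keys

    std::size_t slot(int key) const {
        return static_cast<std::size_t>(key) % table.size();
    }

    // Rehash: double m and reinsert every key under the new hash function.
    void rehash() {
        std::vector<std::list<int>> old = std::move(table);
        table.assign(old.size() * 2, {});
        for (const auto& chain : old)
            for (int key : chain)
                table[slot(key)].push_back(key);
    }

public:
    explicit HashDict(std::size_t m = 4) : table(m) {}

    bool in(int key) const {
        for (int k : table[slot(key)])
            if (k == key) return true;
        return false;
    }

    void insert(int key) {
        if (in(key)) return;       // no two elements have the same key
        if (n + 1 > table.size())  // load factor alpha would exceed 1
            rehash();              // double m, reinsert with new hash
        table[slot(key)].push_back(key);
        ++n;
    }

    std::size_t size() const { return n; }
    std::size_t slots() const { return table.size(); }
};
```

Starting from m = 4, the fifth insert pushes the load factor past 1 and triggers the doubling rehash, just as insert("Bob") does in the example above.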
Here are three ways to make your h (hashing) function:

1. Division Method
   k is a key, viewed as an (enormous) integer -- especially common
   when k is a word.
     h(k) = k % m
   Choose m to be prime, but not a prime close to a power of 2.
   For example, m = 256 would depend only on the last byte of k;
   m = 701 is good for about 1000 items.

2. Multiplication Method
     h(k) = floor( m * frac(beta * k) )   // frac(x) = fractional part of x
   where beta is a "weird" constant,
   e.g. (sqrt(5) - 1) / 2 = 0.6180339887...

3. Folding
   Write k = x1 x2 x3 ... xt, where each piece satisfies |xi| = 16 bits.
     h(k) = (x1 + x2 + ... + xt) % m
   However, this is insensitive to the arrangement of the pieces.
   For example:
     x1x2x3x4   x2x4x1x3   x3x2x4x1
   all have the same value.

Note: simple hash functions increase speed.

Perfect Hashing (an arrangement of things so there are no collisions):
  - choose h such that no two keys from S collide
  - you need to know S in advance
  - you do know S, for example, in a compiler

Let's say a compiler has 60 reserved words. We are given a word and
need to decide whether it is a reserved word or an identifier.
  Given: a string s
  Return: (reserved word, the word itself) or (identifier, the word itself)

Three approaches:
1. Compare the word against the list of reserved words. Very slow
   (up to 60 comparisons).
2. Binary search tree. Better, but each lookup still costs comparisons
   proportional to the depth of the tree (up to n if it is unbalanced).
3. Find a perfect hash function. This is very fast: one probe, no
   collisions.
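The three construction methods can be written out directly. A short C++ sketch, assuming m = 701 and 16-bit pieces as in the examples above (function names are mine, not from the lecture):

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// 1. Division method: h(k) = k % m, with m a prime not near a power of 2.
std::uint32_t div_hash(std::uint64_t k, std::uint32_t m = 701) {
    return static_cast<std::uint32_t>(k % m);
}

// 2. Multiplication method: h(k) = floor(m * frac(beta * k)), where
//    beta = (sqrt(5) - 1) / 2, the "weird" constant from the lecture.
//    (Doubles lose precision for very large k; this is a demo only.)
std::uint32_t mult_hash(std::uint64_t k, std::uint32_t m = 701) {
    const double beta = 0.6180339887;
    double frac = std::fmod(beta * static_cast<double>(k), 1.0);
    return static_cast<std::uint32_t>(m * frac);
}

// 3. Folding: split k into 16-bit pieces x1..xt and sum them mod m.
//    Addition is commutative, so rearranging the pieces cannot
//    change h(k) -- the weakness noted in the lecture.
std::uint32_t fold_hash(const std::vector<std::uint16_t>& pieces,
                        std::uint32_t m = 701) {
    std::uint32_t sum = 0;
    for (std::uint16_t x : pieces) sum += x;
    return sum % m;
}
```

For instance, fold_hash({1, 2, 3, 4}) and fold_hash({2, 4, 1, 3}) return the same slot, which is exactly the arrangement-insensitivity noted above.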