
COMP SCI 731  Data Struc & Algorithms  Lecture 3  Tuesday, June 27, 2000

Today: [ ] More on the use of stacks: graph connectivity / DFS
[ ] Review of asymptotic efficiency measures
[ ] Implementing stacks
DFS  connectivity

Last time we were reviewing graph representations.
The adjacency list representation of a graph provides, for each vertex v,
a list, Adj(v) (in no particular order), of the neighbor set of v, N(v).
We commented last time that the lists in the adjacency list of
each vertex may be ordinary linked lists, or they may be
(dynamically allocated) arrays. Or it may be one big array into which we
have pointers. The array representations are often faster.
Either way, we can list the neighbors of a vertex v, N(v),
in O(|N(v)|) time.
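As a concrete sketch (names like `make_adjacency_list` and `adj` are our own, not from lecture), an adjacency list built from an edge list might look like this in Python:

```python
# A minimal sketch of an adjacency-list representation, assuming an
# undirected graph on vertices numbered 1..n, given as a list of edges.
def make_adjacency_list(n, edges):
    """Build Adj(v) for each vertex v of an undirected graph."""
    adj = {v: [] for v in range(1, n + 1)}
    for (u, v) in edges:
        adj[u].append(v)   # each undirected edge appears in both lists
        adj[v].append(u)
    return adj

adj = make_adjacency_list(4, [(1, 2), (2, 3), (1, 3)])
# Listing N(v) takes O(|N(v)|) time: just walk adj[v].
```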
Remind students what it means for a graph to be connected.
Algorithm to decide if G is connected:
** note: this is done quite differently than in lecture. **
** Here it is done in a way that generalizes better. **
First, do an example, with colors to indicate the status of each vertex.
Initially, each vertex is
UNSEEN. Later it will be
PENDING, and finally it will be
VISITED.
int status[n+1] // will mark the status of each
                // vertex as UNSEEN, PENDING, VISITED

Boolean Connected(G) // recursive version. G=(V,E) has n vertices and m edges.
                     // For concreteness, V={1,2,...,n}
    for i=1 to n do
        status[i] = UNSEEN
    dfs(1)
    /* alternatively, count the visited vertices above and see if it is n */
    for i = 1 to n
        if status[i] == UNSEEN then return false // graph is not connected
    return true // graph is connected

void dfs(v)
    status[v] = PENDING
    for i in N(v) do
        if status[i] == UNSEEN then
            dfs(i)
    status[v] = VISITED

>> Show example <<
>> Rewrite to increment a counter k (initially 0) when each vertex
is visited, and change the fragment which looks for UNSEEN vertices
to the simpler " return (k==n) ".
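A runnable sketch of the recursive version with the counter-k variant (function and variable names are our own; the recursion limit is raised since the depth can reach n):

```python
# Recursive connectivity test: count vertices reachable from vertex 1
# and compare the count to n. Statuses mirror the pseudocode above.
import sys
sys.setrecursionlimit(10000)

UNSEEN, PENDING, VISITED = 0, 1, 2

def connected(n, adj):
    """True iff the graph on vertices 1..n (adjacency dict adj) is connected."""
    status = {v: UNSEEN for v in range(1, n + 1)}
    k = 0  # number of vertices reached from vertex 1

    def dfs(v):
        nonlocal k
        status[v] = PENDING
        k += 1
        for w in adj[v]:
            if status[w] == UNSEEN:
                dfs(w)
        status[v] = VISITED

    dfs(1)
    return k == n

# A path 1-2-3 is connected; adding an isolated vertex 4 breaks it.
path3 = {1: [2], 2: [1, 3], 3: [2]}
```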
This program is using a stack. Where? The runtime stack in the execution
environment, of course!
Let's rewrite it nonrecursively, to explicitly manage
our stack. This is usually faster. This won't be a
DFS, as we are only trying to decide connectivity.
** This is quite different from what I did in class. What
was done in class was not good; having a vertex appear
multiple times on the stack is a bad idea. The following
does not run in exactly the same way as the above, but
it still decides connectivity **
int status[n+1] // will mark the status of each
                // vertex as UNSEEN, PENDING, VISITED
stack s // of vertices we have yet to visit

Boolean Connected(G) // nonrecursive version
    for i=1 to n do
        status[i] = UNSEEN
    status[1] = PENDING
    s.Push(1)
    k = 0 // counts visited vertices
    while !s.Empty() do
        i = s.Pop()
        status[i] = VISITED
        k = k + 1
        for each j in N(i) do
            if status[j] == UNSEEN then
                status[j] = PENDING // so j is never pushed twice
                s.Push(j)
    return (k==n)
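The nonrecursive version translates directly to Python, using an ordinary list as the explicit stack (append = Push, pop = Pop; names are our own):

```python
# Nonrecursive connectivity test. Each vertex is pushed at most once,
# because it is marked PENDING the moment it goes on the stack.
UNSEEN, PENDING, VISITED = 0, 1, 2

def connected_iter(n, adj):
    status = {v: UNSEEN for v in range(1, n + 1)}
    stack = [1]
    status[1] = PENDING
    k = 0                          # number of vertices visited so far
    while stack:
        i = stack.pop()
        status[i] = VISITED
        k += 1
        for j in adj[i]:
            if status[j] == UNSEEN:
                status[j] = PENDING   # never push a vertex twice
                stack.append(j)
    return k == n
```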
Stack Implementations

                tos
                 |
                 v
   +---+---+---+---+---+---+---+---+---+---+
   | 4 | 8 | 3 | 7 |   |   |   |   |   |   |
   +---+---+---+---+---+---+---+---+---+---+
     1   2   3   4   5   6   7   8   9  10

   n = 4   (example contents; tos = top of stack, at slot n)

initially, n=0 and max = 10, say.
Push(item)
    if n==MAX then Error() // <- hitting MAX breaks the abstraction
    n++
    stack[n] = item

Pop()
    if n==0 then Error()
    save = stack[n]
    n--
    return save
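The same fixed-capacity stack as a runnable Python sketch (class and method names are our own; Python lists are 0-indexed, so n counts items rather than indexing the top slot directly):

```python
# Array-based stack with a fixed maximum size. Overflow raises an error,
# which is exactly the "breaks the abstraction" problem noted above.
class BoundedStack:
    def __init__(self, max_size=10):
        self.max = max_size
        self.stack = [None] * max_size   # preallocated slots
        self.n = 0                       # number of items currently stored

    def push(self, item):
        if self.n == self.max:
            raise OverflowError("stack full")  # breaks the abstraction
        self.stack[self.n] = item
        self.n += 1

    def pop(self):
        if self.n == 0:
            raise IndexError("stack empty")
        self.n -= 1
        return self.stack[self.n]

s = BoundedStack(3)
s.push(4)
s.push(8)
```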
Dynamic approach.
Start with an array with some number of elements, max.
If it fills up, allocate a new array with (say) twice as many elements.
Empty the old elements into the new array, and push the new element.
Worst case time: expensive. Over n Push() operations, a single operation
could take O(n) time.
But is it really so bad?
Over a sequence of operations, the time per operation will still be O(1).
Example: 10 items, 1 usec for an "ordinary" push, n usec for a push which
triggers a reallocation from max to 2*max (copying the n old elements).
// now the array has 10 spaces, none used
10 x Push > cost 10 // now all 10 are used
1 Push* > cost 10 // now there are 20 elements in the array, 11 used
9 x Push > cost 9 // now all 20 elements of the array are used
1 Push* > cost 20 // now there are 40 elements in the array, 21 used
19 x Push > cost 19 // now there are 40 elements in the array, all used
1 Push* > cost 40 // now there are 80 elements in the array, 41 used

41 operations, cost 108
In general, the cost will remain O(1) per operation over any sequence
of operations (prove this!).
Amortized running time: the average time per operation over a worst
case sequence of operations. We will often fall back to amortized
running time analysis.
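The accounting above can be checked by instrumenting a doubling stack with a cost counter, using the same cost model as the table: an ordinary push costs 1, and a reallocating push costs n (one unit per copied element). Names are our own:

```python
# Doubling array stack that tallies push costs, mirroring the table above.
class GrowingStack:
    def __init__(self, initial_max=10):
        self.max = initial_max
        self.stack = [None] * initial_max
        self.n = 0
        self.cost = 0                # total "time" spent on all pushes

    def push(self, item):
        if self.n == self.max:       # full: double the array
            new = [None] * (2 * self.max)
            for i in range(self.n):
                new[i] = self.stack[i]
            self.cost += self.n      # n units to copy, as in the table
            self.stack = new
            self.max *= 2
        else:
            self.cost += 1           # ordinary push: 1 unit
        self.stack[self.n] = item
        self.n += 1

s = GrowingStack(10)
for x in range(41):
    s.push(x)
# 41 pushes starting from max=10: total cost 10+10+9+20+19+40 = 108,
# matching the table, i.e. under 3 units per operation on average.
```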
O Notation

We've been using big-O notation. Let me remind you:
O(f(n)) is a set of functions (from N to N, say).
A function g is in this set if
there is a constant C such that g(n) <= C f(n) for all but finitely many n.
Give examples e.g.,
o 7n^2 is O(n^2). Is Theta(n^2)
o 7n^2 + 100 n is O(n^2). Is Theta(n^2)
o 7n^2/log n is O(n^2). It is not Theta(n^2)
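Claims like these are verified by exhibiting (or refuting) a witness constant C; a worked instance for two of the examples above:

```latex
% 7n^2 + 100n = O(n^2): take C = 107, since for all n >= 1,
7n^2 + 100n \;\le\; 7n^2 + 100n^2 \;=\; 107\,n^2 .
% 7n^2/\log n is O(n^2) (take C = 7), but it is not \Omega(n^2):
% for any constant c > 0, once \log n > 7/c we have
\frac{7n^2}{\log n} \;<\; c\,n^2 ,
% so no lower-bound constant c works, and hence 7n^2/\log n
% is not \Theta(n^2).
```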
T/F:
o if f is Theta(g) then g is Theta(f) TRUE
o O(n lg n) = O(n log n) TRUE
o if f is O(g) and f' is O(g) then f+f' is O(g) TRUE