Overview of Parallel Tree Search
--------------------------------
The tree search program generates a random tree, and searches the tree for
a leaf node at maximum depth with minimum "cost".  The main purpose of
the program is to illustrate dynamic load balancing in parallel depth-first
search of a tree.

There are two input values governing the structure of the tree:

    max_depth:  the maximum possible depth of a tree node.
    max_children:  the maximum possible number of children a node
        can have.

Input to the program consists of these two values and two additional
values that govern dynamic load balancing in the program:

    cutoff_depth:  when work is redistributed, nodes in the tree
        that have depth greater than or equal to the cutoff_depth are 
        not re-distributed.
    max_work:  the maximum number of nodes to expand in a single
        call to parallel depth-first search before checking whether
        a process has received requests for work.

Output from the program consists of the cost of a minimum cost leaf, and
a description (ancestor list -- see below) of the node that had the
minimum cost.  If the program was compiled with "-DSTATS", information
on the performance of the program will also be printed.

Basic Algorithm
---------------
    1.  Get input and initialize basic data structures. [main.c]
    2.  Process 0:  Use (serial) depth-first search to generate enough
        of the tree so that there is at least one node per process.
        [par_tree_search.c]
    3.  Process 0 scatters the nodes generated in step 2 among the 
        processes.  [par_tree_search.]
    4.  do {
    4a.     Run depth first search of local tree. [par_dfs.c]
    4b.     Service requests for work. [service_requests.c]
    4c. } while (any process still has work [work_remains.c])
    5.  Update solution [solution.c]
    6.  Print results of search [solution.c]
    7.  Clean up [main.c]

Main Data Structures
--------------------
Each node of the tree consists of the following list of integers:

    depth:  the depth of the node in the tree (root has depth 0,
        children of the root have depth 1, grandchildren of the
        root have depth 2, etc)
    sibling rank:  the number of siblings "older" than the node.
        For example, if a parent node has 3 children, the left-most
        child has sibling rank 0, the middle child has sibling rank
        1, and the right-most child has sibling rank 2:

                          parent
                         /  |  \
                        0   1   2

    seed:  for internal nodes, the seed is used in determining how
        many children it will have and their seed values.  For leaf 
        nodes at maximum depth the seed is the cost of the node.
    ancestors:  a list of the sibling ranks of the ancestors of a
        node.  For the root, the list will be empty.  For a child
        of the root, the list will consist of the single integer 0
        (the root has sibling rank 0).  For a child of the first
        child of the root, the list will contain (0,0).  For a
        child of the second child of the root, the list will contain
        (0,1), etc.  The ancestor list of a node together with the 
        sibling rank of the node uniquely identifies the node.
    size:  the size of the node is the number of integers composing
        the node.  Thus, the size of a node will be 4 + length of
        the ancestor list = 4 + depth of node - 1.  The size member
        is used to identify the new top of the stack when we pop the
        stack.     
        
The tree itself isn't an explicit data structure in the program.  Each
process pushes nodes onto its stack using depth-first search.  So the
main data structure in the program is each process' local stack.  In
the C program this is a struct consisting of the following members

    stack_list:  an array of ints storing the tree nodes that have
        been pushed onto the stack using depth first search.
    max_size:  the number of ints allocated for the stack.
    in_use:  the number of ints actually in use for the stack.
    stack_top:  a pointer to the beginning of the node on the top
        of the stack.

Program files
-------------
    main.c:  get input, set up data structures, call Par_tree_search,
        print stats, free data structures.
    par_tree_search.c:  process 0 generates enough of tree so that
        its local stack has at least one tree-node per process, and
        scatters the tree-nodes among the processes.  Runs main
        do-while loop -- calls Par_dfs, Service_requests, and
        Work_remains.  After completion of do-while loop, it calls
        functions to update the solution on each process, and print
        the solution.
    par_dfs.c:  runs depth-first search on tree.  Returns
        after local stack is exhausted, or it has expanded max_work
        tree-nodes.
    service_requests.c:  uses MPI_Iprobe to see whether other processes
        have requested work.  If so, it checks its local stack to see
        whether it has any nodes above cutoff_depth.  If so, it sends
        half of them to the requesting process.  If not, it sends
        a "reject" message.  The splitting of the stack is done using
        MPI_Type_indexed.  After the data has been sent.  The stack
        is "compressed".
    work_remains.c:  Checks whether local stack is empty.  If not, it
        returns TRUE.  If it is, it executes the following algorithm.
            Send "energy" to process 0 (see terminate.c, below).
            while(TRUE) {
                Send reject messages to all processes requesting work.
                if (the search is complete) {
                    Cancel outstanding work requests.
                    return FALSE.
                } else if (there's no outstanding request) {
                    Generate a process rank to which to send a request.
                    Send a request for work.
                } else if (a reply has been received) {
                    if (work has been received) return TRUE.
                }
            }
    node_stack.c:  functions for manipulating tree-nodes and the stack.
    solution.c:  functions for keeping track of the best solution so
        far.
    queue.c:  functions for checking for pending messages.
    terminate.c:  functions for determining whether the search
        has been completed.  The idea is described in chapter 14 of
        the text in the section "Distributed Termination Detection."
        Here's the basic idea.  At the start of the program, process 0 has
        one unit of some indestructible quantity -- I called it energy.  
        When the data is distributed, the energy is divided into p
        equal parts and also distributed among the processes.  So each 
        process has 1/p of the energy.  (I don't actually send the energy 
        out initially, since every process knows it will get 1/p units of 
        energy, each can just set its energy to 1/p).  Whenever
        a process exhausts its local stack, it sends whatever energy
        it has to process 0.  Whenever a process satisfies a request
        for work, it splits its energy in two equal pieces, keeping half
        for itself and sending half to the process receiving the work.
        Each process keeps track of its available energy in "my_energy".
        Process 0, also keeps track of returned energy in "returned_energy."
        Since energy is never destroyed, the sum of all energy on all the
        processes will always be 1.  When process 0 finds that
        "returned_energy" is 1, no process (including itself) will have
        any work left, and it can broadcast a termination message to all
        processes.  The only catch here is that floating point
        arithmetic isn't exact.  So I've written functions for exact
        rational arithmetic:  a fraction a/b is represented as a pair of
        longs (a,b).  Dividing by 2 changes (a,b) to (a,2b).  Adding
        (a,b) + (c,d) = (ad + bc,bd).  This can still cause problems,
        but I haven't added any security checks.  Future work, no doubt.
    stats.c:  functions for keeping track of runtime, requests sent,
        etc.  
    cio.c:  basic I/O functions -- see Chap 8 in PPMPI.
    vsscanf.c:  Used by basic I/O functions to copy a variable length
        argument list into a string.  Only implemented for types
        int, string, float, and double.

Since the functions Par_tree_search, Par_dfs, Service_requests, and 
Work_remains are supposed to be adaptable to general parallel depth-first
search, some of the functions they call are passed "dummy" arguments that
are, in fact, never used.   
