---------------------------------------------------------------------------
COMP 731 - Data Struc & Algorithms - Lecture 25 - Friday, September 1, 2000
---------------------------------------------------------------------------

Today: 

Dynamic Programming, continued 
   o Longest Common Subsequence

Longest Common Subsequence

 Given: Strings x = x_1 ... x_N and
                y = y_1 ... y_M over some alphabet Sigma

 Find: A longest common subsequence of these two strings:  that is,
       a maximum-length string z such that
             z = x_{i_1} x_{i_2}... x_{i_k}
               = y_{j_1} y_{j_2}... y_{j_k}
       for some 1 <= i_1 < i_2 < ... < i_k <= N
                1 <= j_1 < j_2 < ... < j_k <= M
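
To make the definition concrete, here is a minimal Python sketch that checks
whether a candidate string z is a common subsequence of x and y.  The names
is_subsequence and is_common_subsequence are illustrative only and are not
part of the lecture.

```python
def is_subsequence(z, s):
    """Return True if z can be obtained from s by deleting zero or more letters."""
    it = iter(s)
    # 'c in it' advances the iterator until it finds c, so the letters of z
    # must appear in s in the same left-to-right order.
    return all(c in it for c in z)


def is_common_subsequence(z, x, y):
    """Return True if z is a subsequence of both x and y."""
    return is_subsequence(z, x) and is_subsequence(z, y)
```

For example, "bbab" is a common subsequence of "abbab" and "bbaba", while
"abb" is a subsequence of "abbab" but not of "bbaba".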

Idea:
1. Always optimal to match the last letter, if possible.

      ..........a
                |
      ..........a

versus

      ..........a
               / 
             /
      ......a...a

 The first is at least as good as the second.  (Explain why)


2. If the last letters do not match, take the better of two
   possibilities: omit the last letter of the first string, or
   omit the last letter of the second string.

     ___________
    | .........|a
    |          ----
    | ..........b  |
    |______________|


     ______________ 
    | ..........a |
    |          ----           
    | .........|b
    |___________


   So solve the above two problems and take the maximum.
     Why does this work?  When the final letters differ, an LCS cannot
     match both the final "a" of the first string and the final "b" of
     the second, so at least one of them can be dropped without loss.

Now, with symbols!

Let A[n,m] = the length of a longest common subsequence of the prefixes
             x_1 ... x_n and y_1 ... y_m.

Our goal is to compute A[N,M].  Actually, we want to recover the solution
itself, but that is easy to do by modifying the program that computes A[N,M].

Now we express A[n,m] recursively:

          /   1 + A[n-1, m-1]            if n>=1 and m>=1 and x_n = y_m
A[n,m] = |    max{ A[n-1,m], A[n,m-1] }  if n>=1 and m>=1 and x_n != y_m
          \   0                          if n = 0 or m = 0
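
The recurrence can be turned into a bottom-up table fill.  The Python sketch
below is one possible rendering, not the program from the lecture: the names
lcs_table and lcs are illustrative, and the strings are 0-indexed, so x_n
corresponds to x[n - 1].  The second function walks back through the filled
table to recover one longest common subsequence.

```python
def lcs_table(x, y):
    """Fill A[n][m] = LCS length of x[:n] and y[:m] using the recurrence."""
    N, M = len(x), len(y)
    A = [[0] * (M + 1) for _ in range(N + 1)]   # row 0 and column 0 stay 0
    for n in range(1, N + 1):
        for m in range(1, M + 1):
            if x[n - 1] == y[m - 1]:            # x_n = y_m
                A[n][m] = 1 + A[n - 1][m - 1]
            else:
                A[n][m] = max(A[n - 1][m], A[n][m - 1])
    return A


def lcs(x, y):
    """Recover one longest common subsequence by walking back from A[N][M]."""
    A = lcs_table(x, y)
    n, m = len(x), len(y)
    out = []
    while n > 0 and m > 0:
        if x[n - 1] == y[m - 1]:
            out.append(x[n - 1])                # matched letter; move diagonally
            n, m = n - 1, m - 1
        elif A[n - 1][m] >= A[n][m - 1]:
            n -= 1                              # drop the last letter of x
        else:
            m -= 1                              # drop the last letter of y
    return "".join(reversed(out))
```

Both the table fill and the memory use are proportional to N * M; the
walk back takes at most N + M steps.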


Work out example:
   x = a b b a b
   y = b b a b a 

           the string x
          0   1   2   3   4   5

          e   a   b   b   a   b
        -------------------------
s  0  e | 0 | 0 | 0 | 0 | 0 | 0 |
t       -------------------------
r  1  b | 0 | 0 | 1 | 1 | 1 | 1 |
i       -------------------------      if match,          if not match
n  2  b | 0 | 0 | 1 | 2 | 2 | 2 |           __      
g       -------------------------          |\                   /|\
   3  a | 0 | 1 | 1 | 2 | 3 | 3 |             \  +1              |  maximum        
y       -------------------------               \            <---+
   4  b | 0 | 1 | 2 | 2 | 3 | 4 |
        -------------------------           Record which of these three possibilities
   5  a | 0 | 1 | 2 | 2 | 3 | 4 |           was taken, in order to recover the solution.
        -------------------------           
                                           As before, this can be collapsed into using
                                           a single array, of length M+1, scanned N times
                                           from left-to-right, saving the old diagonal
                                           entry before it is overwritten.  Figure out how!
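
One possible way to do the collapse, sketched in Python.  This version
assumes the single array is swept in increasing m, with one extra variable
holding the old diagonal entry, because A[n,m] needs A[n,m-1] from the
current pass as well as A[n-1,m-1] from the previous one.  The name
lcs_length_one_row is illustrative.  It returns only the length; recovering
the string itself still needs the recorded choices.

```python
def lcs_length_one_row(x, y):
    """LCS length using one array of len(y) + 1 entries instead of the full table."""
    M = len(y)
    row = [0] * (M + 1)             # before pass n, row[m] holds A[n-1][m]
    for n in range(1, len(x) + 1):
        diag = row[0]               # A[n-1][0], the diagonal neighbour for m = 1
        for m in range(1, M + 1):
            above = row[m]          # A[n-1][m], about to be overwritten
            if x[n - 1] == y[m - 1]:
                row[m] = 1 + diag
            else:
                row[m] = max(above, row[m - 1])   # row[m-1] is already A[n][m-1]
            diag = above            # becomes A[n-1][m-1] for the next m
    return row[M]
```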