---------------------------------------------------------------------------
COMP 731 - Data Struc & Algorithms - Lecture 25 - Friday, September 1, 2000
---------------------------------------------------------------------------

Today: Dynamic Programming, continued
  o Longest Common Subsequence


Longest Common Subsequence

Given: Strings x = x_1 ... x_N and y = y_1 ... y_M over some alphabet Sigma

Find:  A longest common subsequence (LCS) of these two strings: that is,
       a maximum-length string z such that

           z = x_{i_1} x_{i_2} ... x_{i_k} = y_{j_1} y_{j_2} ... y_{j_k}

       for some
           1 <= i_1 < i_2 < ... < i_k <= N
           1 <= j_1 < j_2 < ... < j_k <= M

Idea:

  1. Always optimal to match the last letter, if possible.

         ..........a                    ..........a
                   |       versus                /
         ..........a                            /
                                        ......a...a

     The first is at least as good as the second.  (Explain why)

  2. If not possible to match the last letter, do whatever is best for
     the two possibilities: when the last letter is omitted from the
     first string; and when the last letter is omitted from the second
     string.

          ___________
         | .........|a
         |          ----
         | ..........b  |
         |______________|

          ______________
         | ..........a  |
         |          ----
         | .........|b
         |___________

     So solve the above two problems and take the maximum.

     Why does this work?  Any LCS can match AT MOST one of the final "a"
     of the first string OR the final "b" of the second string.  Matching
     both would force the two match lines to cross, so at least one of
     the two last letters can be dropped without losing anything.

Now, with symbols!

Let A[n,m] = the length of the longest common subsequence between
             strings a_1 ... a_n and b_1 ... b_m
             (the prefixes of the two input strings; here a = x and b = y).

Our goal is to compute A[N,M].  Actually, we want to recover the solution
itself, but that is easy to do by modifying the program that computes A[N,M].
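
A minimal sketch of ideas 1 and 2 applied directly, as a memoized recursion
on prefix lengths.  Python and the name lcs_length_recursive are choices made
for this illustration; they are not part of the lecture.

```python
from functools import lru_cache


def lcs_length_recursive(x: str, y: str) -> int:
    """Length of a longest common subsequence of x and y (illustrative sketch)."""

    @lru_cache(maxsize=None)
    def solve(n: int, m: int) -> int:
        # solve(n, m) looks only at the prefixes x[:n] and y[:m].
        if n == 0 or m == 0:
            return 0  # an empty prefix shares no letters with anything
        if x[n - 1] == y[m - 1]:
            # Idea 1: the last letters agree, so match them.
            return 1 + solve(n - 1, m - 1)
        # Idea 2: omit the last letter of one string or of the other,
        # and keep whichever choice is better.
        return max(solve(n - 1, m), solve(n, m - 1))

    return solve(len(x), len(y))
```

The cache means each pair (n, m) is solved once, so the work is proportional
to N*M.  The recursion depth can reach N + M, so this form is only suitable
for short strings.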

Now we express A[n,m] recursively:

           / 1 + A[n-1, m-1]              if n>=1 and m>=1 and a_n  = b_m
  A[n,m] = | max{ A[n-1,m], A[n,m-1] }    if n>=1 and m>=1 and a_n != b_m
           \ 0                            if n = 0 or m = 0

Work out example:  x = a b b a b
                   y = b b a b a

(e = the empty prefix)

                  the string x
              0   1   2   3   4   5
              e   a   b   b   a   b
            -------------------------
  s   0  e  | 0 | 0 | 0 | 0 | 0 | 0 |
  t         -------------------------
  r   1  b  | 0 | 0 | 1 | 1 | 1 | 1 |
  i         -------------------------      if match,    if not match
  n   2  b  | 0 | 0 | 1 | 2 | 2 | 2 |       __
  g         -------------------------      |\              /|\
      3  a  | 0 | 1 | 1 | 2 | 3 | 3 |        \ +1           |   maximum
  y         -------------------------         \         <---+
      4  b  | 0 | 1 | 2 | 2 | 3 | 4 |
            -------------------------      record which of these three possibilities
      5  a  | 0 | 1 | 2 | 2 | 3 | 4 |      was used, in order to recover the solution.
            -------------------------

As before, this can be collapsed into using a single array, of length M
(plus one entry for the empty prefix), scanned N times.  Careful: A[n,m]
needs the NEW value A[n,m-1] as well as the OLD values A[n-1,m] and
A[n-1,m-1], so a plain right-to-left scan is not enough here.
Figure out how!
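
A minimal sketch of the table computation, assuming Python; the function
names (lcs_table, recover_lcs, lcs_length_one_row) are made up for this
illustration.  Instead of recording which of the three possibilities was
used, recover_lcs re-derives the choice from the finished table while
walking back from A[N,M].  The one-row version is one possible answer to
the exercise, under the assumption that the row is scanned left to right
with the old diagonal value kept in a temporary.

```python
from typing import List


def lcs_table(x: str, y: str) -> List[List[int]]:
    """A[n][m] = length of an LCS of the prefixes x[:n] and y[:m]."""
    N, M = len(x), len(y)
    A = [[0] * (M + 1) for _ in range(N + 1)]  # row 0 and column 0 stay 0
    for n in range(1, N + 1):
        for m in range(1, M + 1):
            if x[n - 1] == y[m - 1]:
                A[n][m] = 1 + A[n - 1][m - 1]            # match: diagonal + 1
            else:
                A[n][m] = max(A[n - 1][m], A[n][m - 1])  # no match: maximum
    return A


def recover_lcs(x: str, y: str) -> str:
    """Walk back from A[N][M] to read off one LCS."""
    A = lcs_table(x, y)
    n, m = len(x), len(y)
    letters = []
    while n > 0 and m > 0:
        if x[n - 1] == y[m - 1]:
            letters.append(x[n - 1])  # this letter is part of the LCS
            n, m = n - 1, m - 1
        elif A[n - 1][m] >= A[n][m - 1]:
            n -= 1                    # the maximum came from A[n-1][m]
        else:
            m -= 1                    # the maximum came from A[n][m-1]
    return "".join(reversed(letters))


def lcs_length_one_row(x: str, y: str) -> int:
    """Same length, using a single array of len(y) + 1 entries."""
    row = [0] * (len(y) + 1)
    for n in range(1, len(x) + 1):
        diag = 0                      # old row[0], i.e. A[n-1][0]
        for m in range(1, len(y) + 1):
            up = row[m]               # A[n-1][m], about to be overwritten
            if x[n - 1] == y[m - 1]:
                row[m] = 1 + diag
            else:
                row[m] = max(up, row[m - 1])  # row[m-1] is already A[n][m-1]
            diag = up                 # becomes A[n-1][m-1] for the next m
    return row[len(y)]
```

For the example above, lcs_table("abbab", "bbaba")[5][5] is 4 and
recover_lcs("abbab", "bbaba") returns "bbab".  Note that lcs_table indexes
x by its first coordinate, so it is the transpose of the table as drawn.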