next up previous contents
Next: Program Controls Up: XPARALParametric Sequence Aligment Previous: Contents

Introduction to Parametric Alignment

The optimal alignment between two DNA or amino acid sequences for a given set of weights is computed by classical dynamic programming techniques. However, there is often considerable disagreement about how to weight matches, mismatches, insertions/deletions (indels) and gaps. Parametric Sequence Alignment is the problem of computing the optimal valued alignment between two strings as a function of variable weights for matches, mismatches, spaces and gaps. The goal is to partition the parameter space into regions (which are necessarily convex) such that in each region any alignment that is optimal for some choice of parameters inside the region is optimal throughout that entire region and nowhere else. XPARAL is a program that efficiently finds and displays that convex parametric decomposition. It explicitly maps out the full dependence of the optimal alignment on the parameter choices.

An alignment A of two strings S and S' of lengths n and m respectively is obtained by introducing spaces into, or at the ends of, the two strings such that the lengths of the two resulting strings are identical, and then placing these two resultant strings one upon the other subject to the constraint that no column contains two spaces. Any column that contains two identical characters is called a match. Any column that contains two dissimilar characters is called a mismatch and any column that contains a space will be referred to as a space or indel (an abbreviation of insertion/deletion). A series of one or more contiguous spaces in the same string will be referred to as gap. Our use of the term ``gap'' may be a bit different than typically used in biological literature, where ``gap'' is sometimes used in place of what we call a ``space.''

The parametric value of an alignment A may be characterized by the number of matches, mismatches, spaces and gaps it contains. We denote these quantities by m, ms, s, and g respectively. We let w,x,y,z be variables that take on non-negative values and denote variable weights or penalties. Then the value of alignment A as a function of w,x,y, and zis wm - xms -ys -zg, which is a linear function of w,x,y and z. (Actually, there are only three independent variables since we could divide the expression by w, say, and still preserve the relative order of the value of all the alignments. We use four variables for convenience.) For fixed values of w, x, y and z, an optimal (maximum value) alignment of two strings can be found by dynamic programming in O(nm) time. Parametric alignment is the problem of solving the optimal alignment problem for all choices of values for w,x, y and z.

XPARAL solves the parametric alignment problem, but for only two choices of parameters at a time. This is because it is easier to view two dimensional figures than three dimensional figures. The user is asked to choose which term (matches, mismatches, indels or gaps) to multiply by x and which by y, and to choose constant multipliers for the two remaining terms. For example, the user might choose the objective function

MAX c1(#matches) - x(#mismatches) -y(#indels) - 0(#gaps)

This gives a constant weight c1 to each match, a variable penalty x to each mismatch, a variable penalty y to each indel and a penalty of zero to each gap.

Or the user might choose:

MAX c1(#matches) - c2(#mismatches) - x(#indels) - y(#gaps)

Which varies the gap initiation penalty y against the gap extension penalty x. XPARAL allows the user to specify any one of the six possible ways of choosing the two terms to parameterize.

Given two strings, and a choice of the parametric function, XPARAL will then solve for the optimal alignment as a function of x and y. The output is a decomposition of the x, y space into convex polygons which have the following property: An alignment that is optimal for at least one x,y point in the interior of a polygon is optimal for all points in that polygon and nowhere else. Hence this polygonal decomposition completely characterizes the dependence of the optimal alignment on the choices of values for x,y. For each polygon in the decomposition, XPARAL displays one alignment that is optimal for that polygon. Note that there may be other alignments that are optimal in the polygon, but they too will be optimal for every point in the polygon and nowhere else. Hence the polygonal decomposition is a function of the strings and the objective function, but not of the particular alignments chosen for display.

XPARAL finds each polygon of the decomposition in O(R + E) time, where R is the number of polygons in the decomposition, and E is the time to compute a single alignment when x and y have fixed values. For all the allowed choices in XPARAL, R and E are at most nm, hence the time to find each polygon is proportional to just a single alignment computation! This bound, however, is amortized (averaged) over all the computations and the first polygons take much more time to find than that later ones.


next up previous contents
Next: Program Controls Up: XPARALParametric Sequence Aligment Previous: Contents
Kristian Stevens
1998-10-13