The optimal alignment between two DNA or amino acid sequences for a given set
of weights is computed by classical dynamic programming techniques. However,
there is often considerable disagreement about how to weight matches,
mismatches, insertions/deletions (indels) and gaps. Parametric Sequence
Alignment is the problem of computing the optimal valued alignment between two
strings as a function of variable weights for matches, mismatches, spaces and
gaps. The goal is to partition the parameter space into regions (which are
necessarily convex) such that in each region any alignment that is
optimal for some choice of parameters inside the region is optimal
throughout that entire region and nowhere else. XPARAL is a program that
efficiently finds and displays that convex parametric decomposition. It
explicitly maps out the full dependence of the optimal alignment on the
parameter choices.
An alignment A of two strings S and S' of lengths n and m respectively is obtained by introducing spaces into, or at the ends of, the two strings such
that the lengths of the two resulting strings are identical, and then placing
these two resultant strings one upon the other subject to the constraint that
no column contains two spaces. Any column that contains two identical
characters is called a match. Any column that contains two dissimilar
characters is called a mismatch and any column that contains a space will be
referred to as a space or indel (an abbreviation of insertion/deletion). A
series of one or more contiguous spaces in the same string will be referred to
as gap. Our use of the term ``gap'' may be a bit different
than typically used
in biological literature, where ``gap'' is sometimes used in place of what
we call a ``space.''
The parametric value of an alignment A may be characterized by the number of
matches, mismatches, spaces and gaps it contains. We denote these quantities
by m, ms, s, and g respectively. We let w,x,y,z be variables
that take on non-negative values and denote variable weights or penalties.
Then the value of
alignment A as a function of w,x,y, and zis
wm - xms -ys -zg, which is a linear function of
w,x,y and z. (Actually, there are only three independent
variables since we could divide the expression by w, say, and still preserve
the relative order of the value of all the alignments. We use four variables
for convenience.) For fixed values of w, x, y and z, an optimal
(maximum value) alignment of two strings can be found by dynamic
programming in O(nm)
time. Parametric alignment is the problem of solving the optimal alignment
problem for all choices of values for w,x, y and z.
XPARAL solves the parametric alignment problem, but for only two choices of parameters at a time. This is because it is easier to view two dimensional figures than three dimensional figures. The user is asked to choose which term (matches, mismatches, indels or gaps) to multiply by x and which by y, and to choose constant multipliers for the two remaining terms. For example, the user might choose the objective function
This gives a constant weight c1 to each match, a variable penalty x to each mismatch, a variable penalty y to each indel and a penalty of zero to each gap.
Or the user might choose:
Which varies the gap initiation penalty y against the gap extension penalty
x. XPARAL allows the user to specify any one of the six possible ways of
choosing the two terms to parameterize.
Given two strings, and a choice of the parametric function, XPARAL will then
solve for the optimal alignment as a function of x and y. The output is a
decomposition of the x, y space into convex polygons which have the following property:
An alignment that is optimal for at least one x,y point in
the interior of a polygon is
optimal for all points in that polygon and nowhere else. Hence this polygonal
decomposition completely characterizes the dependence of the optimal alignment
on the choices of values for x,y. For each polygon in the decomposition,
XPARAL displays one alignment that is optimal for that polygon. Note that
there may be other alignments that are optimal in the polygon, but they too
will be optimal for every point in the polygon and nowhere else. Hence the
polygonal decomposition is a function of the strings and the objective
function, but not of the particular alignments chosen for display.
XPARAL finds each polygon of the decomposition in O(R + E) time, where R is
the number of polygons in the decomposition, and E is the time to compute a
single alignment when x and y have fixed values. For all the allowed choices
in XPARAL, R and E are at most nm, hence the time
to find each polygon is proportional to just a single alignment computation!
This bound, however, is amortized (averaged) over all the computations and the
first polygons take much more time to find than that later ones.