Computational Structural Biology: Winter 2024

Sequence Alignment

A sequence alignment is a way of arranging the sequences of DNA, RNA, or protein molecules to identify regions of similarity that may reveal functional, structural, or evolutionary relationships between the sequences. If two sequences in an alignment share a common ancestor, mismatches can be interpreted as point mutations and gaps as insertion or deletion mutations. In sequence alignments of proteins, the degree of similarity between amino acids occupying a particular position in the sequence can be interpreted as a rough measure of how conserved a particular region is.
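As a small illustration of how the columns of a pairwise alignment are read, the sketch below counts matches, mismatches, and gap columns. The function name summarize_alignment, the use of "-" as the gap character, and the example sequences are assumptions made for this illustration; they do not come from the course materials.

```python
def summarize_alignment(row_a, row_b, gap="-"):
    """Count match, mismatch, and gap columns in a pairwise alignment.

    row_a and row_b are the two aligned rows, already padded with the
    gap character so that they have the same length.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")

    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            # A gap column: read as an insertion or deletion.
            counts["gap"] += 1
        elif a == b:
            counts["match"] += 1
        else:
            # Different residues: read as a point mutation.
            counts["mismatch"] += 1
    return counts


if __name__ == "__main__":
    print(summarize_alignment("GATTACA", "GAT-ACG"))
```

For the two example rows, five columns match, one column holds a mismatch (A against G), and one column holds a gap.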

Very short or very similar sequences can be aligned by hand. However, most interesting problems require the alignment of long, highly variable, or numerous sequences that cannot be aligned by human effort alone. Human knowledge is then applied in constructing algorithms to produce high-quality sequence alignments, and occasionally in adjusting the final results to reflect patterns that are difficult to represent algorithmically (especially in the case of nucleotide sequences). Computational approaches to sequence alignment generally fall into two categories: global alignments and local alignments. Calculating a global alignment is a form of global optimization that forces the alignment to span the entire length of all query sequences. By contrast, local alignments identify regions of similarity within long sequences that are often widely divergent overall.
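The difference between the two categories can be seen in a minimal dynamic-programming sketch in the spirit of the Needleman and Wunsch (global) and Smith and Waterman (local) papers listed under Further Reading. The function name alignment_score and the scoring values (match +1, mismatch -1, linear gap penalty -1) are assumptions chosen for illustration, not the scheme used in the lectures; a substitution matrix such as BLOSUM would normally replace the match/mismatch values for proteins. The sketch returns only the optimal score and does not trace back the alignment itself.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-1):
    """Return the optimal pairwise alignment score of strings a and b.

    local=False: global alignment, both sequences aligned end to end.
    local=True:  local alignment, best-scoring pair of subsequences.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment: leading gaps are penalized.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
            if local:
                # Local alignment: a negative prefix is dropped, so the
                # alignment may start anywhere.
                cell = max(cell, 0)
            score[i][j] = cell
            best = max(best, cell)

    # Global: score of the full-length alignment (bottom-right cell).
    # Local: highest score found anywhere in the table.
    return best if local else score[-1][-1]


if __name__ == "__main__":
    long_seq = "TTTTGATTACATTTT"
    short_seq = "CCGATTACACC"
    print("global:", alignment_score(long_seq, short_seq))
    print("local: ", alignment_score(long_seq, short_seq, local=True))
```

In the example, the two sequences share the region GATTACA but differ in their flanks. The local score reflects only the shared region, while the global score is lowered by the mismatches and gaps needed to align the flanks end to end.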

Homology: paralogs and orthologs

Lecture Notes

Download document:

PowerPoint document (click to download)
or
PDF document (click to download)
or
PDF document: 3 slides/page (click to download)

Further Reading

Notes from lecture on 1/25: BLOSUM matrices

Notes from lecture on 2/1: Sequence alignment

Notes from discussion on 2/6: Sequence alignment

Notes from lecture on 2/6: Sequence alignment

Practicing alignments: Word document or PDF document.

Practicing alignments (solutions): Word document or PDF document.

Online review on DotPlot sequence comparison

Original paper by Needleman and Wunsch, 1970

Original paper by Smith and Waterman, 1981

An overview of sequence alignment, Gene Myers

Statistics of multiple sequence alignments, Altschul

Original paper by Dayhoff, 1978: Substitution matrix