Computer Science

ECS 224 String Algorithms and Applications in Computational Biology

ECS 224 STRING ALGORITHMS AND APPLICATIONS IN COMPUTATIONAL BIOLOGY (4) I

Lecture: 3 hours

Discussion: 1 hour

Prerequisite: Course ECS 122A

Grading: Letter; homework (30%), project (25%), final (45%)

Catalog Description:

Algorithms that operate on strings. Pattern matching, sets of patterns, regular expression pattern matching, suffix trees and applications, inexact similarity, parametric sequence alignment, applications to DNA sequencing and protein database searching. Offered in alternate years.

Goals:

The purpose of the course is to introduce important techniques in pattern matching and pattern discovery using combinatorial algorithms that operate on strings. This area has recently become of great interest because of its utility in computational biology, particularly in sequence database searching. This course will focus on computer science techniques with an eye towards their productive application in biology. No background in biology will be assumed.

Expanded Course Description:

  1. Exact matching using the Z and the Boyer-Moore algorithms
  2. Suffix trees and linear time algorithms for computing them
  3. Applications of suffix trees in pattern matching and computational biology
  4. Inexact matching, edit distance and alignment
  5. Accelerations for inexact matching
  6. Additional alignment models for biological applications of inexact matching. Gap questions.
  7. Hybrid dynamic programming — integrating suffix trees with inexact matching
  8. Database searching
  9. Applications of string algorithms in DNA sequencing problems
  10. String algorithms in phylogenetic reconstruction

Project:

The project involves some computer implementation of an important algorithm or the theoretical development of some new method.

Textbook:

D. Gusfield, Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology, Cambridge Press, 1997

Instructor: D. Gusfield

Prepared by: D. Gusfield (September 2002)

Overlap Statement:

There is no significant overlap with any other course.

9/02

border