Fall 2011 - String Algorithms and Algorithms in Computational Biology - Gusfield
This index page will just link to the various course handouts that
are available on the web, and provide some description of them.
Distribution List
- Course Syllabus (brief)
- Homework 1, Due September 29.
- First Notes on Suffix trees
- Notes on Lempel-Ziv string compression using suffix trees
- Notes on suffix arrays
This introduces suffix
arrays and some of their uses, but not the linear-time construction.
- Notes on linear-time construction of suffix arrays
This describes the Kärkkäinen and Sanders algorithm discussed in class on Sept. 27. The
next video covers that material as well.
- Video on Finding
a suffix array in linear time.
- NOTE: There will be no class on Sept. 29, but I will try to post a new homework by
that evening or the morning of Sept. 30. Instead of going to class on Sept. 29, watch
the posted video on the LCA (lowest common ancestor) problem, solved with linear
preprocessing time and constant lookup time. This is one of the CS 222 videos linked
from my webpage. The date on it is 11/30/07, and it continues at the start of the next
video as well.
- Notes on computing the LCP (or Depth) array in linear time
- Video on the linear-time LCP finding.
The LCP array, along with the suffix array for a string, allows linear-time solutions to many string
problems using a suffix array instead of a suffix tree. The two arrays also allow the construction of a
suffix tree in linear time.
- We will not discuss edit distance and basic sequence alignment in detail in this class, since most people have
already been exposed to these topics, and because they have simple solutions via DP (the local alignment
problem is more complex). If you
have never seen the DP for basic sequence alignment or edit distance, read about it
in Tardos and Kleinberg, see the next pdf slides,
see the video lecture of Oct. 8, 2007 in the CS 222A videos,
or see the videos posted for CS 124, lectures 5 through 11. These are linked from my
webpage.
- PDF slides on an Intro to DP, illustrated by the Maximum Sequence Alignment
Problem
This is similar to the sequence alignment problem discussed in Tardos and Kleinberg,
but there the problem is cast as
a minimization problem rather than a maximization problem.
- Video Lecture on Advanced Alignment.
Advanced alignment topics: optimal sequence alignment in linear space without
an increase in the asymptotic running time; alignment of circular sequences much faster than the straightforward way.
- Notes on the Z-algorithm
- Notes on Boyer-Moore
- Solution to the RNA matching count problem.
- Perl program to count the number of matchings. Try it out and
see if you find an error
- Taxonomy of Suffix Array Construction Algorithms here
- Replacing suffix trees with suffix arrays
Possible paper for student presentation.
- Homework 2, Due Thursday October 13. Deadline extended
to the following Tuesday.
- Notes on multiple common substrings discussed on Oct. 6. This is
a concatenation of two sections from my string book, so it may seem odd in some places,
but it works.
- Notes on linear space alignment using DP. This
is also discussed in the advanced alignment video.
- Notes on hybrid dynamic programming
- A linear-time BWT inversion method. This is not the one we discussed in class, but
the one that is referenced for HW 3.
- This paper has the Ferragina and Manzini search method using BWT. It describes
the FL or FM index and the exact matching method. I don't fully understand how it works
on the compressed BWT string (because I don't think they give sufficient details, for example
on the use of Four-Russians), so
if you can read and understand that part, please do. We probably will not discuss the part of
the paper on the use of LZ compression, so that would be another good part of the paper for
a student project (if you want to do one).
- Homework 3, Due in two weeks
- Notes on the Perfect Phylogeny Problem
- Notes on Splits. The lecture on Tuesday Nov. 1 is from this
material.
- Homework 4, Due on Nov. 29, but it is long, so don't wait to start.
- Second set of notes on phylogenetic networks
- a missing page
- Introduction to recombination networks and ARGs
- A deeper introduction to recombination networks and ARGs
- secondargnotes - Notes on some lower bounds on Rmin and on some ARG construction methods.
- Homework 5, Due by the final exam
- buneman224.pdf Buneman problem.
- HW1 solution. The solution to problem 1 of HW 1 was already
posted. Here is the solution to problem 2 of HW 1.
- HW2 solutions to problems 1,2,4,5,6
Updated the morning of Dec. 1. More coming - stay tuned.
- Some HW3 solutions. More coming.
- Final exam. Some people didn't get it in the attachment
when I sent mail from smartsite.
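
As a small illustration of the suffix array and LCP array handouts listed above, here is a minimal Python sketch. It is not taken from the course notes. The suffix array is built by naive sorting (not the linear-time construction from class), and the LCP array is computed with the standard linear-time method usually attributed to Kasai et al., which may differ in detail from the method in the handout. The function names `suffix_array`, `lcp_array`, and `longest_repeated_substring` are chosen here for illustration only.

```python
def suffix_array(s):
    # Naive construction: sort suffix start positions by the suffix text.
    # This is for illustration only; it is NOT the linear-time algorithm.
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    # lcp[r] = length of the longest common prefix of the suffixes
    # starting at sa[r-1] and sa[r]; lcp[0] is defined as 0.
    n = len(s)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0  # current match length, carried over between iterations
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # suffix just before suffix i in sorted order
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # the next suffix shares at least h - 1 characters
        else:
            h = 0
    return lcp


def longest_repeated_substring(s):
    # One example use of the two arrays: the largest LCP value marks
    # the longest substring that occurs at least twice in s.
    if not s:
        return ""
    sa = suffix_array(s)
    lcp = lcp_array(s, sa)
    r = max(range(len(s)), key=lambda k: lcp[k])
    return s[sa[r]:sa[r] + lcp[r]]
```

For example, for the string "banana" the sketch gives the suffix array [5, 3, 1, 0, 4, 2] (0-indexed start positions) and the LCP array [0, 1, 3, 0, 0, 2], and `longest_repeated_substring("banana")` returns "ana".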