Fall 2011 - String Algorithms and Algorithms in Computational Biology - Gusfield

  • This index page will just link to the various course handouts that are available on the web, and provide some description of them.

    Distribution List

    1. Course Syllabus (brief)
    2. Homework 1, Due September 29.


    3. First Notes on Suffix trees
    4. Notes on Lempel-Ziv string compression using suffix trees
    5. Notes on suffix arrays. This introduces suffix arrays and some of their uses, but not the linear-time construction.
    6. Notes on linear-time construction of suffix arrays. This describes the Kärkkäinen and Sanders algorithm discussed in class on Sept. 27. The next video covers that material as well.

    7. Video on Finding a suffix array in linear time.
    8. NOTE: There will be no class on Sept. 29 - but I will try to post a new homework by that evening or the morning of Sept. 30. Instead of going to class on Sept. 29, watch the posted video on the LCA problem, linear-preprocessing time, constant lookup time. This is one of the CS 222 videos linked from my webpage. The date on it is 11/30/07, and it continues at the start of the next video as well.

    9. Notes on computing the LCP (or Depth) array in linear time
    10. Video on the linear-time LCP finding. The LCP array, along with the suffix array for a string, allows linear-time solutions to many string problems using a suffix array instead of a suffix tree. The two arrays also allow the construction of a suffix tree in linear time.

    11. We will not discuss edit distance and basic sequence alignment in detail in this class, since most people have already been exposed to these topics, and because they have simple solutions via DP (the local alignment problem is more complex). If you have never seen the DP for basic sequence alignment or edit distance, read about it in Tardos and Kleinberg, or see the next pdf slides, or see the video lecture of Oct. 8 2007 in the CS 222A videos, or see the videos posted for CS 124, lectures 5 through 11. These are linked from my webpage.
    12. PDF slides on an Intro to DP, illustrated by the Maximum Sequence Alignment Problem. This is similar to the sequence alignment problem discussed in Tardos and Kleinberg, but there the problem is cast as a minimization problem rather than a maximization problem.
    13. Video Lecture on Advanced Alignment. Advanced alignment topics: sequence alignment in linear space without an increase in the asymptotic running time; alignment of circular strings much faster than the straightforward way.
    14. Notes on the Z-algorithm
    15. Notes on Boyer-Moore
    16. Solution to the RNA matching count problem.
    17. Perl program to count the number of matchings. Try it out and see if you find an error.
    18. Taxonomy of Suffix Array Construction Algorithms here
    19. Replacing suffix trees with suffix arrays Possible paper for student presentation.

    20. Homework 2, Due Thursday October 13. Deadline extended to the following Tuesday.
    21. Notes on multiple common substrings discussed on Oct. 6. This is a concatenation of two sections from my string book, so it may seem odd in some places, but it works.
    22. Notes on linear space alignment using DP. This is also discussed on the advanced alignment video.
    23. Notes on hybrid dynamic programming
    24. A linear-time BWT inversion method. This is not the one we discussed in class, but the one that is referenced for HW 3.
    25. This paper has the Ferragina and Manzini search method using BWT. It describes the FL or FM index and the exact matching method. I don't fully understand how it works on the compressed BWT string (because I don't think they give sufficient details, for example on the use of Four-Russians), so if you can read and understand that part, please do. We probably will not discuss the part of the paper on the use of LZ compression, so that would be another good part of the paper for a student project (if you want to do one).
    26. Homework 3, Due in two weeks
    27. Notes on the Perfect Phylogeny Problem
    28. Notes on Splits Lecture on Tuesday Nov. 1 is from this material.
    29. Homework 4, Due on Nov. 29, but it is long, so don't wait to start.
    30. Second set of notes on phylogenetic networks
    31. a missing page
    32. Introduction to recombination networks and ARGs
    33. A deeper introduction to recombination networks and ARGs
    34. secondargnotes - Notes on some lower bounds on Rmin and on some ARG construction methods.
    35. Homework 5, Due by the final exam
    36. buneman224.pdf Buneman problem.
    37. HW1 solution The solution to problem 1 of HW 1 was already posted. Here is the solution to problem 2 of HW 1.
    38. HW2 solutions to problems 1,2,4,5,6 Updated the morning of Dec. 1. More coming - stay tuned.
    39. Some HW3 solutions More coming.
    40. final exam Some people didn't get it in the attachment when I sent mail from smartsite.