Computer Science

ECS 124 Theory and Practice of Bioinformatics


Lecture: 3 hours
Laboratory: 1 hour

Catalog Description:
Fundamental biological, mathematical and algorithmic models underlying bioinformatics and systems biology; sequence analysis, database search, genome annotation, clustering and classification, functional gene networks, regulatory network inference, phylogenetic trees; applications of common bioinformatics tools in molecular biology and genetics.

Prerequisites: Course 10 or 30 or Engineering 6; Statistics 12 or 13 or 32 or 100 or 131A or Mathematics 135A; Biological Science 1A or Molecular and Cellular Biology 10.

Credit restrictions, cross listings: None

Summary of course contents

  1. Initial examples of the power of bioinformatics in modern biology
    1. the importance of sequence and structure comparison and of database search
    2. The use of sequence analysis in laboratory protocols
    3. The use of phylogenetics in evolution and non-evolutionary areas of biology
  2. Sequence analysis
    1. Probabilistic and biological models underlying sequence alignment
    2. Computational efficiency and the need for compromises in the models
    3. The general technique of dynamic programming
    4. Pairwise sequence alignment – algorithms for global, local alignment and variations
    5. Algorithms for multiple sequence alignment and the identification/use of motifs
    6. Database search, FASTA, BLAST, PSI-BLAST, scoring matrices, statistical significance and its significance
    7. Multiple sequence alignment
    8. Genome assembly and high-throughput transcriptional profiling
  3. Systems Biology
    1. Clustering (K-means, hierarchical clustering)
    2. Classification (naive Bayes, Support Vector Machines)
    3. Machine Learning in Biology
      1. Overfitting, bias-variance trade-off, curse of dimensionality
      2. Validation methods
    4. Biological Networks
      1. Introduction to networks biology
      2. Functional gene networks
      3. Inference of gene regulatory networks
  4. Phylogenetic algorithms
    1. Probabilistic and ideal-data models underlying phylogenetic algorithms
    2. Distance-based methods
    3. Character/parsimony-based methods
    4. Maximum-likelihood methods
    5. Evolutionary and non-evolutionary uses for phylogenetics
    6. Multiscale modeling and simulation of evolutionary systems

Goals: Students will come to: (1) understand the role and utility of bioinformatics in modern biology; (2) understand basic biological, mathematical and algorithmic concepts, techniques and models underlying bioinformatics tools; (3) master common bioinformatics tools; and (4) do simple programming in Perl or Java to extend the utility of common bioinformatics tools.

Illustrative reading

  • R. Durbin et al., Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge Press, 1998.
  • A. Baxevanis and B. Ouellete, Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, Wiley-Interscience, 1998.
  • M. Bishop and C. Rawlings, DNA and Protein Sequence Analysis: A Practical Approach, IRL Press, 1997.
  • D. Gusfield, Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology, Cambridge Press, 1997.
  • N. Jones and P. Pevzner, An Introduction to Bioinformatics Algorithms, MIT Press, 2004

Each homework set includes creative problems as well as recitation problems to strengthen understanding and discover new material.

Computer Usage:
The lab portion of the class will emphasize practical computer exercises using both established bioformatics software and writing simple programs in Perl or Java.

ABET Category Content:
Engineering Science: 1 unit
Engineering Design: 0 unit

Science & Engineering

Overlap: The laboratory section of ECS 124 will overlap to some amount with Animal Genetics 212 (taught by J. Medrano) a graduate course offered every other year. The overlap in the laboratory section is only partial, as ANG 212 looks at more sequence analysis packages than in ECS 124; the lab portions of ECS 124 also look at packages for cluster enrichment, biological networks and phylogenetic analysis, which are not covered in ANG 212. They also involve computer programming in Perl or Java, while no programming is involved in ANG 212. The theoretical parts of ECS 124 (the lecture part of the course) will have no essential intersection with ANG 212, being either on different material entirely, or being a much more mathematical and algorithmic treatment of the material, i.e., fully explaining and developing the logic of the techniques, rather than focusing on learning to use these techniques in the form of packaged computer programs. A good analogy to explain the partial intersection is that ANG 212 is a course on “flying an airplane” while ECS 124 will be on the “physics of flight” with exercises in flying to make the ideas concrete.

ECS 124 intersects with ECS 221 (taught by I. Tagkopoulos), a graduate seminar course which focuses on Machine Learning methods in Systems and Synthetic Biology. The overlap is mainly on the areas of clustering, classifications and network analysis. ECS 221 assumes advanced computer science knowledge and treats these topics with more mathematical rigor than ECS 124, which includes lectures related to fundamental computer science methods (e.g. suffix trees, dynamic programming, etc.). In contrast to ECS 124, ECS 221 focuses on the presentation of research papers in the field, while it does not requires any laboratories/homeworks.

ECS 124 intersects EVE 298 (taught by M. Sanderson and S. Nadler) in the subarea of phylogenetics. However, the emphasis of the two courses in that overlapping subarea is again different. EVE 298 is oriented towards teaching biology graduate students to use computerized phylogenetic tools effectively in their biological (phylogenetic) research, while ECS 124 will have a more algorithmic and mathematical orientation.

ECS 124 intersects GGG 298D (taught by C. Warden and M. Syvanen) in the subarea of sequence analysis. Again the emphasis is quite different. GGG 298D is oriented towards teaching biology graduate students to use computerized sequence analysis tools effectively in their (biological sequence analysis) research, while ECS 124 will have a more algorithmic and mathematical orientation. There are no undergraduate courses on campus that have any substantial intersection with ECS 124.

Instructors: D. Gusfield and I. Tagkopoulos

History: 2012.10.29 (I. Tagkopoulos): Expansion of syllabus to include genome assembly and current methods in sequence mapping. Addition of a section related to computational methods in Systems Biology (Biological Networks, Machine Learning in Biology). Protein structure prediction methods where removed from the syllabus as there is an overlap with ECS 129. Addition of overlap explanation with ECS 221.  Prior version by D. Gusfield, Oct 2006.




an ability to apply knowledge of mathematics, science, computing, and engineering



an ability to design and conduct experiments, as well as to analyze and interpret data



an ability to design, implement, and evaluate a system, process, component, or program to meet desired needs, within realistic constraints such as economic, environmental, social, political, ethical, health and safety, manufacturability, and sustainability



an ability to function on multi-disciplinary teams



an ability to identify, formulate, and solve computer science and engineering problems  and define the computing requirements appropriate to their solutions



an understanding of professional, ethical, legal, security and social issues and responsibilities



an ability to communicate effectively with a range of audiences



the broad education necessary to understand the impact of computer science and engineering solutions in a global and societal context



a recognition of the need for, and an ability to engage in life-long learning



knowledge of contemporary issues



an ability to use current techniques, skills, and tools necessary for computing and engineering practice



an ability to apply mathematical foundations, algorithmic principles, and computer science and engineering theory in the modeling and design of computer-based systems in a way that demonstrates comprehension of the tradeoffs involved in design choices



an ability to apply design and development principles in the construction of software systems or computer systems of varying complexity