Additional software capabilities needed

In addition to the list above, further software capabilities have been suggested, modeled on the software available at the Pittsburgh Supercomputing Center (PSC).

Among the computational tasks that might be expected of a university-wide computation system for molecular biology are:

1. Database search. Search any of several databases for a known sequence, or search the entry headers for all proteins of a given classification.

Databases to be held locally:

A. GenBank
B. dbEST, dbSTS
C. PDB, NDB
D. SWISS-PROT; also PIR, which is much larger (holds more sequences) than SWISS-PROT
E. Signaling pathway database
F. Any of several genomes, including cattle, chicken, pig, human, Drosophila, Arabidopsis, maize, soybean, rice, ...

2. Homology detection, sequence similarities. PSC has NWGap, MaxSegs, FShift (FrameShift), and FASTA; use a BLAST server when appropriate.

3. Homology detection, 3D similarities. PSC has Profiless (3D profiles) and 1dto3d (aligns a sequence to a structure).

4. Contig assembly. Whatever is available in GCG.

5. Identify open reading frames, exons, and introns. Whatever is available in GCG. This is probably better done with web resources specialized for genes in particular classes of organisms; one could perhaps be set up for plant genes at Davis.

6. Construct phylogenetic trees. PHYLIP, and PAUP in GCG.

7. Predict RNA secondary structure. Mfold in GCG and Jake Maizel's RanFold; we are also trying to obtain Bruce Shapiro's genetic algorithm for RNA folding.

8. Predict protein secondary structure. Whatever is available in GCG; again, specialized web servers are probably better and should be used.

9. Alignment of two sequences (DNA or protein). NWGap, MaxSegs, FShift

10. Alignment of multiple sequences (DNA or protein). MSA (needs a large-memory machine to be worthwhile), ClustalW, SAGA, and the HMMER package

11. Dot plot of two sequences. GCG

12. Molecular visualization and computational chemistry.

A. Both static and dynamic molecular structural computations using CHARMm, AMBER, or an equivalent system.

B. Display protein tertiary structure (ball and stick, ribbon, etc.). Graphx, MolMol, RasMol (both the original and the Berkeley version)

C. Display RNA secondary structure. GCG

D. Display DNA/RNA tertiary structure (see http://synapse.lanl.gov) Graphx

E. Graphical display of ORF, exon, and intron locations. GCG programs

13. Deposit data in data banks.

14. Determination of common motifs. MEME, Consensus

15. Search for known motifs (promoter, splice site, antigenic peptide, etc.) searchprosite, searchblocks, mast, profiless, hmmPfam, GCG Motifs program

16. Primer design. GCG

17. Nucleotide to protein translation. GCG

18. Statistical analysis (a bit of a broad category).

19. A web server for the maize, soybean, and swine genomes.

Also, the support staff should be able to advise on and help with the informatics needs of small genome programs, such as the database and web needs arising in the UCD project on expression monitoring in the cotton genome.
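Item 1 combines two kinds of query: matching a known sequence and matching header text. The Python sketch below shows both over FASTA-formatted records with a plain linear scan. It assumes the local database copies are stored as FASTA text; `read_fasta` and `search_records` are hypothetical names, and the sequence test is exact substring matching, not the similarity search that FASTA or BLAST (item 2) would provide.

```python
from typing import Iterable, Iterator, Optional


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: list[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        yield header, "".join(chunks)


def search_records(records: Iterable[tuple[str, str]],
                   header_keyword: Optional[str] = None,
                   subsequence: Optional[str] = None) -> list[str]:
    """Return headers of records that satisfy every criterion given.

    Both tests are case-insensitive. Assumption: the sequence test is an
    exact substring match, not a similarity search.
    """
    hits = []
    for header, seq in records:
        if header_keyword and header_keyword.lower() not in header.lower():
            continue
        if subsequence and subsequence.upper() not in seq.upper():
            continue
        hits.append(header)
    return hits
```

For example, `search_records(read_fasta(open("swissprot.fasta")), header_keyword="kinase")` would list every header mentioning kinases; the file name is hypothetical. At database scale an indexed search would replace the linear scan.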
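Items 5 and 17 rest on the same operation: reading DNA codon by codon through the genetic code. The following Python sketch assumes the standard genetic code (NCBI translation table 1) and scans only the three forward reading frames for ATG-to-stop open reading frames. A realistic gene finder would also scan the reverse complement and use organism-specific signals, which is why item 5 points to specialized web resources. `translate` and `find_orfs` are illustrative names, not GCG programs.

```python
from itertools import product

# Standard genetic code (NCBI table 1). Codons are enumerated in TCAG order
# with the first base outermost; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str, frame: int = 0) -> str:
    """Translate DNA (or RNA) in forward frame 0, 1 or 2.

    An incomplete trailing codon is dropped.
    """
    dna = dna.upper().replace("U", "T")
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    # Codons containing ambiguity codes such as N translate to 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def find_orfs(dna: str, min_codons: int = 30) -> list[tuple[int, int, str]]:
    """Return (start, end, protein) for ATG...stop ORFs on the forward strand.

    Coordinates are 0-based and `end` is exclusive (it includes the stop
    codon). Assumption: ORFs with fewer than `min_codons` sense codons are
    discarded, and ATGs nested inside an ORF are not reported separately.
    """
    orfs = []
    for frame in range(3):
        protein = translate(dna, frame)
        pos = 0
        while pos < len(protein):
            if protein[pos] != "M":
                pos += 1
                continue
            stop = protein.find("*", pos)
            if stop == -1:
                break  # no stop codon downstream in this frame
            if stop - pos >= min_codons:
                orfs.append((frame + 3 * pos, frame + 3 * (stop + 1), protein[pos:stop]))
            pos = stop + 1  # resume after this ORF's stop codon
    return sorted(orfs)
```

Building the codon table from the 64-letter string keeps the code compact; another translation table could be swapped in by replacing `AMINO_ACIDS`.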
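Item 7 lists Mfold, which predicts RNA secondary structure by free-energy minimization with thermodynamic parameters. As a much simpler stand-in that shows the dynamic-programming idea behind such predictors, here is a Python sketch of Nussinov base-pair maximization. It is not the Mfold algorithm; the allowed pairs (Watson-Crick plus G-U wobble) and the minimum hairpin loop of 3 unpaired bases are assumptions.

```python
# Assumption: Watson-Crick pairs plus the G-U wobble pair are allowed.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> tuple[int, str]:
    """Return (maximum number of nested pairs, dot-bracket structure)."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0, ""
    # best[i][j] = most pairs in seq[i..j]; spans with j - i <= min_loop stay 0.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = max(best[i + 1][j], best[i][j - 1])       # i or j unpaired
            if (seq[i], seq[j]) in PAIRS:
                score = max(score, best[i + 1][j - 1] + 1)      # i pairs with j
            for k in range(i + 1, j):                           # split in two
                score = max(score, best[i][k] + best[k + 1][j])
            best[i][j] = score

    # Trace back one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
        elif best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in PAIRS and best[i][j] == best[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if best[i][j] == best[i][k] + best[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return best[0][n - 1], "".join(structure)
```

Maximizing pair count ignores stacking energies and loop penalties, so its structures are often unrealistic; energy-based programs replace the `+ 1` score with thermodynamic terms.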
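Item 9 names NWGap, which by its name is presumably a Needleman-Wunsch global aligner with gap penalties. The textbook Needleman-Wunsch dynamic program is sketched below in Python. The scoring (match +1, mismatch -1, linear gap -2) is an assumption for illustration; protein alignment would normally use a substitution matrix such as PAM or BLOSUM and affine gap costs.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Global alignment with a linear gap penalty; returns (score, row_a, row_b).

    Assumption: the default scores are illustrative, not NWGap's.
    """

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))
```

The table takes O(nm) time and memory, which is why very long sequences call for space-saving variants or heuristic tools.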
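Item 11's dot plot compares two sequences position by position. A minimal Python sketch with a sliding-window filter follows: a dot marks a pair of positions whose windows agree in at least `threshold` of `window` residues. The window and threshold defaults are assumptions, and the output is plain text rather than graphics.

```python
def dot_plot(a: str, b: str, window: int = 3, threshold: int = 3) -> list[str]:
    """Text dot plot: rows follow `a`, columns follow `b`.

    Position (i, j) gets '*' when a[i:i+window] and b[j:j+window] agree in
    at least `threshold` positions, otherwise '.'.
    """
    a, b = a.upper(), b.upper()
    rows = []
    for i in range(len(a) - window + 1):
        row = []
        for j in range(len(b) - window + 1):
            hits = sum(x == y for x, y in zip(a[i:i + window], b[j:j + window]))
            row.append("*" if hits >= threshold else ".")
        rows.append("".join(row))
    return rows
```

`print("\n".join(dot_plot(seq1, seq2)))` displays the plot; diagonal runs of `*` indicate similar segments, and a repeat in one sequence shows up as a second parallel diagonal. Larger windows with a threshold below the window size suppress random single-residue matches.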
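Item 15's searchprosite-style searches match PROSITE patterns such as the N-glycosylation site pattern `N-{P}-[ST]-{P}` (PROSITE PS00001). One way to run such a search, sketched here in Python, is to translate the pattern into a regular expression. The converter covers the common pattern syntax (`x`, `[...]`, `{...}`, repeat counts, and the `<`/`>` terminal anchors) but not every corner of it, and it is an illustration, not a description of how any of the listed programs work; `prosite_to_regex` and `find_motif` are hypothetical names.

```python
import re

# One PROSITE element: x, a residue, [allowed], or {forbidden}, plus an
# optional repeat count (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE pattern such as 'N-{P}-[ST]-{P}.' to a Python regex.

    Assumption: the rarer '>' inside brackets (e.g. '[G>]') is not supported.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]   # anchored at the N-terminus
        if element.endswith(">"):
            suffix, element = "$", element[:-1]  # anchored at the C-terminus
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            regex = "."
        elif residue.startswith("{"):
            regex = "[^" + residue[1:-1] + "]"
        else:
            regex = residue
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + regex + suffix)
    return "".join(parts)


def find_motif(protein: str, pattern: str) -> list[int]:
    """Return 0-based start positions of all matches, including overlaps."""
    # A capturing lookahead lets finditer report overlapping occurrences.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [m.start() for m in regex.finditer(protein.upper())]
```

For instance, `find_motif("MANKSAANGTW", "N-{P}-[ST]-{P}.")` reports candidate glycosylation sites at positions 2 and 7.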
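For item 16, a first-pass primer screen fits in a few lines. The Python sketch below scans a template for fixed-length windows whose GC content and Wallace-rule melting temperature, Tm = 2(A+T) + 4(G+C) °C (a rough estimate intended only for short oligonucleotides), fall in chosen ranges, and which end in a G or C "clamp". The default length, GC range and Tm range are assumed rules of thumb; a fuller tool would also check self-complementarity, primer-dimers and nearest-neighbor Tm.

```python
from typing import Iterator

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in a non-empty primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Wallace rule: 2 degrees C per A/T plus 4 degrees C per G/C."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (non-ACGT characters pass through)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def candidate_primers(template: str, length: int = 20,
                      gc_range: tuple[float, float] = (0.4, 0.6),
                      tm_range: tuple[int, int] = (56, 64)) -> Iterator[tuple[int, str, int]]:
    """Yield (position, primer, Tm) for windows passing all three filters.

    Assumption: default thresholds are common rules of thumb, not GCG's.
    """
    template = template.upper()
    for pos in range(len(template) - length + 1):
        primer = template[pos:pos + length]
        tm = wallace_tm(primer)
        if (gc_range[0] <= gc_fraction(primer) <= gc_range[1]
                and tm_range[0] <= tm <= tm_range[1]
                and primer[-1] in "GC"):   # 3' G/C clamp
            yield pos, primer, tm
```

Reverse primers can be screened the same way by running `candidate_primers` on `reverse_complement(template)`.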


Dan Gusfield
1999-11-03