Additional Details from Jim Cornette at ISU

One of the hard things I have worked on this fall is trying to understand the computing needs of our community. In my opinion there should be a system, freely available to everyone on campus, with the capabilities listed at the end of this note. It is easier to provide centrally for computational genomics than for computational molecular structure, and designing the hardware is very difficult in either case. One of our faculty, Alan Myers, has a DEC machine that runs GCG and serves as a central machine, available upon request from a faculty member. We proposed to move it to the Computation Center, have it centrally maintained, expand it to two networked remote-server computers, and add the SeqWeb interface (a web interface restricted to campus but accessible from any computer on campus that can open a Unix window). A lot of information is available from the GCG web site. We then proposed getting a couple of SGIs with Molecular Simulations, Inc. software for computational chemistry. These would be available to all; at present several labs have computational chemistry machines, but they are available only to those labs.

Central to our proposal were (1) maintenance of the hardware by the Computation Center and (2) two postdocs assigned half-time to maintain the software and teach people how to use it. We decided on postdocs because we would have a lot of difficulty attracting and retaining regular employees in this area.

We have applied for an IGERT grant, which Dan V will tell you about. It is designed for RA support (four new students per year for the next five years; I think the university has some obligation to continue some of those beyond the five years) plus 5K per student, which the student can use to buy his or her own PC (or notebook).

We also have RA support from Pioneer Hi-Bred International (a seed company) of 100K/year for the next five years, designed to support five students per year (17K salary plus 3K for benefits, travel, and supplies), and we may get support from a software company's SBIR grant to NIH for three students per year for two years. We are not seeing very much RA support from the university. There has been a suggestion of perhaps four RAs per year for a rather broad informatics category, for which we could compete.

I have no feel for the space required. Probably an office and a computer lab will do. On the other hand, some of our students are, and will be, doing some wet lab work, and I am out of that loop.

Among the computational tasks that might be expected of a university-wide computation system for molecular biology are:

1. Database search. Search any of several databases for a known sequence, or search the headers for all proteins of a certain classification.

Databases to be held locally:

A. GenBank

B. dbEST, dbSTS

C. PDB, NDB

D. SWISS-PROT

E. Signaling pathway database.

F. Any of several genomes, including cattle, chicken, pig, human, Drosophila, Arabidopsis, maize, soybean, rice, ...

2. Homology detection, sequence similarities.

3. Homology detection, 3D similarities.

4. Contig assembly.

5. Identify open reading frames, exons, and introns.

6. Construct phylogenetic trees.

7. Predict RNA secondary structure.

8. Predict protein secondary structure.

9. Alignment of two sequences (DNA or protein).

10. Alignment of multiple sequences (DNA or protein).

11. Dot plot of two sequences.

12. Molecular visualization and computational chemistry.

A. Both static and dynamic molecular structural computations using CHARMm or AMBER or an equivalent system.

B. Display protein tertiary structure (ball and stick, ribbon, etc.).

C. Display RNA secondary structure.

D. Display DNA/RNA tertiary structure (see http://synapse.lanl.gov)

E. Graphical display of ORF, exon, and intron locations.

13. Deposit data in data banks.

14. Determination of common motifs.

15. Search for known motifs (promoter, splice site, antigenic peptide, etc.)

16. Primer design.

17. Nucleotide to protein translation.

18. Statistical analysis (a bit of a broad category).

19. A Web server for: Maize genome, Soybean genome, Swine genome
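
To make two of the simpler tasks on this list concrete (item 17, nucleotide to protein translation, and item 5, identifying open reading frames), here is a minimal Python sketch. It is not taken from GCG or any other package mentioned in this note. The function names translate and find_orfs are hypothetical, and the sketch assumes the standard genetic code, ATG as the only start codon, and the forward strand only.

```python
# Standard genetic code, built from the conventional TCAG ordering of bases.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))  # '*' marks a stop codon


def translate(seq):
    """Translate a DNA or RNA string in frame 0; unknown codons become 'X'."""
    seq = seq.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def find_orfs(seq, min_codons=1):
    """Return (start, end, protein) for each ATG...stop run on the forward strand.

    start is the 0-based index of the ATG, end is the index just past the
    stop codon, and protein excludes the stop. Assumption: only the three
    forward reading frames are scanned; the reverse complement is ignored.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate reading frame
            elif start is not None and CODON_TABLE.get(codon) == "*":
                protein = translate(seq[start:i])
                if len(protein) >= min_codons:
                    orfs.append((start, i + 3, protein))
                start = None  # close it and look for the next ATG
    return orfs


print(translate("ATGGCCTAA"))       # MA*
print(find_orfs("CCATGGCCTAAGG"))   # [(2, 11, 'MA')]
```

A production system would also need to scan the reverse strand, handle alternative genetic codes, and distinguish exons from introns, which this sketch does not attempt.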

Dan Gusfield
1999-11-03