One of the grand challenges in computational biology is the prediction of the three-dimensional structure of a protein from its chemical makeup alone. A protein's primary structure, i.e., its amino acid sequence, is directly encoded in its DNA sequence; this, however, is a purely one-dimensional structure that does not directly encode a three-dimensional shape. It is commonly believed that the native shape of a protein is the one corresponding to the global minimum of its internal energy, and in recent years the protein folding problem has been treated as an optimization problem, with some success. Since the optimization problem is high-dimensional and the energy function contains local extrema in abundance, it is important to provide an optimization program with a diverse set of chemically and biologically reasonable initial configurations. When human intuition and biological knowledge are used to create initial configurations, it is highly likely that much better predictions can be obtained in much less computing time. Our work focuses on providing an interactive, visual tool that helps a user rapidly create many three-dimensional protein structures for a given amino acid sequence. These structures are then used as initial configurations for an optimization algorithm.
ProtoShop is an interactive visualization program for 3D protein structures, but its focus is on interactive modelling methods to create reasonable initial configurations for global energy optimization. It uses a direct manipulation interface, allowing users to select arbitrary substructures (alpha-helices, beta-strands) of a protein and drag them to create tertiary structures such as sheets of aligned beta-strands or clusters of alpha-helices. The program uses an inverse kinematics approach to translate the user's dragging motions into dihedral angle changes along the amorphous coil regions connecting a selected substructure to the rest of the protein. In other words, the program exploits the inherent degrees of freedom present in a chain of amino acid residues - two angles of rotation per residue - to move substructures without violating local chemical invariants such as bond distances or bond angles. ProtoShop offers several special-purpose visualization methods to aid users in creating good protein structures, such as real-time detection and visualization of hydrogen bonds forming between beta-strands in a beta-sheet, visualization of potential sites for hydrogen bonds, and real-time detection and visualization of interference between atoms. Additionally, the program can compute the internal energy of the current protein structure in near real time and visualize it using a 3D volume rendering method. This helps a user detect which parts of a protein still exhibit high energies and need to be improved.
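To make the point about dihedral angles concrete, here is a minimal Python sketch. It is an illustration only, not ProtoShop's code. It shows the forward step that an inverse kinematics solver would build on: rotating every atom downstream of a bond about that bond changes one dihedral angle while leaving all bond distances unchanged. The function names `dihedral` and `rotate_about_bond`, and the toy five-atom chain, are assumptions made for this example. Choosing which angles to change so that a dragged substructure reaches its target is the inverse kinematics problem itself and is not shown.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def add(a, b):
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def length(a):
    return math.sqrt(dot(a, a))

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (radians) around the bond p1 -> p2."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    return math.atan2(dot(b2, cross(n1, n2)), length(b2) * dot(n1, n2))

def rotate_about_bond(atoms, i, delta):
    """Rotate all atoms after atoms[i + 1] by delta radians about the
    bond atoms[i] -> atoms[i + 1] (Rodrigues' rotation formula)."""
    origin = atoms[i + 1]
    bond = sub(atoms[i + 1], atoms[i])
    axis = scale(bond, 1.0 / length(bond))
    c, s = math.cos(delta), math.sin(delta)
    result = list(atoms[:i + 2])          # atoms up to the bond stay fixed
    for p in atoms[i + 2:]:
        v = sub(p, origin)
        rotated = add(add(scale(v, c), scale(cross(axis, v), s)),
                      scale(axis, dot(axis, v) * (1.0 - c)))
        result.append(add(origin, rotated))
    return result

# Toy chain of five "atoms" (coordinates are made up for illustration).
chain = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.5),
         (1.2, 0.3, 2.0), (1.5, 1.4, 2.9)]
moved = rotate_about_bond(chain, 1, math.radians(30.0))
```

In this sketch the dihedral angle around the second bond grows by the requested 30 degrees, while the distance between each pair of consecutive atoms is the same before and after the rotation, because the downstream atoms move as one rigid body.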
Figure 1: Cartoon rendering of a beta barrel protein structure created from scratch entirely by interactive manipulation using ProtoShop. A movie of this protein being assembled one amino acid residue at a time, and then being folded into the beta barrel shape, is available for download (MPEG-1 format, 4,283KB).
Figure 2: Video of a user manipulating the protein shown in Figure 1 in a CAVE VR environment.
ProtoShop is mostly used in two stages of the protein structure prediction process. Its main use is during the preparation phase, where it is used to create reasonable initial 3D models starting from a 1D amino acid chain and a prediction of substructure types (performed by an external pattern-matching process). Another valuable application of the software is as a monitoring and steering tool during the global energy optimization process itself. This process is computationally very intensive; it requires days to weeks of runtime using hundreds of CPUs on large parallel supercomputers. ProtoShop contains a remote communications module that can connect to a running optimization process and query the current optimization tree and all individual candidate structures. Users can then explore these structures, and even manipulate them manually and re-submit them to the optimization process. It turned out that this ability to steer the optimization - which previously was an unattended batch process - can drastically improve the quality of the final results while at the same time reducing computation time. Branches of computation that would lead to "dead ends" or unreasonable proteins can be culled early, and structures that seem to be stuck in local minima can be nudged manually into a more favorable state. The mechanisms of protein folding are not yet well understood, and a biologist's knowledge and intuition are a valuable complement to a fully automatic optimization process.
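The steering idea can be pictured with a small, purely hypothetical data structure. The Python sketch below does not reflect ProtoShop's remote protocol or the optimizer's real tree format, neither of which is described here. The class `Candidate`, its fields, and the functions `cull` and `live_leaves` are all assumptions for illustration. The sketch only shows what culling a branch of candidate structures could mean in code.

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    """One candidate structure in a (hypothetical) optimization tree."""
    ident: int
    energy: float
    children: list = field(default_factory=list)
    culled: bool = False

def cull(node):
    """Mark a branch and all of its descendants as no longer worth exploring."""
    node.culled = True
    for child in node.children:
        cull(child)

def live_leaves(node):
    """Collect the leaf candidates that have not been culled."""
    if node.culled:
        return []
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in live_leaves(child)]

# Toy tree: the energies are made-up numbers, not real results.
dead_end = Candidate(2, -40.0, [Candidate(4, -95.0)])
promising = Candidate(3, -55.0, [Candidate(5, -70.0), Candidate(6, -82.0)])
root = Candidate(1, 0.0, [dead_end, promising])

# A user judges branch 2 to be an unreasonable fold and removes it,
# even though one of its descendants has the lowest energy so far.
cull(dead_end)
best = min(live_leaves(root), key=lambda c: c.energy)
```

The toy example is deliberately built so that the culled branch contains the numerically lowest energy. It mirrors the situation described above, where a user's judgment overrides what an unattended optimizer would keep pursuing.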
Project Goals
The main project goals were to implement a protein manipulation program with the following functionality:
Create a 3D protein structure "from scratch," i.e., from an amino acid residue sequence and secondary structure prediction.
Load a 3D protein structure from data files in PDB (Protein Data Bank) file format.
Visualize 3D protein structures using several "industry-standard" visualization techniques (see screen shot section).
Atom sphere rendering
Bond stick rendering
Cartoon rendering
Interactively manipulate a 3D protein structure by changing dihedral angles along its backbone.
Global angle changes in selected secondary structures (twisting/curling/pleating of beta strands).
Manual secondary structure alignment by dragging selected secondary structures. Transformations of selected structures are realized by updating backbone dihedral angles in selected amorphous "coil regions" using an inverse kinematics (IK) approach.
Semi-automatic formation of parallel/anti-parallel hydrogen bonds between beta strands using inverse kinematics.
Save/load backbone dihedral angle sequences to rapidly create permutations of tertiary structure alignments.
Interactively change secondary structure types for selected amino acid residues to experiment with low-confidence predictions.
Visualize guides and markers to help aligning secondary structures inside a protein to form tertiary structure.
Hydrogen bond rendering
Hydrogen bond site rendering
Atom collision visualization
Save created and/or manipulated 3D protein structures to data files in PDB format.
Evaluate the internal energy of a 3D protein structure using externally provided energy computation code.
Visualize calculated internal energy.
Monitor/steer a (remote) global optimization process.
Connect to/disconnect from optimization process at any time.
Download candidate configurations from optimization process.
Upload manipulated configurations to optimization process.
Monitor optimization process by downloading entire tree of candidate configurations previously considered.
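Two of the goals listed above, reading and writing PDB files and flagging atom collisions, can be sketched briefly in Python. The column positions follow the fixed-width ATOM record layout of the PDB file format. Everything else is an assumption made for this example and is not taken from ProtoShop: the function names, the van der Waals radii table (approximate Bondi values), the 0.4 angstrom tolerance, and the rule of skipping atoms in the same or neighbouring residues of a single chain.

```python
import math

# Approximate van der Waals radii in angstroms (assumed values).
VDW_RADIUS = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}

def format_atom_line(serial, name, res_name, chain, res_seq, pos, element):
    """Write one fixed-column ATOM record (occupancy 1.00, B-factor 0.00)."""
    # By convention, names of one-letter elements start in column 14.
    name_field = (" " + name).ljust(4) if len(name) < 4 else name
    return ("ATOM  {:5d} {} {:>3s} {}{:4d}    {:8.3f}{:8.3f}{:8.3f}"
            "{:6.2f}{:6.2f}          {:>2s}").format(
                serial, name_field, res_name, chain, res_seq,
                pos[0], pos[1], pos[2], 1.0, 0.0, element)

def parse_atom_line(line):
    """Read the fields of one ATOM record by column position."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "pos": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        # Fall back to the first letter of the atom name if columns 77-78
        # are missing (a simplification for this sketch).
        "element": line[76:78].strip() or line[12:16].strip()[0],
    }

def find_clashes(atoms, tolerance=0.4):
    """Return serial-number pairs of atoms closer than the sum of their
    radii minus a tolerance. Atoms in the same or adjacent residues are
    skipped because bonded neighbours are naturally close together."""
    clashes = []
    for i, a in enumerate(atoms):
        for b in atoms[i + 1:]:
            if abs(a["res_seq"] - b["res_seq"]) <= 1:
                continue
            limit = (VDW_RADIUS.get(a["element"], 1.70)
                     + VDW_RADIUS.get(b["element"], 1.70) - tolerance)
            if math.dist(a["pos"], b["pos"]) < limit:
                clashes.append((a["serial"], b["serial"]))
    return clashes

# Round trip through the text format, then check a made-up configuration.
lines = [
    format_atom_line(1, "CA", "ALA", "A", 1, (0.0, 0.0, 0.0), "C"),
    format_atom_line(2, "CA", "GLY", "A", 2, (1.5, 0.0, 0.0), "C"),
    format_atom_line(3, "O", "SER", "A", 5, (-2.0, 0.0, 0.0), "O"),
]
atoms = [parse_atom_line(line) for line in lines]
clashes = find_clashes(atoms)
```

The pairwise loop is quadratic in the number of atoms, which is fine for a sketch. A tool that reports collisions while the user drags a substructure would more plausibly use a spatial grid or similar structure to limit the pairs it tests.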
Project Status
ProtoShop has reached the level of functionality described in the section above and is currently undergoing thorough testing by our collaborators, who are using it to evaluate the results from last summer's CASP5 protein structure prediction conference and to debug and improve the global optimization code used there. Since many researchers in the protein structure prediction community have expressed interest in using ProtoShop in their own research, we are currently discussing ways to make ProtoShop available to the community either as a binary-only release or, preferably, under an appropriate source code release model.
News: ProteinShop (as it is now officially called - I still like the original name better) is now an Open Source software project hosted on SourceForge. It seems all the infighting finally paid off!
In other news, an article about ProteinShop has appeared in Lawrence Berkeley Lab's Science Beat e-zine. The article is very nicely written, check it out! It has already made its way around the web, appearing as well in Supercomputing Online and The Scientist (cannot put link to article here, since the registration form at the web site has an error).
<BRAGGING>In yet other news, our Visualization 2003 paper (see citation below) won "Best Application Award!"</BRAGGING> The presentation I gave at the Vis conference in Seattle is available for download (PDF format, 2,132KB).
Related Publications
Crivelli, S.N., Kreylos, O., Hamann, B., Max, N., Bethel, E.W., ProteinShop: A Tool for Interactive Protein Manipulation and Steering, accepted for publication in: Journal of Computer Aided Molecular Design (JCAMD)
Kreylos, O., Max, N., Hamann, B., Crivelli, S.N. and Bethel, E.W., Interactive Protein Manipulation, to be presented at IEEE Visualization conference 2003
Available for download (PDF format, 1,494KB)
Kreylos, O., Max, N., and Crivelli, S., ProtoShop: Interactive Design of Protein Structures, in: Moult, J., Fidelis, K., Zemla, A., and Hubbard, T., eds., Proceedings of CASP5 - Fifth Meeting on the Critical Assessment of Techniques for Protein Structure Prediction, Pacific Grove, California, December 1-5, 2002, pp. A213-A214
Head-Gordon, T., Crivelli, S., Kreylos, O., Eskow, B., Choi, H., Byrd, R., and Schnabel, R., A Physical Approach to Protein Structure Prediction, in: Moult, J., Fidelis, K., Zemla, A., and Hubbard, T., eds., Proceedings of CASP5 - Fifth Meeting on the Critical Assessment of Techniques for Protein Structure Prediction, Pacific Grove, California, December 1-5, 2002, pp. A76-A78
Kreylos, O., Hamann, B., Max, N.L., Bethel, E.W., and Crivelli, S.N., Interactive Protein Manipulation, in: Nuckolls, G., ed., Proceedings of the 2002 UC Davis Student Workshop on Computing, TR CSE-2002-28 (presented at: "2002 UC Davis Student Workshop on Computing," University of California, Davis, California, October 2002)
Available for download (PDF format, 550KB)