Sci Rep. 2013:3:1895.
doi: 10.1038/srep01895.

Ab Initio structure prediction for Escherichia coli: towards genome-wide protein structure modeling and fold assignment


Dong Xu et al. Sci Rep. 2013.

Abstract

Genome-wide protein structure prediction and structure-based function annotation have long been goals in molecular biology, but they have not yet become possible because of the difficulty of modeling distant-homology targets. We developed a hybrid pipeline combining ab initio folding and template-based modeling for genome-wide structure prediction and applied it to the Escherichia coli genome. The pipeline was tested on 43 known sequences, where QUARK-based ab initio folding simulation generated models with TM-scores 17% higher than those from traditional comparative modeling methods. Of 495 unknown hard sequences, 72 are predicted to have a correct fold (TM-score > 0.5) and 321 have a substantial portion of their structure correctly modeled (TM-score > 0.35). A total of 317 sequences can be reliably assigned to a SCOP fold family based on structural analogy to existing proteins in the PDB. The presented results, as a case study of E. coli, represent promising progress towards genome-wide structure modeling and fold family assignment using state-of-the-art ab initio folding algorithms.
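The TM-score thresholds quoted above (> 0.5 for a correct fold, > 0.35 for a substantially correct structure) can be made concrete with a minimal Python sketch. It uses the standard TM-score definition of Zhang and Skolnick, TM = (1/L) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24·(L − 15)^(1/3) − 1.8, where L is the target length and d_i is the distance between the i-th aligned residue pair. As a simplifying assumption, the distances come from a single, already-computed superposition; the real TM-score is the maximum of this sum over all superpositions, a search the sketch does not perform. The 0.5 Å floor on d0 follows a convention of common implementations and should be treated as an assumption.

```python
from typing import Sequence


def tm_d0(length: int) -> float:
    """Length-dependent distance scale d0 (angstroms) for a target of `length` residues."""
    # For very short chains the formula is undefined or too small; floor it at 0.5 A (assumed convention).
    if length <= 15:
        return 0.5
    d0 = 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8
    return max(d0, 0.5)


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for ONE fixed superposition (no search over superpositions).

    distances: per-residue distances (A) between aligned model/target residue pairs.
    target_length: number of residues in the experimental target structure.
    Unaligned target residues contribute nothing but still count in the denominator.
    """
    if len(distances) > target_length:
        raise ValueError("more aligned pairs than target residues")
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


def classify(tm: float) -> str:
    """Bin a TM-score using the thresholds quoted in the abstract."""
    if tm > 0.5:
        return "correct fold"
    if tm > 0.35:
        return "substantial portion correct"
    return "incorrect"
```

For example, a 100-residue target whose aligned residues all sit exactly d0 away from the model scores 0.5, which falls just short of the > 0.5 correct-fold threshold; a model with every residue at zero deviation scores 1.0.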


Figures

Figure 1. Distribution of E. coli genome sequences.
(a) Classification of sequences based on their homology to the PDB structures. (b) Histogram of sequence length in different categories.
Figure 2. Examples of successful QUARK modeling results on known Hard E. coli proteins.
In the structural superpositions, QUARK models and experimental structures are shown as green and red cartoons, respectively. (a) Superposition of the QUARK model and the experimental structure for ARIR_ECOLI, with the side-chains of the seven hydrophobic residues highlighted. (b) Solvent accessibility distributions for ARIR_ECOLI from sequence-based prediction, the QUARK model and the experimental structure, respectively. (c) Superposition of the QUARK model and the experimental structure for PTHP_ECOLI, where the beta-turns are highlighted in blue in the experimental structure. (d) The four-state secondary structure distribution of PTHP_ECOLI for sequence-based prediction, the QUARK model and the experimental structure. Coil, helix, strand and turn are marked in green, red, yellow and blue, respectively. (e) Superposition of the QUARK model and the experimental structure for RSD_ECOLI, which contains 158 residues.
Figure 3. Histograms of estimated and actual TM-scores.
The blind set consists of the 495 unknown E. coli hard sequences, and the benchmark set consists of 145 non-redundant proteins from the PDB.
Figure 4. QUARK modeling result for the transmembrane protein YQJK_ECOLI in E. coli.
(a) Cartoon representation of the model. Side-chains of residues 35T, 62I and 71L, which mark the location of the lipid bilayer, are highlighted as sticks. (b) Predicted secondary structure type and solvent accessibility for the target.
Figure 5. TM-score of the QUARK models versus the TM-score between each model and its closest structural analog, for the benchmark set proteins.
Figure 6. SCOP fold family assignment results.
(a) PTHP_ECOLI; (b) NCPP_ECOLI. The QUARK models, the closest analog structures in the PDB, and the experimental structures of the targets are shown in columns 1, 2 and 3, respectively. Coloring runs from blue at the N-terminus to red at the C-terminus.
Figure 7. QUARK models and their closest structural analogs for four representative E. coli hard sequences.
(a) YFCL_ECOLI; (b) YIJD_ECOLI; (c) YDAF_ECOLI; (d) YDBJ_ECOLI.
Figure 8. Flowchart of structure modeling and fold family assignment for E. coli genome sequences.
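The two stages of the flowchart, structure modeling and fold family assignment, can be sketched in Python as below. This is a simplified illustration, not the authors' implementation: routing on a single "template detected" flag, the hypothetical `AnalogHit` record, the placeholder identifiers, and the TM-score > 0.5 cutoff for accepting an analog's SCOP fold are all assumptions. No structural comparison is performed; each hit is assumed to carry a precomputed TM-score between the predicted model and a PDB structure of known SCOP fold.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class AnalogHit:
    """Hypothetical record: one PDB structure compared against a predicted model."""
    pdb_id: str          # placeholder identifier of the PDB structure
    scop_fold: str       # SCOP fold of that PDB structure
    tm_to_model: float   # precomputed TM-score between the model and this structure


def choose_method(has_template: bool) -> str:
    """Assumed routing rule: template-based modeling when a PDB template is detected, else ab initio."""
    return "template-based" if has_template else "QUARK ab initio"


def assign_fold(hits: Sequence[AnalogHit], min_tm: float = 0.5) -> Optional[str]:
    """Give the model the SCOP fold of its closest structural analog.

    min_tm is an assumed cutoff; returns None when no analog exceeds it.
    """
    if not hits:
        return None
    best = max(hits, key=lambda h: h.tm_to_model)  # closest analog = highest TM-score
    return best.scop_fold if best.tm_to_model > min_tm else None


if __name__ == "__main__":
    hits = [AnalogHit("pdb_A", "fold_X", 0.62), AnalogHit("pdb_B", "fold_Y", 0.41)]
    print(choose_method(False), assign_fold(hits))  # QUARK ab initio fold_X
```

Taking the single highest-scoring analog keeps the sketch simple; a real pipeline would likely also weigh the estimated accuracy of the model itself before trusting the assignment.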
