Ab Initio structure prediction for Escherichia coli: towards genome-wide protein structure modeling and fold assignment

doi:10.1038/srep01895

Sci Rep. 2013;3:1895.

Dong Xu¹, Yang Zhang

PMID: 23719418
PMCID: PMC3667494
DOI: 10.1038/srep01895

Dong Xu et al. Sci Rep. 2013.

Affiliation

¹ Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.

Abstract

Genome-wide protein structure prediction and structure-based function annotation have long been goals of molecular biology but have not yet become possible because of difficulties in modeling distant-homology targets. We developed a hybrid pipeline that combines ab initio folding and template-based modeling for genome-wide structure prediction and applied it to the Escherichia coli genome. The pipeline was tested on 43 known sequences, for which QUARK-based ab initio folding simulations generated models with TM-scores 17% higher than those from traditional comparative modeling methods. Of 495 unknown hard sequences, 72 are predicted to have a correct fold (TM-score > 0.5) and 321 to have a substantial portion of the structure correctly modeled (TM-score > 0.35). In addition, 317 sequences can be reliably assigned to a SCOP fold family based on structural analogy to existing proteins in the PDB. As a case study of E. coli, these results represent promising progress towards genome-wide structure modeling and fold family assignment using state-of-the-art ab initio folding algorithms.
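The TM-score cutoffs quoted in the abstract (0.5 and 0.35) can be made concrete with a small sketch. It uses the commonly cited TM-score definition, with a length-dependent distance scale d0 = 1.24(L - 15)^(1/3) - 1.8 Å, and assumes the model has already been superimposed on the reference structure, so the per-residue distances are given. The real TM-score maximizes over superpositions, which this sketch does not attempt. The function names, the 0.5 Å floor on d0 for very short chains, and the category labels are illustrative assumptions, not part of the authors' pipeline.

```python
def tm_score_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 in angstroms.

    Assumption: d0 is floored at 0.5 A for very short chains,
    a common convention that the abstract does not mention.
    """
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length: int) -> float:
    """TM-score for one fixed superposition.

    `distances` holds the distance (in angstroms) between each aligned
    residue pair; unaligned residues simply contribute nothing.
    """
    d0 = tm_score_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


def classify(score: float) -> str:
    """Map a TM-score onto the two cutoffs quoted in the abstract."""
    if score > 0.5:
        return "correct fold"
    if score > 0.35:
        return "substantial portion correctly modeled"
    return "below both cutoffs"


if __name__ == "__main__":
    # Hypothetical 100-residue model with every residue 2 A from the reference.
    score = tm_score([2.0] * 100, 100)
    print(round(score, 3), classify(score))
```

Because d0 grows with chain length, the same cutoff is intended to mean roughly the same thing for short and long proteins, which is why one threshold can be applied across all 495 sequences.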

Figures

**Figure 1. Distribution of *E. coli* genome sequences.**
(a) Classification of sequences based on their homology to the PDB structures. (b) Histogram of sequence length in different categories.

**Figure 2. Examples of successful QUARK modeling results on known Hard *E. coli* proteins.**
In the structural superposition, QUARK models and experimental structures are shown in green and red cartoons respectively. (a) Superposition of the QUARK model and the experimental structure for ARIR_ECOLI with side-chains of the seven hydrophobic residues highlighted. (b) Solvent accessibility distributions for ARIR_ECOLI with data from sequence-based prediction, QUARK model and experimental structure, respectively. (c) Superposition of the QUARK model and experimental structure for PTHP_ECOLI, where the beta-turns are highlighted in blue in the experimental structure. (d) The four-state secondary structure distribution of PTHP_ECOLI shown for sequence-based prediction, the QUARK model and the experimental structure. Coil, helix, strand and turn are marked in green, red, yellow and blue, respectively. (e) Superposition of the QUARK model and the experimental structure for RSD_ECOLI, which contains 158 residues.

**Figure 3. Histograms of estimated and actual TM-scores.**
The blind set is from 495 unknown *E. coli* hard sequences and the benchmark set consists of 145 non-redundant proteins from the PDB.

**Figure 4. QUARK modeling result for transmembrane protein YQJK_ECOLI in *E. coli*.**
(a) Cartoon representation of the model. Side-chains of residues 35T, 62I and 71L, which mark the location of the lipid bilayer, are highlighted in sticks. (b) Predicted secondary structure type and solvent accessibility for the target.

**Figure 5. TM-score of the QUARK models versus TM-score between model and its closest analogy for the benchmark set proteins.**

**Figure 6. SCOP fold family assignment results.**
(a) PTHP_ECOLI; (b) NCPP_ECOLI. The QUARK models, the closest analogy structures in the PDB, and the experimental structures of the targets are shown in Columns 1, 2 and 3, respectively. Coloring runs from blue at the N-terminus to red at the C-terminus.

**Figure 7. QUARK models and the closest analogies for four representative *E. coli* hard sequences.**
(a) YFCL_ECOLI; (b) YIJD_ECOLI; (c) YDAF_ECOLI; (d) YDBJ_ECOLI.

**Figure 8. Flowchart of structure modeling and fold family assignment for *E. coli* genome sequences.**

Cited by

Molecular dynamics simulation reveals insights into the mechanism of unfolding by the A130T/V mutations within the MID1 zinc-binding Bbox1 domain.
Zhao Y, Zeng C, Massiah MA. PLoS One. 2015 Apr 13;10(4):e0124377. doi: 10.1371/journal.pone.0124377. PMID: 25874572.
Approaches to ab initio molecular replacement of α-helical transmembrane proteins.
Thomas JMH, Simkovic F, Keegan R, Mayans O, Zhang C, Zhang Y, Rigden DJ. Acta Crystallogr D Struct Biol. 2017 Dec 1;73(Pt 12):985-996. doi: 10.1107/S2059798317016436. Epub 2017 Nov 22. PMID: 29199978.
Highly accurate protein structure prediction for the human proteome.
Tunyasuvunakool K, Adler J, Wu Z, Green T, Zielinski M, Žídek A, Bridgland A, Cowie A, Meyer C, Laydon A, Velankar S, Kleywegt GJ, Bateman A, Evans R, Pritzel A, Figurnov M, Ronneberger O, Bates R, Kohl SAA, Potapenko A, Ballard AJ, Romera-Paredes B, Nikolov S, Jain R, Clancy E, Reiman D, Petersen S, Senior AW, Kavukcuoglu K, Birney E, Kohli P, Jumper J, Hassabis D. Nature. 2021 Aug;596(7873):590-596. doi: 10.1038/s41586-021-03828-1. Epub 2021 Jul 22. PMID: 34293799.
Selenoprotein N is an endoplasmic reticulum calcium sensor that links luminal calcium levels to a redox activity.
Chernorudskiy A, Varone E, Colombo SF, Fumagalli S, Cagnotto A, Cattaneo A, Briens M, Baltzinger M, Kuhn L, Bachi A, Berardi A, Salmona M, Musco G, Borgese N, Lescure A, Zito E. Proc Natl Acad Sci U S A. 2020 Sep 1;117(35):21288-21298. doi: 10.1073/pnas.2003847117. Epub 2020 Aug 17. PMID: 32817544.
A hybrid method for identification of structural domains.
Hua Y, Zhu M, Wang Y, Xie Z, Li M. Sci Rep. 2014 Dec 15;4:7476. doi: 10.1038/srep07476. PMID: 25503992.
