Forbidden penta-peptides

doi:10.1110/ps.073067607

Protein Sci. 2007 Oct;16(10):2251-9.

Tamir Tuller¹, Benny Chor, Nathan Nelson

PMID: 17893362
PMCID: PMC2204130
DOI: 10.1110/ps.073067607

Tamir Tuller et al. Protein Sci. 2007 Oct.

Affiliation

¹ School of Computer Science, Tel Aviv University, Tel Aviv, Israel. tamirtul@post.tau.ac.il

Abstract

There are 3,200,000 amino acid sequences of length 5 (penta-peptides). Statistically, we expect to see a distribution of penta-peptides that is determined by the frequency of the participating amino acids. We show, however, that not only are there thousands of such penta-peptides that are absent from all known proteomes, but many of them are coded for multiple times in the non-coding genomic regions. This suggests a strong selection process that prevents these peptides from being expressed. We also show that the characteristics of these forbidden penta-peptides vary among different phylogenetic groups (e.g., eukaryotes, prokaryotes, and archaea). Our analysis provides the first steps toward understanding the "grammar" of the forbidden penta-peptides.
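The counting argument in the abstract can be illustrated with a minimal Python sketch. It assumes the 20 standard amino acids and an independence model in which the expected count of a penta-peptide is the number of length-5 windows times the product of its residue frequencies. The function names (`observed_kmers`, `missing_kmers`, `expected_count`) are hypothetical, and this record does not state the statistical criterion the authors used to call a missing penta-peptide "forbidden", so the sketch only identifies missing ones.

```python
from itertools import product
from math import prod

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
K = 5  # penta-peptides


def observed_kmers(proteins, k=K, alphabet=AMINO_ACIDS):
    """Collect every length-k window that occurs in the given sequences."""
    allowed = set(alphabet)
    seen = set()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows with ambiguous or non-standard residues (e.g. 'X').
            if set(kmer) <= allowed:
                seen.add(kmer)
    return seen


def missing_kmers(proteins, k=K, alphabet=AMINO_ACIDS):
    """Lazily yield every possible k-mer that never occurs in the sequences."""
    seen = observed_kmers(proteins, k, alphabet)
    for letters in product(alphabet, repeat=k):
        kmer = "".join(letters)
        if kmer not in seen:
            yield kmer


def expected_count(kmer, freqs, n_windows):
    """Expected occurrences under independence (an assumed null model)."""
    return n_windows * prod(freqs[a] for a in kmer)


# 20 ** 5 reproduces the 3,200,000 figure quoted in the abstract.
TOTAL_PENTAPEPTIDES = len(AMINO_ACIDS) ** K
```

A penta-peptide with a large `expected_count` that nevertheless appears in `missing_kmers` is the kind of candidate the abstract describes; deciding whether its absence is statistically significant would require a further test that is not specified here.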

Figures

**Figure 1.**
(A) Distributions of missing and forbidden penta-peptides with respect to phylogenetic groups. For each group we present the total number of organisms, the total length of the proteomes, the number of missing penta-peptides, the number of forbidden penta-peptides, and the number of forbidden penta-peptides that are present in a sibling phylogenetic group. (B) Distributions of forbidden penta-peptides for each phylogenetic group, split according to three ranges of log P-values (nine major squares). The partition of each major square into nine subsquares (a subsquare for each of the phylogenetic groups) is according to the legend on the *left*. (C) A tree that describes the nine phylogenetic groups. Two phylogenetic groups with a common ancestor in the tree are called “sibling groups” (e.g., the Insects and the Vertebrates). The group “non-mammals” includes the organisms in our data set that are vertebrates but not mammals (two fish and a bird), so it is not monophyletic.

**Figure 2.**
(A) Comparing the distribution of amino acids in forbidden penta-peptides and the background distributions in all proteomes in bacteria, archaea, and eukaryotes. The square nodes denote the distributions of amino acids in the forbidden penta-peptides. The circular nodes denote the background distributions of amino acids in all proteins. The numbers on each edge denote the symmetric Kullback-Leibler distances (Cover and Thomas 1991) between the amino acid distributions across the edge. A chi-square test of all the distributions (background and in forbidden penta-peptides) gives very significant results, all log P-values >−16. This supports the hypothesis that all the distributions are different. (B) The distribution of amino acids in the penta-peptides that are forbidden with respect to bacteria, archaea, and eukaryotes.
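For readers unfamiliar with the edge labels, the following is a minimal Python sketch of a symmetric Kullback-Leibler distance between two amino acid distributions. It assumes the symmetrized form D(P‖Q) + D(Q‖P) with base-2 logarithms; the caption does not say which variant (sum or average) or which log base the authors used. The helper names and the pseudocount smoothing are illustrative assumptions, not the authors' procedure.

```python
from math import log2


def to_distribution(counts, alphabet, pseudocount=1.0):
    """Turn raw residue counts into probabilities.

    The pseudocount (an assumption of this sketch) keeps every probability
    positive so the KL divergence stays finite.
    """
    total = sum(counts.get(a, 0) + pseudocount for a in alphabet)
    return {a: (counts.get(a, 0) + pseudocount) / total for a in alphabet}


def kl_divergence(p, q):
    """D(P || Q) in bits; requires q[a] > 0 wherever p[a] > 0."""
    return sum(p[a] * log2(p[a] / q[a]) for a in p if p[a] > 0)


def symmetric_kl(p, q):
    """Symmetrized KL distance: D(P || Q) + D(Q || P)."""
    return kl_divergence(p, q) + kl_divergence(q, p)
```

Applied to the figure, `p` would be the amino acid distribution within a group's forbidden penta-peptides and `q` either another group's distribution or the background distribution; the value is zero only when the two distributions coincide.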

**Figure 3.**
Amino acid distribution in penta-peptides forbidden with respect to large phylogenetic subgroups. The numbers on the solid edges denote the symmetric Kullback-Leibler distances (Cover and Thomas 1991) between the amino acid distributions across the edge; the dashed lines denote the symmetric Kullback-Leibler distances between the background distributions of amino acids. A chi-square test of all the distributions (background and in forbidden penta-peptides) gives very significant results, all log P-values >−16. This supports the hypothesis that all the distributions are different. The estimated time from the divergence of each pair of sibling groups appears on the *right*.

Cited by

Predicting nucleosome binding motif set and analyzing their distributions around functional sites of human genes.
Bao T, Li H, Zhao X, Liu G. Chromosome Res. 2012 Aug;20(6):685-98. doi: 10.1007/s10577-012-9305-0. Epub 2012 Jul 31. PMID: 22847645
Amino acid sequence repertoire of the bacterial proteome and the occurrence of untranslatable sequences.
Navon SP, Kornberg G, Chen J, Schwartzman T, Tsai A, Puglisi EV, Puglisi JD, Adir N. Proc Natl Acad Sci U S A. 2016 Jun 28;113(26):7166-70. doi: 10.1073/pnas.1606518113. Epub 2016 Jun 15. PMID: 27307442
Genomic DNA k-mer spectra: models and modalities.
Chor B, Horn D, Goldman N, Levy Y, Massingham T. Genome Biol. 2009;10(10):R108. doi: 10.1186/gb-2009-10-10-r108. Epub 2009 Oct 8. PMID: 19814784
Computational analysis of nascent peptides that induce ribosome stalling and their proteomic distribution in Saccharomyces cerevisiae.
Sabi R, Tuller T. RNA. 2017 Jul;23(7):983-994. doi: 10.1261/rna.059188.116. Epub 2017 Mar 31. PMID: 28363900
The determinants of the rarity of nucleic and peptide short sequences in nature.
Chantzi N, Mareboina M, Konnaris MA, Montgomery A, Patsakis M, Mouratidis I, Georgakopoulos-Soares I. NAR Genom Bioinform. 2024 Apr 4;6(2):lqae029. doi: 10.1093/nargab/lqae029. eCollection 2024 Jun. PMID: 38584871

References

1. Abe N. and Mamitsuka, H. 1997. Predicting protein secondary structure using stochastic tree grammars. J Mach Learn 29: 275–301.
2. Alberts B., Johnson, A., Lewis, J., Raff, M., Roberts, K., and Walter, P. 2002. Molecular biology of the cell, 4th ed. Garland, New York.
3. Benjamini Y. and Yekutieli, D. 2001. The control of the false discovery rate in multiple testing under dependency. Ann. Statist. 29: 1165–1188.
4. Blaber M., Zhang, X.J., and Matthews, B.W. 1993. Structural basis of amino acid alpha helix propensity. Science 260: 1637–1640.
5. Bystroff C., Thorsson, V., and Baker, D. 2000. HMMSTR: A hidden Markov model for local sequence-structure correlations in proteins. J. Mol. Biol. 301: 173–190.
