Protein Sci. 2007 Oct;16(10):2251-9.
doi: 10.1110/ps.073067607.

Forbidden penta-peptides


Tamir Tuller et al. Protein Sci. 2007 Oct.

Abstract

There are 20^5 = 3,200,000 amino acid sequences of length 5 (penta-peptides). Statistically, we would expect the frequency of each penta-peptide to be determined by the frequencies of its constituent amino acids. We show, however, that thousands of penta-peptides are absent from all known proteomes, and that many of them are nonetheless encoded multiple times in non-coding genomic regions. This suggests a strong selection process that prevents these peptides from being expressed. We also show that the characteristics of these forbidden penta-peptides vary among phylogenetic groups (e.g., eukaryotes, bacteria, and archaea). Our analysis is a first step toward understanding the "grammar" of forbidden penta-peptides.
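The counting argument and the independence baseline in the abstract can be made concrete. The sketch below is a minimal illustration, not the authors' pipeline: it assumes each proteome is given as plain one-letter amino acid strings, counts every 5-residue window, estimates background amino acid frequencies, and takes the expected count of a penta-peptide to be the number of windows times the product of its residue frequencies. All function names are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
K = 5  # penta-peptides
TOTAL_PENTAPEPTIDES = len(AMINO_ACIDS) ** K  # 20**5 == 3,200,000


def count_kmers(proteins, k=K):
    """Count every length-k window in each protein sequence (hypothetical helper)."""
    counts = Counter()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def background_frequencies(proteins):
    """Relative frequency of each standard amino acid across all proteins."""
    totals = Counter(aa for seq in proteins for aa in seq if aa in AMINO_ACIDS)
    n = sum(totals.values())
    return {aa: totals[aa] / n for aa in AMINO_ACIDS}


def expected_count(peptide, freqs, n_windows):
    """Expected occurrences if residues are drawn independently from `freqs`."""
    p = 1.0
    for aa in peptide:
        p *= freqs[aa]
    return n_windows * p


def missing_peptides(proteins, k=K):
    """All k-mers over the 20 amino acids that never occur in `proteins`."""
    observed = count_kmers(proteins, k)
    candidates = ("".join(t) for t in product(AMINO_ACIDS, repeat=k))
    return [p for p in candidates if p not in observed]
```

Enumerating all 3.2 million candidates with `missing_peptides` is feasible but slow in pure Python; a smaller `k` is convenient for experimenting. The number of windows passed to `expected_count` can be taken as `sum(count_kmers(proteins).values())`.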


Figures

Figure 1.
(A) Distributions of missing and forbidden penta-peptides across phylogenetic groups. For each group we give the total number of organisms, the total length of the proteomes, the number of missing penta-peptides, the number of forbidden penta-peptides, and the number of forbidden penta-peptides that are present in a sibling phylogenetic group. (B) Distributions of forbidden penta-peptides for each phylogenetic group, split into three ranges of log P-values (nine major squares). Each major square is partitioned into nine subsquares, one for each phylogenetic group, as shown in the legend on the left. (C) A tree of the nine phylogenetic groups. Two phylogenetic groups with a common ancestor in the tree are called “sibling groups” (e.g., the Insects and the Vertebrates). The group “non-mammals” includes the organisms in our data set that are vertebrates but not mammals (two fish and a bird), so it is not monophyletic.
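Figure 1 separates missing penta-peptides (never observed) from forbidden ones and reports log P-values, but this section does not state which statistic was used. As one plausible sketch, assuming occurrence counts are approximately Poisson with mean λ equal to the expected count under the independence model, the probability of observing zero copies is e^(−λ), so log10 P = −λ / ln 10. Because the reference list includes Benjamini and Yekutieli (2001), the sketch also applies their step-up procedure for false discovery rate control under dependency; whether and how the authors combined these steps is an assumption here, and all function names are hypothetical.

```python
import math


def log10_absence_pvalue(expected):
    """log10 P(no occurrences) under an assumed Poisson model with mean `expected`."""
    # P(X = 0) = exp(-expected)  =>  log10 P = -expected / ln(10)
    return -expected / math.log(10)


def benjamini_yekutieli(pvalues, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Yekutieli step-up procedure."""
    m = len(pvalues)
    if m == 0:
        return []
    c_m = sum(1.0 / i for i in range(1, m + 1))  # correction for arbitrary dependency
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    # Largest rank k with p_(k) <= k * q / (m * c(m)); reject ranks 1..k.
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / (m * c_m):
            k_max = rank
    return sorted(order[:k_max])


def forbidden_peptides(missing, expected, q=0.05):
    """Missing peptides whose absence stays significant after BY correction.

    `expected` maps each missing peptide to its expected count (hypothetical input).
    """
    pvals = [math.exp(-expected[p]) for p in missing]  # may underflow to 0.0: still significant
    return [missing[i] for i in benjamini_yekutieli(pvals, q)]
```

A peptide with a large expected count but zero observations gets a tiny P-value and survives the correction; one expected to appear only rarely does not.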
Figure 2.
(A) Comparison of the amino acid distributions in forbidden penta-peptides with the background distributions in all proteomes of bacteria, archaea, and eukaryotes. Square nodes denote the distributions of amino acids in the forbidden penta-peptides; circular nodes denote the background distributions of amino acids in all proteins. The number on each edge is the symmetric Kullback-Leibler distance (Cover and Thomas 1991) between the amino acid distributions at its two ends. A chi-square test over all the distributions (background and forbidden penta-peptides) gives highly significant results, with all log P-values < −16, supporting the hypothesis that all the distributions are different. (B) The distribution of amino acids in the penta-peptides that are forbidden with respect to bacteria, archaea, and eukaryotes.
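The symmetric Kullback-Leibler distance shown on the edges of Figures 2 and 3 can be computed as below. This is a sketch assuming the symmetrization D(p‖q) + D(q‖p) with base-2 logarithms; the caption does not state which variant or base was used, and some authors average the two directions instead.

```python
import math


def kl_divergence(p, q):
    """D(p || q) in bits; p and q are dicts mapping amino acid -> probability."""
    total = 0.0
    for key, pk in p.items():
        if pk > 0:
            qk = q.get(key, 0.0)
            if qk == 0:
                return math.inf  # p puts mass where q has none
            total += pk * math.log2(pk / qk)
    return total


def symmetric_kl(p, q):
    """Symmetrized KL distance (assumed form): D(p || q) + D(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)
```

Unlike the one-directional divergence, this quantity does not depend on which distribution is listed first, which suits an undirected edge between two nodes.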
Figure 3.
Amino acid distributions in penta-peptides forbidden with respect to large phylogenetic subgroups. The numbers on the solid edges are the symmetric Kullback-Leibler distances (Cover and Thomas 1991) between the amino acid distributions at the two ends of each edge; the dashed lines denote the symmetric Kullback-Leibler distances between the background amino acid distributions. A chi-square test over all the distributions (background and forbidden penta-peptides) gives highly significant results, with all log P-values < −16, supporting the hypothesis that all the distributions are different. The estimated time since the divergence of each pair of sibling groups appears on the right.
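The chi-square comparison in the captions can be sketched as a Pearson test of homogeneity on a table of amino acid counts, one row per distribution. This formulation is an assumption: the captions do not give the exact table that was tested. The hypothetical function below returns the statistic and degrees of freedom; a P-value could then be read from the chi-square distribution, for example with `scipy.stats.chi2.logsf(stat, dof)`, which returns a natural logarithm (divide by ln 10 for log10).

```python
def chi_square_homogeneity(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` has one row of counts per distribution (for example, amino acid
    counts in the background and in the forbidden penta-peptides) and one
    column per amino acid. Every row is assumed to have a nonzero total.
    """
    n_cols = len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(n_cols)]
    grand_total = sum(row_totals)
    # Columns that are zero in every row carry no information; skip them.
    used_cols = [j for j in range(n_cols) if col_totals[j] > 0]
    stat = 0.0
    for i, row in enumerate(table):
        for j in used_cols:
            # Expected count if all rows shared one underlying distribution.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (row[j] - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(used_cols) - 1)
    return stat, dof
```

With 20 amino acid columns and two rows the test has 19 degrees of freedom, so even modest differences in composition become highly significant once the counts reach proteome scale.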

References

Abe, N. and Mamitsuka, H. 1997. Predicting protein secondary structure using stochastic tree grammars. J Mach Learn 29: 275–301.
Alberts, B., Johnson, A., Lewis, J., Raff, M., Roberts, K., and Walter, P. 2002. Molecular biology of the cell, 4th ed. Garland, New York.
Benjamini, Y. and Yekutieli, D. 2001. The control of the false discovery rate in multiple testing under dependency. Ann. Statist. 29: 1165–1188.
Blaber, M., Zhang, X.J., and Matthews, B.W. 1993. Structural basis of amino acid alpha helix propensity. Science 260: 1637–1640.
Bystroff, C., Thorsson, V., and Baker, D. 2000. HMMSTR: A hidden Markov model for local sequence-structure correlations in proteins. J. Mol. Biol. 301: 173–190.
