BMC Evol Biol. 2012 Aug 24;12:155.
doi: 10.1186/1471-2148-12-155.

Dissecting the role of low-complexity regions in the evolution of vertebrate proteins

Núria Radó-Trilla et al. BMC Evol Biol. 2012.

Abstract

Background: Low-complexity regions (LCRs) in proteins are tracts that are highly enriched in one or a few amino acids. Given their high abundance and their capacity to expand in relatively short periods of time through replication slippage, they can contribute greatly to increasing protein sequence space and generating novel protein functions. However, little is known about the global impact of LCRs on protein evolution.

Results: We have traced the evolutionary history of 2,802 LCRs from a large set of homologous protein families from H. sapiens, M. musculus, G. gallus, D. rerio and C. intestinalis. Transcription factors and other regulatory functions are overrepresented among proteins containing LCRs. We have found that the gain of novel LCRs is frequently associated with repeat expansion, whereas the loss of LCRs is more often due to the accumulation of amino acid substitutions than to deletions. This dichotomy results in net protein sequence gain over time. We have detected a significant increase in the rate of accumulation of novel LCRs in the ancestral Amniota and mammalian branches, and a reduction in the chicken branch. Alanine- and/or glycine-rich LCRs are overrepresented in recently emerged LCR sets from all branches, suggesting that their expansion is better tolerated than that of other LCR types. LCRs enriched in positively charged amino acids show the opposite pattern, indicating an important effect of purifying selection in their maintenance.

Conclusion: We have performed the first large-scale study on the evolutionary dynamics of LCRs in protein families. The study has shown that the composition of an LCR is an important determinant of its evolutionary pattern.
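As an illustration of the Background's definition of an LCR (a tract highly enriched in one or a few amino acids), the sketch below flags low-complexity tracts with a sliding-window Shannon entropy filter. This is not the detection method used in the study, which the abstract does not describe; the function names, the window length of 12 and the entropy cutoff of 1.5 bits are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def find_low_complexity(seq: str, window: int = 12, max_entropy: float = 1.5):
    """Return merged half-open (start, end) intervals whose windows have
    compositional entropy at or below max_entropy.

    window and max_entropy are illustrative assumptions, not values
    taken from the study.
    """
    regions = []
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) <= max_entropy:
            start, end = i, i + window
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous hit: extend it.
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions


# Hypothetical sequence: a 15-residue alanine tract between diverse flanks.
example = "MKTLVHEWQRDS" + "A" * 15 + "GNPCFIYTLMKV"
for start, end in find_low_complexity(example):
    print(start, end, example[start:end])
```

Because whole windows are reported, a flagged interval can extend a few residues into the flanking sequence; a stricter cutoff or a trimming step would tighten the boundaries.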

Figures

Figure 1
Conservation of low-complexity regions (LCRs) in chordate homologous protein families. LCRs are indicated in red. Only the region of the alignment containing the LCR is shown. a) Example of conserved chordate LCR enriched in lysines. Corresponds to methionyl aminopeptidase 2 (Ensembl Protein Identifier ENSP00000325312 in humans). b) Example of mammalian-specific LCR enriched in alanines. Corresponds to alkylation repair 5 (Ensembl Protein Identifier ENSP00000261650 in humans).
Figure 2
Relative abundance of low-complexity regions (LCRs) at different phylogenetic depths. The area of the circles is proportional to LCR relative frequencies. The number of LCRs at each branch is indicated. In black, data for all LCRs. In blue, data for LCRs enriched in positively charged amino acids (K and R). In red, data for LCRs enriched in alanine (A), glycine (G) or both. The phylogenetic distribution of LCRs labeled in blue, and of LCRs labeled in red, deviated significantly from the one expected considering all LCRs (Fisher’s exact test p < 10^-5).
Figure 3
Observed versus expected number of LCRs in the different branches leading to an extant organism. Data is shown for LCRs enriched in K or R (in blue) and in A, G or AG (in red). The observed distribution of LCRs labeled in blue deviated significantly from the expected one in all three cases (Fisher’s exact test p < 10^-3). The observed distribution of LCRs labeled in red deviated significantly from the expected one for human LCRs (Fisher’s exact test p < 10^-3).
Figure 4
Dynamics of gain and loss of low-complexity regions (LCRs) in vertebrate homologous protein families. The LCRs gained in each branch correspond to LCRs observed at different phylogenetic depths. The LCRs lost are estimated from LCR phylogenetic distribution data (see Methods). Values in square brackets in internal nodes represent the estimated number of ancestral LCRs.
