Genomic and evolutionary insights into genes encoding proteins with single amino acid repeats
- PMID: 16618963
- DOI: 10.1093/molbev/msk022
Abstract
Mutations causing expansion of amino acid repeats are responsible for 19 hereditary disorders. Repeats in several other proteins also show length variations. These observations prompted us to identify single amino acid repeat-containing proteins (SARPs) in humans and to understand their functional and evolutionary significance. We identified 8812 SARPs containing 17 146 repeat domains, each harboring 4 or more residues. In all, 5% of SARPs (471) showed repeat length variations, and nearly 84% of them (394) have repeats of 10 residues or less. We find that SARPs are involved in functions that require formation of multiprotein complexes. Nearly 78% (6859) of the SARPs had no paralogue in the human proteome; we consider such proteins orphan SARPs. Orphan SARPs show longer repeat stretches, longer peptide length, and lower expression levels than SARPs belonging to a protein family. Because the intensity of gene expression is known to relate inversely with the rate of protein sequence evolution, our results suggest that the orphan SARPs evolve faster than the familial forms and therefore are under weaker selection pressure. We also find that while GC-rich codons are favored for coding the repeat tracts of SARPs, specific codons and not nucleotide motifs per se are selected, suggesting functional constraints on codon usage. One such constraint could be mRNA stability, because clustering of rare codons is known to destabilize transcripts and rare codons are not favored for coding repeat tracts. Genes encoding polymorphic SARPs show preferential localization toward the telomeric segments. Further, the sex-specific recombination rates of the chromosomal locus strongly correlate with the parental gender that influences repeat instability in disorders caused by dynamic mutations. Therefore, instability associated with repeats might be driven by processes that are specific to sperm or oocyte development, and the recombination frequency might play a positive role in this process.
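The repeat definition used above (a tract of 4 or more identical residues) can be illustrated with a minimal sketch. This is not the authors' pipeline, which the abstract does not describe; the function name, the regular-expression approach, and the example sequence are all assumptions made for illustration.

```python
import re

# Threshold taken from the abstract: a repeat domain harbors 4 or more residues.
MIN_REPEAT_LENGTH = 4


def find_single_amino_acid_repeats(sequence, min_length=MIN_REPEAT_LENGTH):
    """Return (residue, start, length) for every run of one residue
    repeated at least `min_length` times in a protein sequence.

    Hypothetical helper for illustration only; `start` is 0-based.
    """
    # ([A-Z]) captures one residue; \1{n,} requires n further copies of it,
    # so n = min_length - 1 gives a run of at least min_length residues.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_length - 1))
    return [
        (match.group(1), match.start(), len(match.group(0)))
        for match in pattern.finditer(sequence.upper())
    ]


# Made-up sequence: a 4-residue alanine run, a 6-residue glutamine run,
# and a 3-residue glycine run that falls below the threshold.
example_repeats = find_single_amino_acid_repeats("MAAAAQQQQQQKLGGGS")
```

Under this definition, a protein would count as a SARP if the function returns at least one tract for its sequence.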