Plant Cell. 2006 Aug;18(8):1791-802.
doi: 10.1105/tpc.106.041905. Epub 2006 Jul 7.

High rate of chimeric gene origination by retroposition in plant genomes

Wen Wang et al. Plant Cell. 2006 Aug.

Abstract

Retroposition is widely found to play essential roles in origination of new mammalian and other animal genes. However, the scarcity of retrogenes in plants has led to the assumption that plant genomes rarely evolve new gene duplicates by retroposition, despite abundant retrotransposons in plants and a reported long terminal repeat (LTR) retrotransposon-mediated mechanism of retroposing cellular genes in maize (Zea mays). We show extensive retropositions in the rice (Oryza sativa) genome, with 1235 identified primary retrogenes. We identified 27 of these primary retrogenes within LTR retrotransposons, confirming a previously observed role of retroelements in generating plant retrogenes. Substitution analyses revealed that the vast majority are subject to negative selection, suggesting, along with expression data and evidence of age, that they are likely functional retrogenes. In addition, 42% of these retrosequences have recruited new exons from flanking regions, generating a large number of chimerical genes. We also identified young chimerical genes, suggesting that gene origination through retroposition is ongoing, with a rate an order of magnitude higher than the rate in primates. Finally, we observed that retropositions have followed an unexpected spatial pattern in which functional retrogenes avoid centromeric regions, while retropseudogenes are randomly distributed. These observations suggest that retroposition is an important mechanism that governs gene evolution in rice and other grass species.

Figures

Figure 1.
Formation and Example of Retrogenes. (A) General model for formation of chimerical structure of retrogenes. Orange boxes, parental and retrogene CDS regions; light-blue boxes, used retrogene regions in a new chimerical gene; green box, newly recruited region in a new chimerical gene. (B) Example of a retrogene (AK111451_Chr04_3431458_3437611) that has the three signatures of retroposition (i.e., loss of introns, poly(A) tract, and flanking direct repeats).
Figure 2.
Identification of Retroposition in the Rice Genome. (A) Flow chart of the search scheme for identifying potentially functional retroposed genes and processed pseudogenes. We mapped KOME cDNAs to the finished 93-11 genome sequence to obtain transcript units. To retain only reliable coding genes, we filtered out genes with a CDS of <300 bp. To identify retrogenes, we performed the following steps: (1) six-frame TBLASTN searches for homologs of the KOME cDNAs in the 93-11 genome; (2) realignment of KOME proteins to the genomic sequence with GeneWise to obtain the intron-exon structure of every homolog; (3) filtering of homologs with overlapping genomic positions and definition of all gene duplications; (4) identification of retrogenes among the duplicate genes; (5) definition of retropseudogenes based on the presence of a stop codon or frame shift; and (6) definition of the remaining retrogenes as intact retrogenes. (B) Proportion of different categories of genes in the identified 14,790 duplication events. In total, there are 1235 primary retrogenes (RGs), each derived from a single retroposition event; the other homologs are assumed to have been generated by regular DNA-based gene duplication. Retrogenes with a premature stop codon or frame shift within the CDS region are defined as retropseudogenes; the others are called intact retrogenes.
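The final classification rule in this caption (a premature stop codon or frame shift marks a retropseudogene, everything else is an intact retrogene) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names are hypothetical, the frame-shift call is assumed to come from an upstream alignment step such as GeneWise and is passed in as a flag, and the 300 bp filter is read here as "discard CDS shorter than 300 bp".

```python
# Illustrative sketch only; names and inputs are assumptions, not the paper's code.
STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_CDS_LENGTH = 300  # bp; threshold taken from the Figure 2 caption


def passes_length_filter(cds: str) -> bool:
    """Keep only CDS of at least 300 bp (our reading of the caption's filter)."""
    return len(cds) >= MIN_CDS_LENGTH


def has_premature_stop(cds: str) -> bool:
    """True if an in-frame stop codon occurs before the last complete codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])


def classify_retrogene(cds: str, has_frameshift: bool) -> str:
    """Apply the caption's rule: stop codon or frame shift -> retropseudogene.

    `has_frameshift` is assumed to be reported by an upstream aligner.
    """
    if has_frameshift or has_premature_stop(cds):
        return "retropseudogene"
    return "intact retrogene"
```

For example, `classify_retrogene("ATGTAAAAATAA", False)` is labeled a retropseudogene because `TAA` appears in frame before the final codon, whereas `classify_retrogene("ATGAAATAA", False)` is labeled intact.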
Figure 3.
Ka:Ks Distributions of Retropseudogenes and Intact Retrogenes with and without Full-Length cDNAs. The Ka:Ks ratio is calculated between each retrogene and its parental sequence. Bin size is 0.03.
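As a rough illustration of how such a distribution could be tabulated, the sketch below computes Ka:Ks ratios and counts them in fixed-width bins of 0.03. The function names and the input format (pairs of Ka and Ks values) are assumptions made for this example; the paper's actual substitution-rate estimation is not shown here. By convention, a Ka:Ks ratio well below 1 is read as a sign of negative (purifying) selection.

```python
# Illustrative sketch only; input format and names are assumptions.
import math
from collections import Counter
from typing import Iterable, Optional, Tuple


def ka_ks_ratio(ka: float, ks: float) -> Optional[float]:
    """Return Ka/Ks, or None when Ks is zero and the ratio is undefined."""
    if ks == 0:
        return None
    return ka / ks


def bin_ratios(pairs: Iterable[Tuple[float, float]], bin_size: float = 0.03) -> Counter:
    """Count Ka:Ks ratios per bin; bin k covers [k*bin_size, (k+1)*bin_size)."""
    counts: Counter = Counter()
    for ka, ks in pairs:
        ratio = ka_ks_ratio(ka, ks)
        if ratio is None:
            continue  # skip pairs with no synonymous substitutions
        # small epsilon guards against floating-point error at bin edges
        counts[math.floor(ratio / bin_size + 1e-9)] += 1
    return counts
```

Calling `bin_ratios([(0.01, 1.0), (0.02, 1.0), (0.05, 1.0)])` places the first two pairs in bin 0 and the third in bin 1; a different `bin_size` can be passed for other plots.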
Figure 4.
Examples of RT-PCR Results for Nine Chimerical Retrogenes. M, DNA ladder (DL2000; Invitrogen); 0+, actin gene used as positive control; C, RT-PCR negative control in which the only difference from the positive lanes is that reverse transcriptase was not added to the RNA template. Lanes 1 to 9, RT-PCR–amplified fragments of the following nine chimerical retrogenes: AK072907_Chr03_25572510_25577541, AK064442_Chr06_25446009_25451022, AK070283_Chr04_33398346_33404211, AK072552_Chr07_27330570_27335745, AK073972_Chr04_3993417_3998040, AK064488_Chr04_19699106_19703477, AK059235_Chr07_11072250_11076879, AK064415_Chr07_16520248_16524877, and AK064641_Chr04_4027771_4032364. The RNA used for amplification was a mixture of equal amounts of RNA extracted from roots, shoots, leaves, and flowers (see Methods).
Figure 5.
Examples of Four Chimerical Genes That Have Chimerical Protein Structures. Orange boxes, parental and retrogene CDS regions; blue dashed boxes, used retrogene regions in new chimerical genes; green boxes, newly recruited regions in new chimerical genes; and red dashed boxes, unaligned CDS regions.
Figure 6.
Ks Distribution of Chimerical Retrogenes. Bin size is 0.07. RGs, retrogenes.
Figure 7.
Positional Effect of Retropositions between Chromosomes. The white points in the chromosomes represent centromeres. Retropositions from other chromosomes into each chromosome are shown by colored lines, which indicate the directions from parental copies to the insertion sites of retrogenes in a particular chromosome. Blue lines indicate intact retrogenes, and red lines indicate retropseudogenes. Blue lines are rarely directed to centromeric regions, while red lines are randomly directed (P = 0.00037).
