Genome Biol. 2013 Mar 13;14(3):R22.
doi: 10.1186/gb-2013-14-3-r22.

Retrotransposition of gene transcripts leads to structural variation in mammalian genomes

Adam D Ewing et al. Genome Biol. 2013.

Abstract

Background: Retroposed processed gene transcripts are an important source of material for new gene formation on evolutionary timescales. Most prior work on gene retrocopy discovery compared copies in reference genome assemblies to their source genes. Here, we explore gene retrocopy insertion polymorphisms (GRIPs) that are present in the germlines of individual humans, mice, and chimpanzees, and we identify novel gene retrocopy insertions in cancerous somatic tissues that are absent from patient-matched non-cancer genomes.

Results: Through analysis of whole-genome sequence data, we found evidence for 48 GRIPs in the genomes of one or more humans sequenced as part of the 1,000 Genomes Project and The Cancer Genome Atlas, but which were not in the human reference assembly. Similarly, we found evidence for 755 GRIPs at distinct locations in one or more of 17 inbred mouse strains but which were not in the mouse reference assembly, and 19 GRIPs across a cohort of 10 chimpanzee genomes, which were not in the chimpanzee reference genome assembly. Many of these insertions are new members of existing gene families whose source genes are highly and widely expressed, and the majority have detectable hallmarks of processed gene retrocopy formation. We estimate the rate of novel gene retrocopy insertions in humans and chimps at roughly one new gene retrocopy insertion for every 6,000 individuals.

Conclusions: We find that gene retrocopy polymorphisms are a widespread phenomenon, present a multi-species analysis of these events, and provide a method for their ascertainment.

Figures

Figure 1
Schematic overview of our method for detecting non-reference gene retrocopy insertions from paired read mappings. Read pairs are represented by two boxes for the sequenced portion of the paired read, joined by a line representing the unsequenced region (not to scale). Reads aligning to exonic sequences are colored red, and boxes aligning to non-exonic sequences are colored blue. For genomic intervals with no significant structural changes relative to the reference, reads will map normally as depicted in the upper panel. Note the forward-reverse orientation pattern of the read pair mappings as indicated under the sequenced ends. Non-reference gene retrocopy insertions (bottom panel) are represented by a series of discordant read mappings in a common interval (blue boxes) where one end of each read matches a distal exon on a common gene annotation (red boxes). The minimum interval between the left and right groups of blue boxes defines the start and end coordinates used in Additional file 1: Tables S2, S4-6, and S9. For Illumina paired reads, the forward-reverse sequencing scheme means that the non-exonic end of paired reads spanning the 5' junction is mapped in the forward orientation and the non-exonic read of the pair spanning the 3' junction is mapped in the reverse orientation (see arrows). Thus, the regions joined by oriented paired reads between reference chrB and the gene on reference chrA form a path that indicates a gene retrocopy insertion on the chrB allele in the individual genome from which the paired reads were derived. As depicted on the non-reference version of chrB, processed gene retrocopies lack introns, and the resulting exon-exon junctions are detectable by local assembly.
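The read-pair logic described in the Figure 1 caption can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `DiscordantPair` record, the `call_candidate_insertions` function, and the `max_span` and `min_support` thresholds are hypothetical names and values chosen for illustration. The sketch assumes that discordant pairs with one end in an annotated exon have already been extracted, and it ignores complications such as overlapping junction reads at target-site duplications.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscordantPair:
    """Non-exonic end of a read pair whose other end maps to an exon of `gene`."""
    gene: str    # source gene hit by the exonic end (hypothetical field)
    chrom: str   # chromosome where the non-exonic end maps
    start: int   # start coordinate of the non-exonic end
    end: int     # end coordinate of the non-exonic end
    strand: str  # '+' (forward) or '-' (reverse)


def call_candidate_insertions(pairs, max_span=1000, min_support=2):
    """Group discordant pairs by (gene, chrom), cluster them by position, and
    report intervals flanked by forward-mapped ends on the left and
    reverse-mapped ends on the right. Thresholds are illustrative assumptions."""
    groups = defaultdict(list)
    for p in pairs:
        groups[(p.gene, p.chrom)].append(p)

    calls = []
    for (gene, chrom), members in sorted(groups.items()):
        members.sort(key=lambda p: p.start)
        # Single-linkage clustering on start coordinates.
        clusters, current = [], [members[0]]
        for p in members[1:]:
            if p.start - current[-1].start <= max_span:
                current.append(p)
            else:
                clusters.append(current)
                current = [p]
        clusters.append(current)

        for cluster in clusters:
            left = [p for p in cluster if p.strand == "+"]   # 5' junction side
            right = [p for p in cluster if p.strand == "-"]  # 3' junction side
            if len(left) < min_support or len(right) < min_support:
                continue
            # Minimum interval between the left and right groups.
            ins_start = max(p.end for p in left)
            ins_end = min(p.start for p in right)
            if ins_start <= ins_end:  # simplification: no overlap allowed
                calls.append((gene, chrom, ins_start, ins_end))
    return calls


example_pairs = [
    DiscordantPair("GENE1", "chr5", 1000, 1100, "+"),
    DiscordantPair("GENE1", "chr5", 1050, 1150, "+"),
    DiscordantPair("GENE1", "chr5", 1300, 1400, "-"),
    DiscordantPair("GENE1", "chr5", 1350, 1450, "-"),
    DiscordantPair("GENE2", "chr2", 500, 600, "+"),  # unsupported stray pair
]
example_calls = call_candidate_insertions(example_pairs)
```

In this toy input, only the `GENE1` cluster has support on both sides, so a single candidate interval is reported. A real analysis would also need mapping-quality filters and local assembly to confirm exon-exon junctions, as the caption notes.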
Figure 2
Locations of 48 non-reference gene retrocopy insertion sites in the human genome based on reads mapped to source genes. Discordant read mappings are represented by links colored based on the chromosome of the source gene. Insertion sites are represented by black circles, and the gene labels are based on the position of the source genes.
Figure 3
Gene retrocopy insertion annotations. (a) Functional classification of retrocopy source genes based on gene ontology and manual curation. The genes associated with each functional classification can be found in Table S8 in Additional file 1. (b) Number of annotated processed pseudogenes in the human genome reference assembly (GRCh37) (y-axis) for each source gene associated with a gene retrocopy in this study (x-axis). Processed pseudogene annotations were derived from pseudogene.org human build 65 [1,3].
Figure 4
Gene retrocopy insertions in mice. (a) Number of gene retrocopies absent from the C57BL/6J reference (y-axis) present in each of 17 inbred mouse strains [47] (x-axis). (b) Heatmap created by the heatmap.2 function in the gplots package in R based on the Jaccard distance from pairwise comparison of GRIP alleles between strains (Materials and methods). C57BL/6NJ was left out of the inter-strain comparison of non-reference GRIPs because all but one insertion was shared with the C57BL/6J reference. As indicated on the histogram to the left of the heatmap, distances range from 0 (white, identical GRIP profiles) to 1 (dark blue, no overlap in GRIP profiles). Hierarchical clustering of similarity indices generally recapitulates the breeding history of wild and inbred mouse strains [50].
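The Jaccard distance used for the inter-strain comparison can be sketched in a few lines of Python. The paper reports using R for the heatmap; the version below is only an illustration of the distance itself, and the strain names and GRIP identifiers are made-up placeholders, not data from the study.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two collections of GRIP identifiers.
    Two empty profiles are treated as identical (distance 0.0)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pairwise_distances(profiles):
    """Map each unordered pair of profile names to its Jaccard distance."""
    return {
        (x, y): jaccard_distance(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


# Hypothetical GRIP profiles (placeholder identifiers, not real calls).
profiles = {
    "strainA": {"grip1", "grip2", "grip3"},
    "strainB": {"grip2", "grip3", "grip4"},
    "strainC": {"grip5"},
}
distances = pairwise_distances(profiles)
```

Here `strainA` and `strainB` share two of four distinct GRIPs, giving a distance of 0.5, while `strainC` shares none with either and sits at the maximum distance of 1. A matrix of such distances is the kind of input a hierarchical clustering routine would take.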
Figure 5
Population distribution of human gene retrocopy insertions. Rows represent self-described human populations with three-letter designations as used by the 1,000 Genomes Project. Columns represent 48 retrocopies. Open squares indicate that the GRIP (column) was not detected in the population (row), and filled squares indicate that a GRIP was detected in the corresponding population at a frequency indicated by the color of the square. Hierarchical clustering of populations was performed using the Jaccard distance between each pair of insertion profiles. Population-specific GRIPs restricted to either single populations or groups with geographically similar ancestry are shown according to geographic locality. Correspondence between indicated geographic locations and columns representing allele frequencies is indicated by open, closed, or partially closed circles. ASW: African ancestry in south-west US; CEU: Utah residents with northern and western European ancestry; CHB: Han Chinese in Beijing, China; CHS: Han Chinese in southern China; CLM: Colombian in Medellin, Colombia; FIN: Finnish from Finland; GBR: British from England and Scotland; GRIP: gene retrocopy insertion polymorphism; JPT: Japanese in Tokyo, Japan; LWK: Luhya in Webuye, Kenya; MXL: Mexican ancestry in Los Angeles; PUR: Puerto Rican in Puerto Rico; TSI: Toscani in Italy; YRI: Yoruba in Ibadan, Nigeria.

References

    1. Zhang Z, Harrison PM, Liu Y, Gerstein M. Millions of years of evolution preserved: a comprehensive catalog of the processed pseudogenes in the human genome. Genome Research. 2003;13:2541–58. doi: 10.1101/gr.1429003.
    2. Zhang Z, Carriero N, Gerstein M. Comparative analysis of processed pseudogenes in the mouse and human genomes. Trends in Genetics. 2004;20:62–7. doi: 10.1016/j.tig.2003.12.005.
    3. Karro JE, Yan Y, Zheng D, Zhang Z, Carriero N, Cayting P, Harrison P, Gerstein M. Pseudogene.org: comprehensive database and comparison platform for pseudogene annotation. Nucleic Acids Research. 2007;35:D55–60. doi: 10.1093/nar/gkl851.
    4. Pei B, Sisu C, Frankish A, Howald C, Habegger L, Mu X, Harte R, Balasubramanian S, Tanzer A, Diekhans M, Reymond A, Hubbard TJ, Harrow J, Gerstein MB. The GENCODE pseudogene resource. Genome Biology. 2012;13:R51. doi: 10.1186/gb-2012-13-9-r51.
    5. Vanin EF. Processed pseudogenes: characteristics and evolution. Annual Review of Genetics. 1985;19:253–72. doi: 10.1146/annurev.ge.19.120185.001345.