PLoS Genet. 2010 May 20;6(5):e1000954.
doi: 10.1371/journal.pgen.1000954.

A survey of genomic traces reveals a common sequencing error, RNA editing, and DNA editing

Alexander Wait Zaranek et al. PLoS Genet. 2010.

Abstract

While it is widely held that an organism's genomic information should remain constant, several protein families are known to modify it. Members of the AID/APOBEC protein family can deaminate DNA. Similarly, members of the ADAR family can deaminate RNA. Characterizing the scope of these events is challenging. Here we use large genomic data sets, such as the two billion sequences in the NCBI Trace Archive, to look for clusters of mismatches of the same type, which are a hallmark of editing events caused by APOBEC3 and ADAR. We align 603,249,815 traces from the NCBI Trace Archive to their reference genomes. In clusters of mismatches of increasing size, at least one systematic sequencing error dominates the results (G-to-A). It is still present in mismatches with 99% accuracy and only vanishes in mismatches at 99.99% accuracy or higher. The error appears to have entered into about 1% of the HapMap, possibly affecting other users that rely on this resource. Further investigation, using stringent quality thresholds, uncovers thousands of mismatch clusters with no apparent defects in their chromatograms. These traces provide the first reported candidates of endogenous DNA editing in human, further elucidating RNA editing in human and mouse and also revealing, for the first time, extensive RNA editing in Xenopus tropicalis. We show that the NCBI Trace Archive provides a valuable resource for the investigation of the phenomena of DNA and RNA editing, as well as setting the stage for a comprehensive mapping of editing events in large-scale genomic datasets.
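The accuracy thresholds quoted in the abstract can be related to Phred quality scores through the standard definition Q = -10 log10(P_error), under which 99% accuracy corresponds to Phred 20 and 99.99% to Phred 40. The following is a minimal sketch of that conversion; the function names are illustrative and do not come from the paper.

```python
import math


def phred_to_accuracy(q: float) -> float:
    """Probability that a base call with Phred score q is correct."""
    return 1.0 - 10.0 ** (-q / 10.0)


def accuracy_to_phred(accuracy: float) -> float:
    """Phred score for a given per-base accuracy (accuracy must be < 1)."""
    return -10.0 * math.log10(1.0 - accuracy)


if __name__ == "__main__":
    for q in (20, 30, 40):
        print(f"Phred {q}: accuracy {phred_to_accuracy(q):.4%}")
```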

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Evidence for editing events emerges by enrichment for clusters of mismatches.
(A) Human traces are mined for clusters of mismatches of the same type. Shown is the percent frequency of clusters by type. The G-to-A mismatch type becomes more dominant with increasing numbers of mismatches (as does T-to-G). (B) Runs of five (or more) mismatches by type and sequencing center with an identical 3 bp motif centered on each mismatch. Data from eight sequencing centers are shown; each of these centers had at least 1000 examples that meet the above criteria. (C) Clusters with three (or more) mismatches, of which at least two are of very high quality (Phred 40). A mismatch spectrum consistent with editing can be observed.
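The idea of mining an alignment for clusters of same-type mismatches can be sketched as follows. The caption does not give the exact cluster definition, so this sketch assumes a gap-free pairwise alignment and treats a cluster as a run of consecutive mismatches that all share one type, uninterrupted by a mismatch of another type. The function names and the `min_size` parameter are hypothetical.

```python
from itertools import groupby


def mismatches(ref: str, read: str):
    """Yield (position, 'X-to-Y') for each mismatching column of a gap-free alignment."""
    if len(ref) != len(read):
        raise ValueError("sequences must be aligned to the same length")
    for i, (r, q) in enumerate(zip(ref.upper(), read.upper())):
        if r != q:
            yield i, f"{r}-to-{q}"


def same_type_runs(ref: str, read: str, min_size: int = 3):
    """Return (type, positions) for each run of >= min_size consecutive same-type mismatches."""
    runs = []
    # groupby merges neighbouring mismatches only while their type stays the same.
    for mtype, group in groupby(mismatches(ref, read), key=lambda m: m[1]):
        positions = [pos for pos, _ in group]
        if len(positions) >= min_size:
            runs.append((mtype, positions))
    return runs


if __name__ == "__main__":
    print(same_type_runs("AGCTGAGGTCAG", "AACTAAGATCAA"))
```

A per-mismatch quality filter, such as the Phred 40 requirement in panel C, would be applied before grouping; it is omitted here to keep the sketch short.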
Figure 2. G-to-A sequencing artifact.
(A) A chromatogram from a trace matching the criteria in Figure 1B. An AAA motif is centered at position 244 and corresponds to position 90 in the control; another AAA motif occurs at position 253, which corresponds to position 99 in the control. Each peak in this chromatogram is preceded by a smaller, identical sub-peak. As a result, a normally small peak (see control) is likely to be overwhelmed by the sub-peak of the adjacent, normally tall peak (see control). (B) A chromatogram from a control trace that matches the reference; position 90 is the center of an AGA motif.
Figure 3. DNA editing in human HERVL-A1.
Trace 1735626615 aligns uniquely to chromosome 2 where the known retrotransposon HERVL-A1 is located (chr2: 100697697–100700125). A cluster of 15 G-to-A mismatches (worst mismatch Phred 35; best mismatch Phred 49) suggests that the trace originates from an edited version of the element. Support for an APOBEC source of the editing comes from the preferred GG-to-AG motif (11 of the 15 cases) and the GA-to-AA motif (the remaining 4 cases), which is the dinucleotide context (in the same order) in an HIV hypermutated genome, and is the sequence motif of APOBEC3G and APOBEC3F.
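The dinucleotide tally described in this caption (GG-to-AG versus GA-to-AA) can be illustrated with a short sketch. It assumes a gap-free alignment and reads each motif as the edited G followed by its 3' neighbour on the strand shown, with the neighbour taken from the reference. The function name is hypothetical and the code is not the authors' pipeline.

```python
from collections import Counter


def g_to_a_contexts(ref: str, read: str) -> Counter:
    """Count dinucleotide contexts (edited G plus the following reference base) at G-to-A mismatches."""
    ref, read = ref.upper(), read.upper()
    if len(ref) != len(read):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    # A G-to-A mismatch in the last column has no following base and is skipped.
    for i in range(len(ref) - 1):
        if ref[i] == "G" and read[i] == "A":
            nxt = ref[i + 1]
            counts[f"G{nxt}-to-A{nxt}"] += 1
    return counts


if __name__ == "__main__":
    print(g_to_a_contexts("TGGAGACGT", "TAGAAACGT"))
```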
Figure 4. DNA editing in human AluY.
Example of possible DNA editing in human chr21:40977741–40978045. Alignment of trace 1745107496 to the human reference genome reveals a large number of G-to-A mismatches, which indicate possible DNA editing in this retrotransposon. All the mismatches are located at high-quality sequence positions, reducing the likelihood of sequencing errors.
Figure 5. Evidence for RNA editing in the cDNA traces.
(A) No over-representation of clusters of the RNA-editing-type mismatches (A-to-G and its complementary T-to-C) is observed in the full set of RNA traces in human (n = 238,370) and Xenopus tropicalis (n = 444,526). (B) Significant over-representation of the RNA editing type is observed in the high-quality cDNA sequencing sets of human (n = 769; p-value 1.5e-119; Fisher's Exact Test) and Xenopus (n = 2,847; p-value ≪ e-200). (C) No such over-representation is observed in the set of high-quality DNA traces (human: n = 64,191; Xenopus: n = 3,471). These observations support RNA editing as the cause of the mismatches in the sets of higher-quality cDNA.
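For readers unfamiliar with the test named in panel B, the sketch below computes a one-sided Fisher's exact p-value for a 2x2 table using only the standard library. The caption does not state the contingency table or whether the test was one- or two-sided, so the layout assumed here (rows: trace set of interest versus comparison set; columns: editing-type clusters versus other clusters) and the one-sided alternative are assumptions, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with all margins fixed,
    where X is the count in the top-left cell.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)  # largest value the top-left cell can take
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


if __name__ == "__main__":
    # Toy counts for illustration only; these are not data from the paper.
    print(fisher_exact_greater(3, 1, 1, 3))
```

Exact integer arithmetic keeps the tail sum accurate for large counts, although p-values as small as those quoted in the caption can fall below what a double-precision float can represent.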
Figure 6. ADAR signature in the cDNA edited traces.
Significant under-representation of “G” immediately upstream of the editing sites, which is in agreement with the known sequence motif of the ADAR proteins.
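One way to look for such a signature is to compare the base immediately 5' of each edited site with the base 5' of every adenosine in the same reference sequence. The sketch below shows only the counting step, under the assumption that edited positions are given as 0-based indices into the reference; the function names are hypothetical, and no statistical test is implied.

```python
from collections import Counter


def upstream_base_counts(ref: str, edited_positions) -> Counter:
    """Count the reference base immediately 5' of each edited position."""
    ref = ref.upper()
    # Position 0 has no upstream neighbour and is skipped.
    return Counter(ref[p - 1] for p in edited_positions if p > 0)


def background_upstream_counts(ref: str, base: str = "A") -> Counter:
    """Count the base immediately 5' of every occurrence of `base` in the reference."""
    ref = ref.upper()
    return Counter(ref[i - 1] for i in range(1, len(ref)) if ref[i] == base)


if __name__ == "__main__":
    ref = "CATAGAA"
    print(upstream_base_counts(ref, [1, 3, 5]))
    print(background_upstream_counts(ref))
```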
Figure 7. RNA editing in Xenopus tropicalis.
(A) Evidence for RNA editing can be seen at this locus, as multiple traces of RNA origin align to it with numerous A-to-G mismatches. The trace accession numbers and their coordinates are given in the multiple alignment. (B) The predicted RNA structure of the genomic locus indicates a long and stable dsRNA structure, which is a favored target for editing by ADARs. Each editing site from the multiple alignment is marked by an arrow. The length of the arrow corresponds to the editing level.
