A-to-I RNA editing occurs at over a hundred million genomic sites, located in a majority of human genes

Genome Res. 2014 Mar;24(3):365-76. doi: 10.1101/gr.164749.113. Epub 2013 Dec 17.

Lily Bazak¹, Ami Haviv, Michal Barak, Jasmine Jacob-Hirsch, Patricia Deng, Rui Zhang, Farren J Isaacs, Gideon Rechavi, Jin Billy Li, Eli Eisenberg, Erez Y Levanon

PMID: 24347612
PMCID: PMC3941102
DOI: 10.1101/gr.164749.113

Lily Bazak et al. Genome Res. 2014 Mar.

Affiliation

¹ Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan 52900, Israel.

Abstract

RNA molecules transmit the information encoded in the genome and generally reflect its content. Adenosine-to-inosine (A-to-I) RNA editing by ADAR proteins converts a genomically encoded adenosine into inosine. It is known that most RNA editing in human takes place in the primate-specific Alu sequences, but the extent of this phenomenon and its effect on transcriptome diversity are not yet clear. Here, we analyzed large-scale RNA-seq data and detected ∼1.6 million editing sites. As detection sensitivity increases with sequencing coverage, we performed ultradeep sequencing of selected Alu sequences and showed that the scope of editing is much larger than anticipated. We found that virtually all adenosines within Alu repeats that form double-stranded RNA undergo A-to-I editing, although most sites exhibit editing at only low levels (<1%). Moreover, using high coverage sequencing, we observed editing of transcripts resulting from residual antisense expression, doubling the number of edited sites in the human genome. Based on bioinformatic analyses and deep targeted sequencing, we estimate that there are over 100 million human Alu RNA editing sites, located in the majority of human genes. These findings set the stage for exploring how this primate-specific massive diversification of the transcriptome is utilized.

Figures

**Figure 1.**
Detection of A-to-I editing in *Alu* repeats. (A) Multiple alignment of reads to the reference genome reveals sites of A-to-I editing (red), as well as genomic polymorphisms and sequencing errors (yellow). Detection sensitivity is improved upon examining clusters of mismatches rather than looking at each site independently. Yet, at low coverage, many bona fide editing sites either do not show any AG mismatch, or show a weak signal indistinguishable from sequencing errors. The sites detected include the few strongly edited sites and a random sample of the weaker sites. (B) Ultradeep coverage enables the full scope of editing to be revealed, showing all sites that support editing, typically at very low levels (<1%).
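
The coverage argument in this caption can be illustrated with a simple binomial sketch. It assumes that each read independently carries the edited base with probability equal to the editing level, and it ignores sequencing errors. This is an idealization for intuition only, not the statistical model used in the study; the function name and the choice of `k` are hypothetical.

```python
from math import comb


def prob_at_least_k_edited_reads(coverage: int, editing_level: float, k: int = 1) -> float:
    """Probability that at least k of `coverage` reads show the edited base.

    Assumes a binomial model: every read is edited independently with
    probability `editing_level` (a fraction between 0 and 1).
    """
    # Sum the probabilities of seeing fewer than k edited reads ...
    p_fewer = sum(
        comb(coverage, i) * editing_level**i * (1 - editing_level) ** (coverage - i)
        for i in range(k)
    )
    # ... and take the complement.
    return 1.0 - p_fewer


# A site edited at 1% (the "low level" regime mentioned in the caption),
# requiring at least two supporting reads, at three illustrative coverages.
for n in (20, 100, 1000):
    print(n, round(prob_at_least_k_edited_reads(n, 0.01, k=2), 3))
```

Under this model the detection probability for a weakly edited site rises steeply with coverage, which is the qualitative point of panels A and B.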

**Figure 2.**
Mismatch distributions along the detection pipeline. (A) Even a simple count of all mismatches in high-quality base pairs of sequencing reads data of *Alu* repeats shows a significant enrichment of editing-derived mismatch types (AG and TC). (B) Applying a strict statistical model to filter out probable sequencing errors further increases the fraction of AG/TC mismatches, but results in the loss of most of the estimated true editing signal as well. (C) In this study, we focused on the full *Alu* repeats rather than single genomic sites. This improves the statistical power, with only a minor reduction in the signal. As a result, we found that virtually all *Alu* repeats are dominated by AG/TC mismatches. (D–F) The same pipeline applied to mismatches located in the common L1 retroelement. Clearly, the strong propensity for A-to-I RNA editing is unique to the *Alu* repeat. However, some enrichment of AG/TC mismatches is nevertheless observed, attesting to some editing activity in the L1 repeats.
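
The idea of scoring a whole repeat rather than single sites can be sketched as follows. The caption does not state the criterion the authors used to call a repeat "dominated" by AG/TC mismatches, so the share-based rule and the `min_share` threshold below are assumptions made for illustration, and both function names are hypothetical.

```python
from collections import Counter


def mismatch_type_counts(base_pairs):
    """Count mismatch types from (reference_base, read_base) pairs.

    Matching pairs are skipped; a mismatch of reference A read as G
    is recorded under the key "AG", and so on.
    """
    return Counter(
        ref.upper() + read.upper()
        for ref, read in base_pairs
        if ref.upper() != read.upper()
    )


def editing_share(counts):
    """Fraction of all mismatches in a repeat that are of type AG or TC."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["AG"] + counts["TC"]) / total


def dominated_by_editing(counts, min_share=0.5):
    """Assumed rule: AG/TC mismatches exceed `min_share` of all mismatches."""
    return editing_share(counts) > min_share


# Toy input for one repeat: 8 AG mismatches, 2 CT mismatches, 5 matches.
pairs = [("A", "G")] * 8 + [("C", "T")] * 2 + [("A", "A")] * 5
counts = mismatch_type_counts(pairs)
print(dict(counts), editing_share(counts), dominated_by_editing(counts))
```

Pooling mismatches over the repeat gives each call more supporting observations than a per-site test, which is the gain in statistical power that panel C describes.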

**Figure 3.**
Distribution of downstream (A) and upstream (B) nucleotides for editing sites detected in the HBM data sets. Edited sites are split into three groups according to their editing level: low level ≤10%, high level ≥40%, and medium level >10% and <40%. A clear signature of the ADAR sequence preference is observed (low G upstream of the site, and some enrichment downstream from the site). The preference is stronger at sites with high editing levels.
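
The three editing-level groups defined in this caption can be written out directly. The thresholds below are taken from the caption; the function names, the input layout (pairs of upstream base and editing level in percent), and the toy sites are hypothetical.

```python
from collections import Counter, defaultdict


def editing_level_group(level_percent: float) -> str:
    """Assign a site to the low, medium, or high group by editing level (%)."""
    if not 0.0 <= level_percent <= 100.0:
        raise ValueError("editing level must be a percentage between 0 and 100")
    if level_percent <= 10.0:
        return "low"
    if level_percent >= 40.0:
        return "high"
    return "medium"  # strictly between 10% and 40%


def upstream_base_counts(sites):
    """Tally the upstream nucleotide of each site, separately per group."""
    counts = defaultdict(Counter)
    for upstream_base, level_percent in sites:
        counts[editing_level_group(level_percent)][upstream_base.upper()] += 1
    return counts


# Invented example sites: (upstream base, editing level in percent).
example_sites = [("T", 55.0), ("A", 42.0), ("G", 3.0), ("C", 25.0), ("T", 0.8)]
for group, tally in upstream_base_counts(example_sites).items():
    print(group, dict(tally))
```

The same tally applied to the downstream base would give the input for the other panel.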

**Figure 4.**
Distribution of editing events along the consensus for the eight most edited *Alu* subfamilies (UCSC Genome Browser annotation). The number of edited *Alu* repeats of each family is given. Clearly, there are hotspots for editing in each of the families.

**Figure 5.**
Average editing levels per tissue in HBM data. For each tissue, the total mismatches (before filtering) are grouped for each of the four bases and presented according to the mismatch type. Although in A (T) positions, only one type of mismatch is dominant (G or C, accordingly), at C and G the picture is very different, exhibiting a lower number of mismatches (note the different scale) with a more even distribution. (A) A reference positions with non-A reads, per tissue. (B) T reference positions with non-T reads, per tissue. (C) C reference positions with non-C reads, per tissue. (D) G reference positions with non-G reads, per tissue.

**Figure 6.**
Editing detection is sensitive to sequencing coverage. (A) The average number of adenosines in an *Alu* repeat showing evidence for editing increases with the available coverage (number of reads supporting the examined nucleotide), with no sign of saturation (HBM data). The number of mismatch sites of types other than AG/TC saturates at a relatively low coverage (after applying the statistical model to filter sequencing errors). As the typical coverage in RNA-seq is much lower than 1000 reads, this suggests that previous counts of editing sites are grossly underestimated. (B) Fraction of *Alu* repeats showing evidence of editing (i.e., dominated by AG/TC mismatches). Again, strong dependence on coverage is observed, and atypically high coverage is required for detection in most of the *Alu* repeats. Our ultradeep MiSeq experiment reached saturation with all *Alu* repeats detected at a coverage of 1000 reads (coverage is defined as the median read coverage for the adenosines and thymines in the given *Alu* repeat). Based on these calculations, we estimate the total number of A-to-I editing sites in the human genome to exceed 100 million sites. (C) Number of different transcript variants per *Alu*, as a function of the reads' coverage. No saturation is observed even for ultrahigh coverage.
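
The per-repeat coverage definition given in panel B (median read coverage over the adenosines and thymines of the repeat) is straightforward to compute. The sketch below assumes the repeat's reference sequence and a per-position read depth are already available as a string and a list of equal length; the function name and the toy input are hypothetical.

```python
from statistics import median
from typing import Sequence


def alu_coverage(reference: str, depth: Sequence[int]) -> float:
    """Median read depth over the A and T positions of one repeat."""
    if len(reference) != len(depth):
        raise ValueError("reference and depth must have the same length")
    # Keep only positions whose reference base is A or T.
    at_depths = [d for base, d in zip(reference.upper(), depth) if base in "AT"]
    if not at_depths:
        raise ValueError("repeat contains no A or T positions")
    return median(at_depths)


# Toy repeat of five bases: depths at C and G are ignored.
print(alu_coverage("ACGTA", [10, 99, 99, 30, 20]))
```

Using the median rather than the mean keeps a few very deeply covered positions from inflating the coverage assigned to the repeat.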

**Figure 7.**
Mismatch fraction distribution. Even before applying any statistical filters or analysis, a marked distinction is evident between AG/TC mismatches and other types of mismatches, provided there is sufficiently deep coverage. Presented are the distributions of the mismatch fractions (percent of reads that exhibit the mismatch among all reads supporting the site) for all (high quality, Q ≥ 30) mismatches seen in our MiSeq experiment at sites with high coverage (≥5000 reads, allowing for an accurate assessment of the mismatch fraction). Most mismatches are likely to result from sequencing errors and occur at fractions <0.1%, consistent with the sequencing quality. The AG/TC mismatches span a different range of mismatch fractions, where the bulk of the distribution lies in the range 0.1%–1%, but some sites are edited with stronger efficiencies, up to those showing close to 100% editing in a few sites. This separation of scales allows identification of editing sites, provided an accurate assessment of the mismatch fraction (requiring ultradeep coverage) is available. MM, mismatch. The y-axis shows the normalized probability density P(−log[MM fraction]).
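
The quantities plotted here can be sketched in a few lines. The sketch assumes read counts that were already restricted to high-quality bases (Q ≥ 30), uses the ≥5000-read coverage cutoff from the caption, and takes the logarithm to base 10, which the caption does not specify. The range labels only mirror the scales described in the caption and are not the authors' classifier; all names are hypothetical.

```python
import math
from typing import Optional

MIN_COVERAGE = 5000  # coverage cutoff quoted in the caption


def mismatch_fraction(mismatch_reads: int, total_reads: int) -> Optional[float]:
    """Fraction of reads showing the mismatch, or None below the coverage cutoff."""
    if total_reads < MIN_COVERAGE:
        return None
    return mismatch_reads / total_reads


def neg_log10_fraction(fraction: float) -> float:
    """x-axis value of the distribution, assuming a base-10 logarithm."""
    if fraction <= 0:
        raise ValueError("fraction must be positive")
    return -math.log10(fraction)


def fraction_range(fraction: float) -> str:
    """Label the range a mismatch fraction falls in (descriptive only)."""
    if fraction < 0.001:
        return "<0.1%"      # where most sequencing errors are reported to lie
    if fraction <= 0.01:
        return "0.1%-1%"    # bulk of the AG/TC distribution
    return ">1%"            # more strongly edited sites


f = mismatch_fraction(50, 10_000)
print(f, neg_log10_fraction(f), fraction_range(f))
```

Resolving a fraction near 0.1% needs thousands of reads, since at lower depth a single mismatched read already corresponds to a larger fraction.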

**Figure 8.**
Mismatch distribution along the reads. (A) AG/TC sites are evenly distributed along the reads and are even slightly depleted toward the read ends, as the alignments are more sensitive to mismatches in this region. (B) Other types of mismatches (GA/CT) show a pronounced increase toward the read ends, suggesting that many of these mismatches, despite trimming, could be attributed to alignment artifacts. Reads are 75 bp long.

Cited by

The Regulation of RNA Modification Systems: The Next Frontier in Epitranscriptomics?
Schaefer MR. Genes (Basel). 2021 Feb 26;12(3):345. doi: 10.3390/genes12030345. PMID: 33652758 Free PMC article. Review.
Detecting haplotype-specific transcript variation in long reads with FLAIR2.
Tang AD, Felton C, Hrabeta-Robinson E, Volden R, Vollmers C, Brooks AN. Genome Biol. 2024 Jul 2;25(1):173. doi: 10.1186/s13059-024-03301-y. PMID: 38956576 Free PMC article.
MiRNA post-transcriptional modification dynamics in T cell activation.
Rodríguez-Galán A, Dosil SG, Gómez MJ, Fernández-Delgado I, Fernández-Messina L, Sánchez-Cabo F, Sánchez-Madrid F. iScience. 2021 May 12;24(6):102530. doi: 10.1016/j.isci.2021.102530. eCollection 2021 Jun 25. PMID: 34142042 Free PMC article.
Recent Advances in Adenosine-to-Inosine RNA Editing in Cancer.
Gan WL, Ng L, Ng BYL, Chen L. Cancer Treat Res. 2023;190:143-179. doi: 10.1007/978-3-031-45654-1_5. PMID: 38113001 Review.
RNA epitranscriptomics dysregulation: A major determinant for significantly increased risk of ASD pathogenesis.
Beopoulos A, Géa M, Fasano A, Iris F. Front Neurosci. 2023 Feb 16;17:1101422. doi: 10.3389/fnins.2023.1101422. eCollection 2023. PMID: 36875672 Free PMC article.
