Nat Methods. 2015 Apr;12(4):347-50.
doi: 10.1038/nmeth.3314. Epub 2015 Mar 2.

Genome sequence-independent identification of RNA editing sites

Qing Zhang et al. Nat Methods. 2015 Apr.

Abstract

RNA editing generates post-transcriptional sequence changes that can be deduced from RNA-seq data, but detection typically requires matched genomic sequence or multiple related expression data sets. We developed the GIREMI tool (genome-independent identification of RNA editing by mutual information; https://www.ibp.ucla.edu/research/xiao/GIREMI.html) to predict adenosine-to-inosine editing accurately and sensitively from a single RNA-seq data set of modest sequencing depth. Using GIREMI on existing data, we observed tissue-specific and evolutionary patterns in editing sites in the human population.

Figures

Fig. 1. The GIREMI method
(a) RNA-Seq reads harboring multiple SNPs and/or RNA editing sites. The allelic combinations of two SNPs in the same reads are the same as their DNA haplotypes. In contrast, a SNP and an RNA editing site (or a pair of RNA editing sites) exhibit variable allelic linkage. (b) Distributions of mutual information (MI) associated with SNPs and RNA editing sites, respectively, estimated using GM12878 RNA-Seq data (ENCODE, cytosolic, polyA+) and its associated genome sequencing data. Our previous genome-dependent method was applied to identify RNA editing sites. (c) RNA editing sites predicted by GIREMI in the GM12878 data. Different fractions of genomic SNPs of GM12878 were assumed to be unknown by excluding them from dbSNP. For each fraction, the SNPs were selected randomly and the procedure was repeated 9 times. Results shown here are averages of the 9 randomized trials. Gray bars: percentage of GM12878 SNPs among all single-nucleotide mismatches in the mapped RNA-Seq reads after filtering for artifacts (Online Methods). Orange bars: percentage of false positives (GM12878 SNPs) among all predicted editing sites (i.e., FDR). The number of predicted editing sites and % A-to-G editing are shown in orange. (d) Performance of GIREMI at different sequencing depths (down-sampled GM12878 data). Number of mapped reads (singletons) is shown along the x-axis. Fifty percent of the GM12878 SNPs were assumed to be unknown. Labels are as in (c).
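The linkage idea in panel (a) can be illustrated with a small calculation. The sketch below is not the GIREMI implementation, whose estimator and statistical model are not described on this page; it only computes the plain empirical mutual information between the alleles seen at two sites on the same reads. The function name `mutual_information` and the toy read data are hypothetical, chosen for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Empirical mutual information (in bits) between two sites.

    `pairs` is a list of (allele_at_site_1, allele_at_site_2) tuples,
    one tuple per read that covers both sites. This is an illustrative
    plug-in estimate, not the estimator used by GIREMI.
    """
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one read covering both sites")
    joint = Counter(pairs)                 # counts of allele combinations
    left = Counter(a for a, _ in pairs)    # marginal counts at site 1
    right = Counter(b for _, b in pairs)   # marginal counts at site 2
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b) simplifies to count * n / (left[a] * right[b])
        mi += p_ab * log2(count * n / (left[a] * right[b]))
    return mi


# Toy data (assumed): two heterozygous SNPs on the same haplotypes,
# so each read carries either A+C or G+T.
snp_pair = [("A", "C")] * 10 + [("G", "T")] * 10

# Toy data (assumed): a SNP paired with an editing site whose edited
# state (G) appears independently of the SNP allele.
snp_and_edit = (
    [("A", "A")] * 5 + [("A", "G")] * 5 + [("G", "A")] * 5 + [("G", "G")] * 5
)

mi_snp_pair = mutual_information(snp_pair)
mi_snp_edit = mutual_information(snp_and_edit)
```

In this toy setting the perfectly linked SNP pair gives a high MI and the SNP paired with an independently edited site gives an MI near zero, which mirrors the separation between the two distributions described in panel (b).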
Fig. 2. RNA editomes of human tissues and individuals
(a) Comparison of RNA editing sites across human tissues. Hierarchical clustering of Pearson correlation coefficients is shown (calculated for editing ratios of all editing sites that are present in 35 samples). Samples are labeled by the rows with indicated color codes for individuals and tissues, respectively. Different brain regions are represented in the same color given their highly similar editing profiles. (b) Conservation of the immediate neighborhood of tissue-specific editing (TSE) sites in 3’ UTRs. Sequence conservation (percentage of sequence identity in primates) of each position flanking editing sites (position 0) is shown. Shaded regions represent the 95% confidence interval. A similar plot for non-TSE sites is included for comparison. (c) Distribution of editing sites of 93 human individuals in different types of intragenic regions. Editing sites were grouped according to their prevalence values in this population. “Noncoding” refers to noncoding genes or noncoding transcripts of coding genes. Regional distribution of nucleotides in the entire transcriptome of coding genes (without introns) is shown as a reference (rightmost bar labeled as T). (d) Conservation of 3’ UTR regions flanking two groups of editing sites with different prevalence levels (solid lines), as in (b). Dashed lines correspond to the sequence identity if Gs in other genomes were assumed to be a conserved base given a reference nucleotide A in human.
