PLoS Genet. 2010 Sep 30;6(9):e1001141.
doi: 10.1371/journal.pgen.1001141.

Genome-wide double-stranded RNA sequencing reveals the functional significance of base-paired RNAs in Arabidopsis

Qi Zheng et al. PLoS Genet.

Abstract

The functional structure of all biologically active molecules is dependent on intra- and inter-molecular interactions. This is especially evident for RNA molecules whose functionality, maturation, and regulation require formation of correct secondary structure through encoded base-pairing interactions. Unfortunately, intra- and inter-molecular base-pairing information is lacking for most RNAs. Here, we marry classical nuclease-based structure mapping techniques with high-throughput sequencing technology to interrogate all base-paired RNA in Arabidopsis thaliana and identify ∼200 new small (sm)RNA-producing substrates of RNA-DEPENDENT RNA POLYMERASE6. Our comprehensive analysis of paired RNAs reveals conserved functionality within introns and both 5' and 3' untranslated regions (UTRs) of mRNAs, as well as a novel population of functional RNAs, many of which are the precursors of smRNAs. Finally, we identify intra-molecular base-pairing interactions to produce a genome-wide collection of RNA secondary structure models. Although our methodology reveals the pairing status of RNA molecules in the absence of cellular proteins, previous studies have demonstrated that structural information obtained for RNAs in solution accurately reflects their structure in ribonucleoprotein complexes. Furthermore, our identification of RNA-DEPENDENT RNA POLYMERASE6 substrates and conserved functional RNA domains within introns and both 5' and 3' untranslated regions (UTRs) of mRNAs using this approach strongly suggests that RNA molecules are correctly folded into their secondary structure in solution. Overall, our findings highlight the importance of base-paired RNAs in eukaryotes and present an approach that should be widely applicable for the analysis of this key structural feature of RNA.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. The dsRNA component of the Arabidopsis transcriptome.
(A) Classification of genome-matching dsRNA-seq reads. (B) The heatmap indicates the strand bias of dsRNA-seq reads with respect to specific classes of RNA molecules. The color intensities indicate the degree of strand bias as specified by a log-odds ratio (Lods-ratio) value of sense/anti-sense mapping reads (red, sense; green, antisense; yellow, unbiased). TE, transposable element. (C) Model of secondary structure for an Arabidopsis tRNA (At1g16100) predicted using X-ray crystallography structure information. Colored lines surrounding the model indicate the dsRNA-seq read counts that are normalized by the length of sequenced bases for each tRNA nucleotide (see scale bar for corresponding values). Black arrows specify the anti-codon loop and amino acid acceptor stem of the tRNA. (D) An intermolecular base-paired RNA molecule, At2g24700 (TAS1a), identified by dsRNA-seq. Screenshot from http://tesla.pcbi.upenn.edu/annoj_at9/. W (green bars) and C (red bars) indicate sequence reads from Watson and Crick strands, respectively.
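The strand-bias score in panel (B) can be illustrated with a minimal sketch. The caption does not give the exact formula, so the log base (2), the pseudocount, and the classification threshold below are all assumptions, and the function names are hypothetical.

```python
import math


def strand_log_odds(sense, antisense, pseudocount=1.0):
    """Log2 ratio of sense to antisense read counts.

    Assumption: base-2 log with an added pseudocount so that classes
    with zero reads on one strand stay finite. The paper's exact
    Lods-ratio definition is not stated in the caption.
    """
    if sense < 0 or antisense < 0:
        raise ValueError("read counts must be non-negative")
    return math.log2((sense + pseudocount) / (antisense + pseudocount))


def classify_bias(score, threshold=1.0):
    """Map a score to the heatmap's three labels (threshold is an assumption)."""
    if score >= threshold:
        return "sense"
    if score <= -threshold:
        return "antisense"
    return "unbiased"


# Hypothetical counts for one RNA class: 7 sense reads, 1 antisense read.
score = strand_log_odds(7, 1)
label = classify_bias(score)
```

Positive scores correspond to the red (sense) end of the heatmap, negative scores to the green (antisense) end, and scores near zero to yellow (unbiased).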
Figure 2. Identification of Arabidopsis RDR6 smRNA–producing substrates genome-wide.
(A) Distribution of wild-type Col-0 compared to rdr6 mutant 1 kb dsRNA-seq differentially expressed regions along the length of Chromosome (Chr.) 1. Each colored dot denotes a specific 1 kb region (≥2-fold and p<.001). Colored dots with positive Lods-ratio values are 1 kb regions where Col-0 > rdr6, while negative values denote Col-0 < rdr6. The corresponding RNA category for each colored dot can be found in the color legend box. The dark blue diamond denotes the known RDR6 substrate TAS1b. (B) Classification of all 1 kb regions where Col-0 > rdr6 (green bars) and Col-0 < rdr6 (yellow bars). (C) Distribution of 1 kb regions along Chr. 1 where Col-0 > rdr6 in both dsRNA- and smRNA-seq datasets (≥2-fold and p<.001). Values above the black line denote Lods-ratio for dsRNA-seq regions, and values below the black line denote results for smRNAs. Blue and green diamonds highlight known RDR6 substrates, while the red diamond denotes the newly identified At1g20370. (D) Classification of all smRNA-producing substrates of Arabidopsis RDR6 identified using the combination of dsRNA- and smRNA-seq. (E) The 10 most significantly enriched biological processes (and corresponding p-values) for protein-coding mRNAs that are RDR6 smRNA-producing substrates. (F) The total number of smRNAs corresponding to each indicated size class (19–26) produced from the 218 identified RDR6 substrates.
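The 1 kb region comparison in panels (A) to (C) can be sketched roughly as follows. This is an illustration only: it covers the ≥2-fold criterion but omits the p<.001 significance test, because the caption does not say which test was used. The 0-based read start coordinates, the pseudocount, the absence of library-size normalization, and all function names are assumptions.

```python
import math
from collections import Counter


def bin_counts(read_starts, bin_size=1000):
    """Count reads per fixed-width bin, keyed by bin index."""
    return Counter(pos // bin_size for pos in read_starts)


def fold_change_regions(wt_starts, mut_starts, min_fold=2.0,
                        pseudocount=1.0, bin_size=1000):
    """Return {bin start: log2(wild type / mutant)} for bins passing min_fold.

    Positive values mean wild type > mutant; negative values mean the
    reverse. Counts are raw (no library-size normalization), which is
    a simplification.
    """
    wt = bin_counts(wt_starts, bin_size)
    mut = bin_counts(mut_starts, bin_size)
    cutoff = math.log2(min_fold)
    regions = {}
    for b in sorted(set(wt) | set(mut)):
        lod = math.log2((wt[b] + pseudocount) / (mut[b] + pseudocount))
        if abs(lod) >= cutoff:
            regions[b * bin_size] = lod
    return regions


# Hypothetical read start positions on one chromosome.
col0 = [10, 20, 30, 1500, 2100, 2200]
rdr6 = [15, 1100, 1200, 1300, 1400, 1600, 1700, 2300, 2400]
regions = fold_change_regions(col0, rdr6)
```

With these made-up positions, the first bin is retained with a positive score (Col-0 > rdr6), the second with a negative score (Col-0 < rdr6), and the third is dropped because the two samples have equal counts.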
Figure 3. Novel smRNA–producing substrates of RDR6.
(A–D) Four examples of RDR6 smRNA-generating substrates identified using the combination of dsRNA- and smRNA-seq (screenshots from http://tesla.pcbi.upenn.edu/annoj_at9/). W (green bars) and C (red bars) indicate sequence reads from Watson and Crick strands, respectively. (A) At5g39370 (previously identified), (B) At1g20370 (novel), (C) the intergenic region just upstream of At2g41490 (novel), and (D) At3g19890 (novel). (E) Random-primed RT-qPCR analysis of four previously identified and 10 novel RDR6 substrates for wild-type Col-0 and rdr6-11 mutant plants. Error bars, ±SD. ** indicates p-value <.001. Green and red lines underline previously identified and novel RDR6 substrates, respectively. * denotes RDR6 substrates that produce phased siRNAs.
Figure 4. Highly base-paired segments of the Arabidopsis genome (dsRNA “hotspots”).
(A) Approximate genomic distribution (∼100 kb resolution) and length of dsRNA “hotspots” along Arabidopsis Chr. 1 for wild-type Col-0. (B) Classification of dsRNA “hotspots.” TE, transposable element. (C) The 18 most significantly enriched molecular functions for protein-coding mRNAs that contain dsRNA ‘hotspots’. Red labels indicate nucleic acid biology GO categories. (D) The percent of nucleotides within dsRNA ‘hotspots’ that were found to produce smRNAs. The smRNA data used for this analysis is described in Figure S8.
Figure 5. Identification of widespread, conserved functionality within non-coding portions of mRNA (introns, 3′ and 5′ UTRs), intergenic regions, and transposons.
(A, B) The average conservation scores (consScore) calculated using a seven-way comparative genomics analysis of dsRNA ‘hotspots’ (green bars) or their flanking regions (yellow bars) in specific portions (coding (exons), 5′ UTR, 3′ UTR, and introns) of pre-mRNAs (A), as well as intergenic regions and transposons (TE) (B). (C, D) Models of secondary structure for Arabidopsis (C) At1g67430 (nt 25262487–25262809) and (D) At2g40650 (nt 16964129–16964413) intronic functional moieties determined by dsRNA-seq constrained parameters for RNAfold (see below) (screenshots from the structural viewer at http://tesla.pcbi.upenn.edu/annoj_at9/). The scale bar to the left of each model indicates the read counts that are normalized by the length of sequenced bases for the transcript. The multiple alignments for these conserved, intronic dsRNA ‘hotspots’ can be seen in Figure S5A and S5B. G denotes the Gibbs free energy value (kilocalories/mole) for the corresponding RNA secondary structure model.
Figure 6. Identification of novel, highly structured RNAs using dsRNA–seq.
(A–D) Four examples of intergenic, highly base-paired transcripts (screenshots from http://tesla.pcbi.upenn.edu/annoj_at9). W (red bars) and C (green bars) indicate signal from Watson and Crick strands, respectively. (A) Two intergenic dsRNA ‘hotspots’ (h348 and h349) found between At2g06555 and At2g06560. (B) A novel, base-paired RNA on Chr. 4 between At4g03360 and At4g03370. (C) A Chr. M intergenic dsRNA ‘hotspot’ between AtMg00160 and AtMg00170. (D) An example of a new, highly structured RNA from Chr. M that lies between AtMg01330 and AtMg01340. (E–I) Random-primed RT-PCR analysis of the novel, base-paired RNAs that are pictured in (A–D) using five different Arabidopsis tissues (leaf blade, leaf petiole, cauline leaves, stem, and unopened flower bud clusters). (E, F) correspond to h348 and h349 in (A), respectively. (G–I) correspond to (B–D), respectively. Flower bud RNA samples that were not treated with reverse transcriptase serve as controls for this experiment. (J) The percent of total new transcripts for each indicated category that do (blue bars) or do not (red bars) overlap with smRNA ‘hotspots’. There are 1,602, 897, and 705 corresponding transcription units for the All, unannotated repeats/TEs, and completely unannotated categories, respectively. TE, transposable element. (K) The number of smRNAs corresponding to each indicated size class (19–26) produced from the unannotated repeats/TEs. (L) The number of smRNAs corresponding to each indicated size class (19–26) produced from the completely unannotated transcription units.
Figure 7. A sequencing-based approach to interrogate mRNA secondary structure genome-wide.
(A) Classification of genome-matching dsRNA-seq reads after two rounds of rRNA depletion (2X Ribominus approach). (B) The heatmap indicates the strand bias of 2X Ribominus dsRNA-seq reads with respect to specific classes of RNA molecules. The color intensities indicate the degree of strand bias as specified by a normalized Lods-ratio value of sense/anti-sense mapping reads (red, sense; green, antisense; yellow, unbiased). TE, transposable element. (C, D) Models of secondary structure for Arabidopsis (C) At2g07698 and (D) At4g02510 transcripts determined by default (unconstrained) or dsRNA-seq constrained parameters for RNAfold (screenshots from the structural viewer at http://tesla.pcbi.upenn.edu/annoj_at9/). The sequences interrogated in (E) (At2g07698 #1 and At4g02510) are highlighted in yellow. The scale bar between the two models indicates the read counts that are normalized by the length of sequenced bases for the transcript. Black arrows indicate RNA loops that are >5 nt within the yellow shaded portions of the models. G denotes the Gibbs free energy value (kilocalories/mole) for the corresponding RNA secondary structure model. (E) Random-primed RT-PCR analysis of dsRNA ‘hotspots’ from At5g56070, At2g07698 (2), At4g02510, At5g13630, and At5g02500 after treatment of total RNA samples with either a single-stranded (ss) or double-stranded (ds) RNase. Samples that were not treated with reverse transcriptase (RT -) or either RNase (-) serve as controls for this experiment. (F–H) Models of secondary structure for Arabidopsis (F) chr4_h76 (chr4: nt 1476284–1476589), (G) chrM_h20 (chrM: nt 46875–47251), and (H) chrM_h95 (chrM: nt 334344–334833) novel intergenic transcripts determined by dsRNA-seq constrained parameters for RNAfold (screenshots from the structural viewer at http://tesla.pcbi.upenn.edu/annoj_at9/). The scale bar to the left (F, G) or right (H) of each model indicates the read counts that are normalized by the length of sequenced bases for the transcript. G denotes the Gibbs free energy value (kilocalories/mole) for the corresponding RNA secondary structure model.
