Nucleic Acids Res. 2011 Jan;39(2):393-402.
doi: 10.1093/nar/gkq792. Epub 2010 Sep 15.

Improving the accuracy of predicting secondary structure for aligned RNA sequences

Michiaki Hamada et al. Nucleic Acids Res. 2011 Jan.

Abstract

Considerable attention has been focused on predicting the secondary structure of aligned RNA sequences, since it is useful not only for improving the limited accuracy of conventional secondary structure prediction but also for finding non-coding RNAs in genomic sequences. Although many algorithms exist for predicting the secondary structure of aligned RNA sequences, further improvement in accuracy is still needed. In this article, with the aim of improving accuracy, we present a theoretical classification of state-of-the-art algorithms for predicting the secondary structure of aligned RNA sequences. The classification is based on the viewpoint of maximum expected accuracy (MEA), which has been successfully applied to various problems in bioinformatics. The classification reveals several disadvantages of the current algorithms, and we propose an improvement of a previously introduced algorithm (CentroidAlifold). Finally, computational experiments strongly support the theoretical classification and indicate that the improved CentroidAlifold substantially outperforms other algorithms.

Figures

Figure 1.
The MEA-based estimator (E1) with respect to Evaluation Process 1. We assume there exists a probability distribution p(θ|A) over the common secondary structures of the alignment A, and a gain function G(θ, y) between two secondary structures whose length equals the length of the alignment (y and θ are the predicted structure and the reference structure, respectively). The gain function characterizes the similarity between the two secondary structures. The estimator is consistent with Evaluation Process 1 (Supplementary Figure S1). See Supplementary Section A.4.1 for details.
Figure 2.
The MEA-based estimator (E2) with respect to Evaluation Process 2. We assume there exists a probability distribution p_x(θ|A) over the common secondary structures of x for every x ∈ A, and a gain function G(θ, y) between two secondary structures whose length equals the length of the alignment (y and θ are the predicted structure and the reference structure, respectively). The estimator is consistent with Evaluation Process 2 (Supplementary Figure S2). See Supplementary Section A.4.2 for details.
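The two captions above describe the estimators in words only. As a hedged sketch, an MEA estimator of this kind is usually written as the structure that maximizes the expected gain under the assumed distribution. The symbol S(A) for the set of candidate common structures, the hat notation, and the uniform average over the sequences x ∈ A in the second line are assumptions made here for illustration; the exact definitions are in the Supplementary Sections cited in the captions.

```latex
\begin{align}
\hat{y}^{(\mathrm{E1})} &= \operatorname*{arg\,max}_{y \in \mathcal{S}(A)}
  \sum_{\theta \in \mathcal{S}(A)} G(\theta, y)\, p(\theta \mid A) \\
\hat{y}^{(\mathrm{E2})} &= \operatorname*{arg\,max}_{y \in \mathcal{S}(A)}
  \frac{1}{|A|} \sum_{x \in A} \sum_{\theta \in \mathcal{S}(A)}
  G(\theta, y)\, p_x(\theta \mid A)
\end{align}
```

In both lines the inner sum is the expected gain of predicting y when the reference structure θ is drawn from the assumed distribution.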
Figure 3.
The performance of common secondary structure prediction with the reference alignments with respect to Evaluation Process 1. The horizontal and vertical axes indicate PPV and SEN, respectively. Better performances are in the upper-right areas of each figure (worse performances are to the lower left). The results for the RNAalipffold model are shown on the left and those for the Pfold model on the right. The labels ‘mc’, ‘ct’, ‘pf’ and ‘al’ indicate the McCaskill, CONTRAfold, Pfold and RNAalipffold models, respectively. CentroidAlifold (old: X) indicates CentroidAlifold with probability distribution X (where X = ‘mc’ or ‘ct’). CentroidAlifold (new: Y-X) indicates CentroidAlifold with a mixture of the probability distributions X and Y where Y is p(θ|A), X is p(θ|x) and w = 1/2 in Equation (2) (Y = ‘pf’ or ‘al’). The dashed lines (red/green) show the performance curves of the previous CentroidAlifold, while the solid lines (red/green) show the performance curves of the new CentroidAlifold. In both figures, the performances of PETfold and RNAalifold are also shown.
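The axes in this figure are PPV and SEN. As a minimal sketch, assuming the usual base-pair-level definitions (PPV = TP / (TP + FP), SEN = TP / (TP + FN)) and structures represented as sets of (i, j) index pairs, the two values can be computed as follows. The function name `ppv_sen` and the example pairs are hypothetical and are not taken from the article.

```python
def ppv_sen(predicted, reference):
    """Return (PPV, SEN) for two structures given as collections of (i, j) base pairs."""
    predicted = set(predicted)
    reference = set(reference)
    tp = len(predicted & reference)   # pairs predicted and present in the reference
    fp = len(predicted - reference)   # pairs predicted but absent from the reference
    fn = len(reference - predicted)   # reference pairs that were missed
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    sen = tp / (tp + fn) if (tp + fn) else 0.0
    return ppv, sen


# Hypothetical example: 3 of 4 predicted pairs are correct, 3 of 5 reference pairs are found.
predicted_pairs = {(1, 10), (2, 9), (3, 8), (4, 7)}
reference_pairs = {(1, 10), (2, 9), (3, 8), (12, 20), (13, 19)}
ppv, sen = ppv_sen(predicted_pairs, reference_pairs)
print(f"PPV={ppv:.2f} SEN={sen:.2f}")
```

Each point on a performance curve would correspond to one such (PPV, SEN) pair obtained at a particular parameter setting of the predictor.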
Figure 4.
The performance of common secondary structure prediction for Evaluation Process 2 with alignments produced by ProbCons (left column) and the reference alignments (right column). In CentroidAlifold, we used the RNAalipffold model (top row) and the Pfold model (bottom row). See the caption of Figure 3 for notation. Also see Supplementary Figures S1–S3 for the performance with alignments produced by ClustalW (33), MAFFT (38) and MXSCARNA (39), respectively.
Figure 5.
The performance of CentroidAlifold with various values of the weight parameter [i.e. w = 0, 0.1, 0.2, …, 0.9, 1 in Equation (2)]. In this experiment, we used the mixture distribution of the RNAalipffold model (12) and the McCaskill model (18), and the alignments produced by ProbCons (34). The curves with w = 1 and w = 0 are equivalent to the ‘previous’ CentroidAlifold and RNAalipffold-Centroid, respectively. The results for the other combinations of probability distributions and aligners are shown in the Supplementary Data (Supplementary Figures S6–S10).
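Equation (2) is not reproduced on this page, so the following is only a sketch of a weighted mixture that is consistent with the endpoints stated in the caption: w = 1 uses the per-sequence model alone, and w = 0 uses the alignment-level model alone. It mixes base-pairing probabilities, which is equivalent to mixing the distributions as far as pair marginals are concerned. The function name `mix_pair_probs`, the dictionary representation, the uniform average over sequences, and the assumption that per-sequence pairs are already mapped to alignment columns are all hypothetical.

```python
from collections import defaultdict


def mix_pair_probs(p_align, p_seqs, w):
    """Mix base-pairing probabilities.

    p_align: dict {(i, j): prob} from an alignment-level model, p(theta | A).
    p_seqs:  list of dicts {(i, j): prob}, one per sequence x, p(theta | x),
             with indices already mapped to alignment columns (assumption).
    w:       weight in [0, 1] given to the averaged per-sequence model.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    if not p_seqs:
        raise ValueError("at least one per-sequence model is required")
    mixed = defaultdict(float)
    for pair, p in p_align.items():
        mixed[pair] += (1.0 - w) * p
    for p_seq in p_seqs:
        for pair, p in p_seq.items():
            mixed[pair] += w * p / len(p_seqs)
    return dict(mixed)


# Hypothetical toy inputs, swept over the same grid of weights as in the caption.
p_align = {(1, 10): 0.8, (2, 9): 0.6}
p_seqs = [{(1, 10): 0.4}, {(1, 10): 0.6, (3, 8): 0.2}]
for step in range(11):
    w = step / 10
    mixed = mix_pair_probs(p_align, p_seqs, w)
    print(w, {pair: round(p, 3) for pair, p in sorted(mixed.items())})
```

A structure predictor would then be run on the mixed probabilities for each w, giving one performance curve per weight.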

References

    1. Bernhart SH, Hofacker IL. From consensus structure prediction to RNA gene finding. Brief. Funct. Genomic Proteomic. 2009;8:461–471.
    2. Schroeder SJ. Advances in RNA structure prediction from sequence: new tools for generating hypotheses about viral RNA structure-function relationships. J. Virol. 2009;83:6326–6334.
    3. Hofacker I, Fontana W, Stadler P, Bonhoeffer S, Tacker M, Schuster P. Fast folding and comparison of RNA secondary structures. Monatsh. Chem. 1994;125:167–188.
    4. Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 2003;31:3406–3415.
    5. Clyde K, Harris E. RNA secondary structure in the coding region of dengue virus type 2 directs translation start codon selection and is required for viral replication. J. Virol. 2006;80:2170–2182.
