Nucleic Acids Res. 2012 Nov 1;40(20):e160.
doi: 10.1093/nar/gks697. Epub 2012 Jul 28.

Mixture models and wavelet transforms reveal high confidence RNA-protein interaction sites in MOV10 PAR-CLIP data

Cem Sievers et al. Nucleic Acids Res.

Abstract

The Photo-Activatable Ribonucleoside-enhanced CrossLinking and ImmunoPrecipitation (PAR-CLIP) method was recently developed for global identification of RNAs interacting with proteins. The strength of this versatile method results from the induction of specific T to C transitions at sites of interaction. However, current analytical tools do not distinguish between non-experimentally and experimentally induced transitions. Furthermore, geometric properties at potential binding sites are not taken into account. To surmount these shortcomings, we developed a two-step algorithm consisting of a non-parametric two-component mixture model and a wavelet-based peak calling procedure. Our algorithm can reduce the number of false positives by up to 24%, thereby identifying high confidence interaction sites. We successfully employed this approach in conjunction with a modified PAR-CLIP protocol to study the functional role of nuclear Moloney leukemia virus 10 (MOV10), a putative RNA helicase interacting with Argonaute2 and Polycomb. Our method, available as the R package wavClusteR, is generally applicable to any substitution-based inference problem in genomics.
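The abstract does not spell out the peak-calling procedure. As a rough illustration of what wavelet-based peak calling on a coverage function can look like, the following Python sketch convolves a toy coverage profile with a Mexican-hat (Ricker) wavelet at a single scale and reports local maxima of the resulting coefficients. The wavelet choice, the single scale, the threshold, the toy data and all function names are assumptions made for illustration; this is not the wavClusteR implementation.

```python
import math

def ricker(t, scale):
    """Mexican-hat (Ricker) wavelet at offset t for a given scale (assumed wavelet)."""
    x = t / scale
    return (1.0 - x * x) * math.exp(-0.5 * x * x) / math.sqrt(scale)

def cwt_row(signal, scale):
    """Wavelet coefficients of `signal` at one scale, by direct convolution."""
    half = int(4 * scale)  # the wavelet is negligible beyond about 4 scales
    offsets = range(-half, half + 1)
    kernel = [ricker(k, scale) for k in offsets]
    n = len(signal)
    row = []
    for i in range(n):
        acc = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < n:  # positions outside the signal contribute nothing
                acc += signal[j] * w
        row.append(acc)
    return row

def call_peaks(signal, scale, min_coef):
    """Positions where the coefficients have a local maximum above min_coef."""
    row = cwt_row(signal, scale)
    return [i for i in range(1, len(row) - 1)
            if row[i] > min_coef and row[i] > row[i - 1] and row[i] >= row[i + 1]]

# Toy coverage profile (made-up data): two bumps centred at positions 30 and 80.
coverage = [50.0 * math.exp(-((i - 30) ** 2) / 50.0)
            + 30.0 * math.exp(-((i - 80) ** 2) / 50.0) for i in range(120)]
peaks = call_peaks(coverage, scale=5.0, min_coef=10.0)
print(peaks)
```

On this toy profile the sketch reports the two bump centres, positions 30 and 80. A practical peak caller would typically combine several scales and a data-driven threshold.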

Figures

Figure 1.
PAR-CLIP induces transitions at specific frequencies. MOV10 PAR-CLIP data analysis: (a, b) Absolute number of genomic sites exhibiting the indicated substitutions within the RSF intervals specified in the figure. To obtain reasonable estimates of RSFs, only genomic positions with coverage ≥20 were considered. (c) All clusters were ranked according to the absolute number of T to C transitions. The vertical axis represents the number of clusters containing high-TC sites (a) normalized to the total number of clusters being considered (horizontal axis). (d) Example of a high-TC site likely to be the result of external RNA contamination. The genomic position is indicated on top. The total number of aligned reads (in brackets) and the observed T to C transitions are shown below. Experimental sources are indicated on the left. PAR-CLIP: reads obtained from MOV10 PAR-CLIP experiments. Genomic DNA: reads obtained from pooling multiple RNA-Seq and ChIP-Seq experiments performed in the same cell line (unpublished data). Since no substitutions are induced experimentally, the majority of reads correspond to the actual genomic sequence and can be used to determine SNPs. RNA-Seq: reads obtained from nuclear RNA-Seq control experiments (Supplementary Materials and Methods). (e) Example of a high-TC site likely to be the result of a HEK293-specific SNP. Annotations are the same as in Figure 1d.
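The coverage filter and interval counts described for panels (a, b) can be sketched in a few lines of Python. The sketch assumes that RSF denotes a relative substitution frequency, computed as the number of reads showing the substitution divided by the coverage at that position; this definition, the function names, the bin edges and the example sites are all illustrative assumptions, not taken from the paper.

```python
def relative_substitution_frequencies(sites, min_coverage=20):
    """Map position -> substitutions / coverage, keeping well-covered sites only.

    `sites` is an iterable of (position, coverage, substitution_count) tuples.
    """
    return {pos: subs / cov for pos, cov, subs in sites if cov >= min_coverage}

def count_by_interval(values, edges):
    """Count values in half-open bins [edges[i], edges[i+1]); the last bin is closed."""
    counts = [0] * (len(edges) - 1)
    last = len(counts) - 1
    for v in values:
        for i in range(len(counts)):
            below_upper = v < edges[i + 1] or (i == last and v == edges[-1])
            if edges[i] <= v and below_upper:
                counts[i] += 1
                break
    return counts

# Made-up example sites: (genomic position, coverage, T-to-C read count).
sites = [(101, 40, 10), (102, 12, 6), (103, 25, 20), (104, 20, 1)]
rsf = relative_substitution_frequencies(sites)  # position 102 is dropped (coverage < 20)
histogram = count_by_interval(rsf.values(), [0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
print(rsf, histogram)
```

Requiring a minimum coverage keeps ratios based on only a handful of reads out of the counts, which is the reason the caption gives for the threshold of 20.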
Figure 2.
Model fit and wavelet peak calling on the MOV10 PAR-CLIP data. (a) The count function f(s) represents the absolute number of genomic positions exhibiting at least one substitution. A minimum coverage c = 20 was required. (b) Density estimates [Equation (3)] and the log-odds ratio [Equation (5)], both estimated from the data. The vertical axis represents density and log-odds ratio, respectively. (c) Estimated posterior probability [Equation (4)] of a given observation belonging to class 2 (experimentally induced transition), computed from the estimated densities and mixing coefficients. (d) Coverage function observed within the 3′UTR of the HOXC4 gene. Ticks (green) below indicate high confidence T to C transitions as determined by the mixture model. Red circles indicate wavelet peaks; horizontal lines (blue) below represent the clusters. (e) Continuous wavelet transform of the coverage function shown in Figure 2d; the color coding and the corresponding coverage at each position are indicated above.
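Equations (3) to (5) are not reproduced on this page. For orientation, the sketch below shows the standard way a posterior class probability and a log-odds ratio follow from the two component densities and mixing coefficients of a two-component mixture via Bayes' rule. It assumes the density values at an observation are already available (the paper estimates them non-parametrically); the function names and example numbers are hypothetical, and the paper's exact definitions may differ.

```python
import math

def posterior_class2(f1, f2, pi1, pi2):
    """Posterior probability of class 2 given the component densities f1, f2
    at an observation and the mixing coefficients pi1, pi2 (Bayes' rule)."""
    numerator = pi2 * f2
    denominator = pi1 * f1 + numerator
    return numerator / denominator if denominator > 0 else float("nan")

def log_odds(f1, f2, pi1, pi2):
    """Log-odds of class 2 versus class 1; assumes both weighted densities are positive."""
    return math.log(pi2 * f2) - math.log(pi1 * f1)

# Made-up values: class 2 is rarer overall but far more likely at this observation.
p2 = posterior_class2(f1=0.5, f2=3.5, pi1=0.7, pi2=0.3)
lo = log_odds(f1=0.5, f2=3.5, pi1=0.7, pi2=0.3)
print(p2, lo)
```

The two quantities carry the same information: the posterior is the logistic function of the log-odds, so a log-odds above zero corresponds to a posterior above 0.5.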
Figure 3.
Comparison of CLIPZ clusters and wavClusters. (a) Number of wavClusters overlapping each of the 841 CLIPZ clusters. (b) Length distribution of the 841 (out of the top 1000) CLIPZ clusters and the 2455 overlapping wavClusters. (c) Exemplary genome browser view corresponding to the genomic location of the 3′UTR of the SFRS6 gene. The MOV10 coverage function, obtained from the PAR-CLIP data, is shown as ‘Coverage’. Tracks below indicate positions exhibiting at least one T to C transition, high confidence interaction sites determined by the mixture model, the top 17053 CLIPZ clusters, wavClusters and the genomic annotation. (d) Broader genome browser view. Labeling is the same as in Figure 3c. Red arrows indicate false positives called by CLIPZ, as they localize to untranscribed regions.
Figure 4.
Analysis of MOV10 and AGO2 wavClusters. (a) Genomic annotation of MOV10 wavClusters. Only wavClusters that unambiguously localize to a single genomic feature were considered. (b) Length distribution of MOV10 wavClusters, mean length = 35.7 bases. (c–e) MEME results: logos of the top three motifs ranked by increasing E-value (1.1e−74, 6.3e−47 and 1.2e−25, respectively). The vertical axis represents information content in bits. The distribution of T to C RSF values for a given position over all motif occurrences is shown above each motif logo. (f) Venn diagram indicating the number of genes bound by AGO2, MOV10 or both proteins. The overlap is significant according to hypergeometric testing (P-value < 2.2e−16). (g) Bootstrap distribution of the mean wavCluster center difference of overlapping AGO2 and MOV10 wavClusters. 10 000 bootstrap samples were computed. Dashed vertical lines (−1.95, 1.38) indicate the bootstrap confidence interval at a significance level of 0.001. The solid line represents the kernel density estimate. (h) Co-immunoprecipitation in nuclear extract of HEK293 cells (Input). MOV10 antibody was used for the immunoprecipitation. Western blotting was done using AGO2 antibody. Samples were prepared with and without RNase treatment as indicated in the figure.
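The hypergeometric test mentioned for panel (f) asks how likely an overlap at least as large as the observed one would be if the two gene sets were drawn independently from a common gene universe. The gene counts behind the Venn diagram are not given on this page, so the Python sketch below uses small made-up numbers; the function name and its parameters are likewise illustrative.

```python
from fractions import Fraction
from math import comb

def hypergeom_upper_tail(universe, marked, drawn, overlap):
    """P(X >= overlap) when `drawn` genes are sampled without replacement from
    `universe` genes, of which `marked` belong to the other set."""
    total = comb(universe, drawn)
    tail = sum(comb(marked, i) * comb(universe - marked, drawn - i)
               for i in range(overlap, min(marked, drawn) + 1))
    return Fraction(tail, total)  # exact arithmetic avoids floating-point underflow

# Made-up toy numbers: 10 genes in total, 4 bound by protein A,
# 3 bound by protein B, and at least 2 shared between them.
p_value = hypergeom_upper_tail(universe=10, marked=4, drawn=3, overlap=2)
print(p_value, float(p_value))
```

Exact fractions are used because tail probabilities as small as the one reported in the caption are easy to lose to rounding when computed naively in floating point.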
