BMC Bioinformatics. 2017 May 18;18(1):264.
doi: 10.1186/s12859-017-1679-8.

The rainfall plot: its motivation, characteristics and pitfalls

Diana Domanska et al. BMC Bioinformatics.

Abstract

Background: A visualization referred to as the rainfall plot has recently gained popularity in genome data analysis. The plot is mostly used for illustrating the distribution of somatic cancer mutations along a reference genome, typically aiming to identify mutation hotspots. In general terms, the rainfall plot can be seen as a scatter plot showing the location of events on the x-axis versus the distance between consecutive events on the y-axis. Despite its frequent use, the motivation for applying this particular visualization and the appropriateness of its usage have never been critically addressed in detail.
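
The scatter-plot definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `rainfall_points` and the example positions are hypothetical, and the sketch assumes that each event is paired with the distance to its preceding neighbor on a single chromosome, so the first event yields no point.

```python
from typing import List, Tuple


def rainfall_points(positions: List[int]) -> List[Tuple[int, int]]:
    """Return (location, distance to previous event) pairs for one chromosome."""
    ordered = sorted(positions)
    # Assumption: the first event has no predecessor, so it produces no point.
    return [(cur, cur - prev) for prev, cur in zip(ordered, ordered[1:])]


# Hypothetical positions: a tight cluster followed by two isolated events.
events = [1_000, 1_050, 1_060, 1_075, 500_000, 2_000_000]
for location, distance in rainfall_points(events):
    print(location, distance)
```

In this toy input the clustered events produce small y-values (tens of base pairs), while the isolated events produce y-values several orders of magnitude larger, which is the contrast a rainfall plot is meant to expose.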

Results: We show that the rainfall plot allows visual detection even for events occurring at high frequency over very short distances. In addition, event clustering at multiple scales may be detected as distinct horizontal bands in rainfall plots. At the same time, due to the limited size of standard figures, rainfall plots might suffer from an inability to distinguish overlapping events, especially when multiple datasets are plotted in the same figure. We demonstrate the consequences of plot congestion, which results in obscured visual data interpretations.
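
One way to see why clustering at two scales produces two bands is to simulate inter-event distances from a two-state process. The sketch below is only an illustration under stated assumptions: it borrows the probabilities (0.8 and 0.2) and the values λ=0.01 and λ=0.0001 from the Fig. 4 caption, treats λ as the rate of an exponential distribution, and picks the state independently at every step. The function name `simulate_two_state` and the threshold of 1000 are hypothetical.

```python
import random
from typing import List


def simulate_two_state(n_events: int, seed: int = 0) -> List[float]:
    """Draw inter-event distances from a simplified two-state process."""
    rng = random.Random(seed)
    distances = []
    for _ in range(n_events):
        # Assumption: with P=0.8 the short-distance state is used (rate 0.01,
        # mean distance 100); otherwise the long-distance state (rate 0.0001,
        # mean distance 10,000).
        rate = 0.01 if rng.random() < 0.8 else 0.0001
        distances.append(rng.expovariate(rate))
    return distances


distances = simulate_two_state(5000)
# Hypothetical threshold separating the two bands on a log-scaled y-axis.
short = sum(1 for d in distances if d < 1000)
print(f"short distances: {short}, long distances: {len(distances) - short}")
```

Plotted on a logarithmic y-axis, the short and long distances would concentrate around two different orders of magnitude, which corresponds to the two horizontal bands described above.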

Conclusions: This work provides the first comprehensive survey of the characteristics and proper usage of rainfall plots. We find that the rainfall plot is able to convey a large amount of information without any need for parameterization or tuning. However, we also demonstrate how plot congestion and the use of a logarithmic y-axis may result in obscured visual data interpretations. To aid the productive utilization of rainfall plots, we demonstrate their characteristics and potential pitfalls using both simulated and real data, and provide a set of practical guidelines for their proper interpretation and usage.
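
Plot congestion can be estimated by projecting each (location, distance) point onto a pixel grid and counting how many points land on the same pixel. The following sketch assumes a 1000x351 pixel grid (the dimensions mentioned in the Fig. 5 caption) and a log10-scaled y-axis; the function name `pixel_counts`, its parameters, and the exact pixel mapping are assumptions made for illustration and may differ from the procedure used in the paper.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def pixel_counts(
    points: Iterable[Tuple[int, int]],
    genome_length: int,
    max_distance: int,
    width: int = 1000,
    height: int = 351,
) -> Counter:
    """Count how many (location, distance) points fall on each (col, row) pixel."""
    counts: Counter = Counter()
    for location, distance in points:
        # Linear x-axis: scale the genomic location to a column index.
        col = min(width - 1, int(location / genome_length * width))
        # Assumed log10 y-axis: distance 1 maps to row 0, max_distance to the
        # top row. max_distance must be greater than 1.
        frac = math.log10(max(distance, 1)) / math.log10(max_distance)
        row = min(height - 1, int(frac * (height - 1)))
        counts[(col, row)] += 1
    return counts


# Hypothetical points: two nearby events with equal distances, one far away.
points = [(100, 10), (101, 10), (5_000_000, 1_000_000)]
counts = pixel_counts(points, genome_length=10_000_000, max_distance=10_000_000)
congested = {pixel: n for pixel, n in counts.items() if n > 1}
print(congested)
```

Any pixel with a count above one hides events from the viewer, so the size of `congested` gives a rough measure of how much information a figure of the chosen dimensions cannot show.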

Keywords: Genomics; Mutation; Rainfall plot; Visualization.

Figures

Fig. 1
Visualization of SPMs in a pancreatic cancer patient. SPMs (marked with vertical black lines) at location 145,800,000−146,100,000 of chromosome 3 of the hg19 human reference genome (a). A frequency line plot (b) and a rainfall plot (RP) (c) showing kataegis regions at location ∼150 Mbp in chromosome 3 (corresponding to whole-genome location ∼650 Mbp in the figure) and at location ∼35 Mbp in chromosome 11 (corresponding to whole-genome location ∼1850 Mbp in the figure). Data taken from [2].
Fig. 2
The same kataegis region with different bin sizes. Individual mutation locations within the bin containing a kataegis region for bin sizes of 30 kbp (a), 3 Mbp (c) and 30 Mbp (e), as well as line plots showing the average density of mutations along those same regions in (b), (d) and (f), respectively. Data taken from [2].
Fig. 3
Density and frequency of mutations along the genome. Simulated data with four hotspot regions. The first and second regions have the same inter-mutation value of 0.001, while the third and fourth regions both have an inter-mutation value of 0.01. The first and third regions share the same genomic regions, as do the second and fourth regions.
Fig. 4
Clustering of mutations as a hidden Markov process (HMP). HMP consisting of two states (a), with the rainfall pattern that such a process would give rise to (b) and the corresponding histogram of inter-event distances (c). In this example, there is a high probability (P=0.8) of being in or moving to the state with low intra-hotspot distance (λ=0.01), which generates closely spaced events. A less frequent state (P=0.2) with large inter-hotspot distance (λ=0.0001) generates events with a large distance to their preceding neighbors. The process defined by these two states generates several distantly spaced hotspots of events, giving rise to two quite distinct horizontal bands in the RP. The same pattern can also be seen as two distinct peaks in the histogram of inter-event distances.
Fig. 5
The extent of possible event congestion on a human genome RP with dimensions of 1000×351 pixels. The number of distinct represented inter-event distances and the number of possibly overlapping events are displayed for selected RP y-coordinates. RP pixels with low y-coordinates represent few or even no inter-event distances (as seen for y-coordinates 0, 25, 50). At the same time, the distances represented by low y-coordinates are short and therefore allow a high number of events to share the same x-coordinate. On the other hand, individual pixels with high y-coordinates can represent groups of many distinct inter-event distances (i.e., hundreds, thousands or millions of distances at a time). At the same time, higher y-coordinates represent longer distances, which increasingly limits the number of events that could possibly share the same x-coordinate (with only a single event fitting onto any x-coordinate at the highest y-values). Changing the plot dimensions will influence which distances and which genomic locations are distinguishable from each other.
Fig. 6
The extent of event overlap ("congestion") in data. Congestion in the pancreatic cancer data (a) and breast cancer data (b) used for illustrating kataegis in [2]. The events were projected onto an RP pixel grid with dimensions of 1000×351 pixels.
Fig. 7
Pancreatic cancer variants. Variants are plotted in an order based on their genomic location (a) and projected onto a 1000×351 pixel RP grid with highlighted sites of congestion (b).
Fig. 8
Pancreatic cancer variants. Variants are plotted in an order based on the substitution type, with C>A variants plotted either first (a) or last (b). Although based on the same data and using a shared color scheme, the two plots give very different impressions of which type of mutation is the most prevalent one.
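
The plotting-order effect described for Fig. 8 can be reproduced with a toy model of overdrawing. The sketch below assumes opaque markers that each occupy exactly one pixel, so that a marker drawn later hides any marker already on that pixel; the function name `visible_counts` and the pixel coordinates are hypothetical and not taken from the paper's data.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]


def visible_counts(layers: List[Tuple[str, List[Pixel]]]) -> Counter:
    """Paint layers in order and count the pixels each label still shows."""
    canvas: Dict[Pixel, str] = {}
    for label, pixels in layers:
        for pixel in pixels:
            # Assumption: markers are opaque, so later layers overwrite earlier ones.
            canvas[pixel] = label
    return Counter(canvas.values())


# Hypothetical pixel positions for two substitution types that partly overlap.
c_to_a = [(0, 0), (1, 0), (2, 0)]
c_to_t = [(1, 0), (2, 0), (3, 0)]
print(visible_counts([("C>A", c_to_a), ("C>T", c_to_t)]))  # C>A drawn first
print(visible_counts([("C>T", c_to_t), ("C>A", c_to_a)]))  # C>A drawn last
```

Both types contribute three events, yet whichever type is drawn last shows three visible pixels while the other shows only one, which illustrates how drawing order alone can change the apparent prevalence of a mutation type.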

References

    1. Nik-Zainal S, Alexandrov LB, Wedge DC, Van Loo P, Greenman CD, Raine K, et al. Mutational processes molding the genomes of 21 breast cancers. Cell. 2012;149(5–10):979–93. doi: 10.1016/j.cell.2012.04.024.
    2. Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio SAJR, Behjati S, Biankin AV, et al. Signatures of mutational processes in human cancer. Nature. 2013;500(7463):415–21. doi: 10.1038/nature12477.
    3. Alioto TS, Buchhalter I, Derdak S, Hutter B, Eldridge MD, Hovig E, et al. A comprehensive assessment of somatic mutation detection in cancer using whole-genome sequencing. Nat Commun. 2015;6:10001. doi: 10.1038/ncomms10001.
    4. Beà S, Valdés-Mas R, Navarro A, Salaverria I, Martín-Garcia D, et al. Landscape of somatic mutations and clonal evolution in mantle cell lymphoma. Proc Nat Acad Sci USA. 2013;110(45):18250–5. doi: 10.1073/pnas.1314608110.
    5. Cooper CS, Eeles R, Wedge DC, Van Loo P, Gundem G, Alexandrov LB, et al. Analysis of the genetic phylogeny of multifocal prostate cancer identifies multiple independent clonal expansions in neoplastic and morphologically normal prostate tissue. Nat Genet. 2015;47(4):367–72. doi: 10.1038/ng.3221.
