JointSNVMix: a probabilistic model for accurate detection of somatic mutations in normal/tumour paired next-generation sequencing data

doi:10.1093/bioinformatics/bts053

Bioinformatics. 2012 Apr 1;28(7):907-13.

doi: 10.1093/bioinformatics/bts053. Epub 2012 Jan 27.

Andrew Roth¹, Jiarui Ding, Ryan Morin, Anamaria Crisan, Gavin Ha, Ryan Giuliany, Ali Bashashati, Martin Hirst, Gulisa Turashvili, Arusha Oloumi, Marco A Marra, Samuel Aparicio, Sohrab P Shah

PMID: 22285562
PMCID: PMC3315723
DOI: 10.1093/bioinformatics/bts053

Andrew Roth et al. Bioinformatics. 2012.

Affiliation

¹ Department of Molecular Oncology, BC Cancer Agency, BC, Canada.

Abstract

Motivation: Identification of somatic single nucleotide variants (SNVs) in tumour genomes is a necessary step in defining the mutational landscapes of cancers. Experimental designs for genome-wide ascertainment of somatic mutations now routinely include next-generation sequencing (NGS) of tumour DNA and matched constitutional DNA from the same individual. This allows investigators to control for germline polymorphisms and distinguish somatic mutations that are unique to the tumour, thus reducing the burden of labour-intensive and expensive downstream experiments needed to verify initial predictions. In order to make full use of such paired datasets, computational tools for simultaneous analysis of tumour-normal paired sequence data are required, but are currently under-developed and under-represented in the bioinformatics literature.

Results: In this contribution, we introduce two novel probabilistic graphical models called JointSNVMix1 and JointSNVMix2 for jointly analysing paired tumour-normal digital allelic count data from NGS experiments. In contrast to independent analysis of the tumour and normal data, our method allows statistical strength to be borrowed across the samples and therefore amplifies the statistical power to identify and distinguish both germline and somatic events in a unified probabilistic framework.

Availability: The JointSNVMix models and four other models discussed in the article are part of the JointSNVMix software package available for download at http://compbio.bccrc.ca

Contact: sshah@bccrc.ca

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

**Fig. 1.**
Hypothetical example of the JointSNVMix analysis process. Reads are first aligned to the reference genome (green). Next, the allelic counts, which are the number of matches and the depth of reads at each position, are tabulated. Allelic count information can then be used to identify germline (blue) and somatic (red) positions. At the bottom of the figure, we show the hypothetical probabilities of the nine joint genotypes based on the count data for the somatic position (AA, AB).
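
The step from allelic counts to probabilities over the nine joint genotypes can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: it assumes a binomial likelihood for the number of reference matches given the depth, a uniform prior over the nine joint genotypes, and hand-picked match probabilities (`MU`), whereas the published models learn their parameters with EM. It also assumes that a somatic call corresponds to an AA normal with an AB or BB tumour. All function and variable names are hypothetical.

```python
from math import comb

GENOTYPES = ("AA", "AB", "BB")

# Hypothetical values chosen for illustration only (not fitted parameters):
# probability that a read matches the reference allele under each genotype.
MU = {"AA": 0.99, "AB": 0.5, "BB": 0.01}


def binomial_pmf(k, n, p):
    """Probability of k reference matches among n reads."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def joint_genotype_posterior(a_normal, d_normal, a_tumour, d_tumour):
    """Posterior over the nine (normal, tumour) genotype pairs.

    a_* is the number of reference matches and d_* the read depth.
    A uniform prior over the nine pairs is assumed here.
    """
    unnormalised = {}
    for g_normal in GENOTYPES:
        for g_tumour in GENOTYPES:
            unnormalised[(g_normal, g_tumour)] = (
                binomial_pmf(a_normal, d_normal, MU[g_normal])
                * binomial_pmf(a_tumour, d_tumour, MU[g_tumour])
            )
    total = sum(unnormalised.values())
    return {pair: value / total for pair, value in unnormalised.items()}


def somatic_probability(posterior):
    """Assumed definition: normal is AA and tumour carries the variant."""
    return posterior[("AA", "AB")] + posterior[("AA", "BB")]


# Made-up counts: the normal matches the reference at all 30 reads,
# while the tumour matches at 16 of 31 reads.
posterior = joint_genotype_posterior(30, 30, 16, 31)
best_pair = max(posterior, key=posterior.get)
print(best_pair, round(somatic_probability(posterior), 4))
```

With these made-up counts the most probable pair is (AA, AB), which mirrors the somatic position shown in the figure.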

**Fig. 2.**
Probabilistic graphical models representing (a) JointSNVMix1 and (b) JointSNVMix2. Shaded nodes represent observed or fixed values, while the values of unshaded nodes are learned using EM. Only the distributions for the normal are shown below; the tumour distributions are the same. We have defined f(q|a, z) = z[qa + (1 − q)(1 − a)] + 0.5(1 − z) and g(r|z) = zr + (1 − z)(1 − r). Descriptions of all random variables are given in Table 2.
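
The two helper functions defined in this caption can be transcribed directly. The sketch below only evaluates the formulas as written; the meaning of q, a, z and r is given in Table 2, which is not reproduced on this page, so the code deliberately makes no claim about what each symbol represents. The example input values are arbitrary.

```python
def f(q, a, z):
    """f(q|a, z) = z[qa + (1 - q)(1 - a)] + 0.5(1 - z), as in the caption."""
    return z * (q * a + (1 - q) * (1 - a)) + 0.5 * (1 - z)


def g(r, z):
    """g(r|z) = zr + (1 - z)(1 - r), as in the caption."""
    return z * r + (1 - z) * (1 - r)


# With z = 1, f reduces to qa + (1 - q)(1 - a); with z = 0 it is always 0.5.
print(f(0.9, 1, 1), f(0.9, 0, 1), f(0.9, 1, 0))
# With z = 1, g returns r; with z = 0 it returns 1 - r.
print(g(0.8, 1), g(0.8, 0))
```

Two limiting cases follow from the algebra alone: setting z = 0 makes f constant at 0.5 regardless of q and a, and makes g return 1 − r.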

**Fig. 3.**
Concordance analysis of the 12 DLBCL datasets. The Somatic column represents concordance with the merged COSMIC and ground truth set. The Germline column represents concordance with the 1000 Genomes positions with the COSMIC positions removed. The horizontal axis shows the number of somatic predictions made and the vertical axis shows the fraction of those predictions found to be in the respective set. Lines are drawn by computing concordance as the threshold for classification is lowered. Lines start always from the left side because multiple positions may have ℙ(*Somatic*)=1. Circles at the start of lines indicate these positions; these points are also labelled with the number of somatic predictions (in thousands) and the concordance.
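
The way such a curve is traced, by lowering the classification threshold and recomputing concordance, might look like the sketch below. It is a minimal illustration, not the authors' analysis code: the function name, the position labels and the probabilities are all hypothetical. Positions that share the same probability are added together, so several positions with ℙ(*Somatic*)=1 produce a first point at a prediction count greater than one.

```python
from itertools import groupby


def concordance_curve(predictions, reference_set):
    """Trace concordance as the classification threshold is lowered.

    predictions: iterable of (position, p_somatic) pairs.
    reference_set: set of positions treated as the comparison set.
    Returns a list of (threshold, n_predictions, concordance) tuples.
    """
    ranked = sorted(predictions, key=lambda item: item[1], reverse=True)
    curve = []
    n_predictions = 0
    n_in_reference = 0
    # Positions tied at the same probability enter the curve together.
    for threshold, tied in groupby(ranked, key=lambda item: item[1]):
        for position, _ in tied:
            n_predictions += 1
            n_in_reference += position in reference_set
        curve.append((threshold, n_predictions, n_in_reference / n_predictions))
    return curve


# Made-up example: two positions are tied at probability 1.
predictions = [
    ("chr1:100", 1.0),
    ("chr1:200", 1.0),
    ("chr2:50", 0.9),
    ("chr3:7", 0.4),
]
reference_set = {"chr1:100", "chr2:50"}
curve = concordance_curve(predictions, reference_set)
for point in curve:
    print(point)
```

In this toy input the curve begins at two predictions rather than one, because both top-ranked positions have probability 1.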

Cited by

De novo somatic mutations in components of the PI3K-AKT3-mTOR pathway cause hemimegalencephaly.
Lee JH, Huynh M, Silhavy JL, Kim S, Dixon-Salazar T, Heiberg A, Scott E, Bafna V, Hill KJ, Collazo A, Funari V, Russ C, Gabriel SB, Mathern GW, Gleeson JG. Nat Genet. 2012 Jun 24;44(8):941-5. doi: 10.1038/ng.2329. PMID: 22729223. Free PMC article.
MSIpred: a python package for tumor microsatellite instability classification from tumor mutation annotation data using a support vector machine.
Wang C, Liang C. Sci Rep. 2018 Dec 3;8(1):17546. doi: 10.1038/s41598-018-35682-z. PMID: 30510242. Free PMC article.
Identification of Somatic Mutations From Bulk and Single-Cell Sequencing Data.
Huang AY, Lee EA. Front Aging. 2022 Jan 3;2:800380. doi: 10.3389/fragi.2021.800380. eCollection 2021. PMID: 35822012. Free PMC article. Review.
Postzygotic single-nucleotide mosaicisms in whole-genome sequences of clinically unremarkable individuals.
Huang AY, Xu X, Ye AY, Wu Q, Yan L, Zhao B, Yang X, He Y, Wang S, Zhang Z, Gu B, Zhao HQ, Wang M, Gao H, Gao G, Zhang Z, Yang X, Wu X, Zhang Y, Wei L. Cell Res. 2014 Nov;24(11):1311-27. doi: 10.1038/cr.2014.131. Epub 2014 Oct 14. PMID: 25312340. Free PMC article.
A cancer cell-line titration series for evaluating somatic classification.
Denroche RE, Mullen L, Timms L, Beck T, Yung CK, Stein L, McPherson JD, Brown AM. BMC Res Notes. 2015 Dec 26;8:823. doi: 10.1186/s13104-015-1803-7. PMID: 26708082. Free PMC article.

Grants and funding

202452/Canadian Institutes of Health Research/Canada
