Evaluation of the performance of copy number variant prediction tools for the detection of deletions from whole genome sequencing data
- PMID: 30965134
- DOI: 10.1016/j.jbi.2019.103174
Abstract
Background: Whole genome sequencing (WGS) has increased in popularity and decreased in cost over the past decade, making it a viable and sensitive method for variant detection. In addition to its utility for single nucleotide variant detection, WGS data can be used to detect copy number variants (CNVs) at fine resolution. Many CNV detection software packages have been developed, based on four main approaches: read pair, split read, read depth, and assembly based methods. The aim of this study was to evaluate the performance of each of these main approaches in detecting germline deletions.
Methods: WGS data and high confidence deletion calls for the individual NA12878 from the Genome in a Bottle consortium served as the benchmark dataset. The performance of BreakDancer, CNVnator, Delly, FermiKit, and Pindel was assessed by comparing the accuracy and sensitivity of each software package in detecting deletions exceeding 1 kb.
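As an illustration of how sensitivity and false discovery rate can be computed against a benchmark call set, the following is a minimal Python sketch. It is not the authors' pipeline: the 50% reciprocal overlap matching rule, the half-open interval representation, and the names `reciprocal_overlap` and `evaluate` are assumptions introduced here, since the abstract does not state how calls were matched to benchmark deletions.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates; this layout is an assumption.
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if disjoint."""
    if a[0] != b[0]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    return min(shared / (a[2] - a[1]), shared / (b[2] - b[1]))


def evaluate(
    calls: List[Interval],
    benchmark: List[Interval],
    min_size: int = 1000,   # size cut-off in bp (1 kb, as in the abstract)
    min_ro: float = 0.5,    # assumed matching threshold, not taken from the paper
) -> Tuple[float, float]:
    """Return (sensitivity, FDR) for deletion calls against a benchmark set."""
    calls = [c for c in calls if c[2] - c[1] >= min_size]
    benchmark = [b for b in benchmark if b[2] - b[1] >= min_size]

    # Benchmark deletions recovered by at least one call.
    found = sum(
        1 for b in benchmark
        if any(reciprocal_overlap(b, c) >= min_ro for c in calls)
    )
    # Calls supported by at least one benchmark deletion.
    true_calls = sum(
        1 for c in calls
        if any(reciprocal_overlap(c, b) >= min_ro for b in benchmark)
    )

    sensitivity = found / len(benchmark) if benchmark else 0.0
    fdr = (len(calls) - true_calls) / len(calls) if calls else 0.0
    return sensitivity, fdr
```

Here sensitivity is the fraction of benchmark deletions recovered, and FDR is the fraction of retained calls with no matching benchmark deletion. A stricter or looser overlap threshold would change both figures.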
Results: There was considerable variability in the outputs of the different WGS CNV detection programs. BreakDancer and Delly performed best, with sensitivities of 92.6% and 96.7% and false discovery rates (FDR) of 34.5% and 68.5%, respectively. In comparison, Pindel, CNVnator, and FermiKit were less effective, with sensitivities of 69.1%, 66.0%, and 15.8% and FDRs of 91.3%, 69.0%, and 31.7%, respectively. Concordance across software packages was poor, with only 27 of the 612 benchmark deletions identified by all five tools.
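The concordance figure can be understood as a set intersection over the benchmark deletions each tool recovered. The sketch below assumes each tool's hits have already been reduced to a set of benchmark deletion identifiers; the function name `concordant_deletions` and the identifiers in the usage example are hypothetical and do not come from the study.

```python
from typing import Dict, Set


def concordant_deletions(detected: Dict[str, Set[str]]) -> Set[str]:
    """Return the benchmark deletion IDs recovered by every tool."""
    per_tool = list(detected.values())
    if not per_tool:
        return set()
    return set.intersection(*per_tool)


# Toy usage with made-up identifiers (not data from the paper).
hits = {
    "BreakDancer": {"del1", "del2", "del3"},
    "Delly": {"del1", "del2", "del4"},
    "Pindel": {"del2", "del3"},
}
shared = concordant_deletions(hits)
```

In this toy input only `del2` is recovered by all three tools, so `shared` contains that single identifier.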
Conclusions: The WGS based CNV detection tools evaluated show disparate performance in identifying deletions ≥1 kb, particularly those that use different input data characteristics. The read pair based tools, BreakDancer and Delly, had the highest sensitivity. BreakDancer also had the second lowest FDR. Therefore, in this analysis read pair methods (BreakDancer in particular) were the best performing approaches for the identification of deletions ≥1 kb, balancing accuracy and sensitivity. There is potential for improvement in the detection algorithms, particularly in reducing FDR. This analysis has validated the utility of WGS based CNV detection software for reliably identifying deletions, and these findings will be of use when choosing appropriate software for deletion detection in both research and diagnostic medicine.
Keywords: Copy number variants; Molecular diagnostics; Structural variation; Whole genome sequencing.
Copyright © 2019 Elsevier Inc. All rights reserved.