Automated filtering of genome-wide large deletions through an ensemble deep learning framework
- PMID: 36038049
- DOI: 10.1016/j.ymeth.2022.08.001
Abstract
Computational methods based on whole-genome linked-read and short-read sequencing have been successful in genome assembly and in the detection of structural variants (SVs). Numerous variant callers that rely on linked reads and short reads can detect genetic variation, including SVs. A shortcoming of existing tools is a tendency to overcall SVs, especially deletions. Making the most of linked-read and short-read sequencing technologies would therefore benefit from an additional step that identifies and eliminates false positive large deletions. Here, we introduce a novel tool, AquilaDeepFilter, which automatically filters false positive large deletions genome-wide. Our approach transforms sequencing data into images and then uses convolutional neural networks to classify candidate deletions as true or false calls. The input data incorporate multiple alignment signals, including read depth, split reads and discordant read pairs. We tested the performance of AquilaDeepFilter on five linked-read and short-read libraries sequenced from the well-studied NA24385 sample and validated the results against the Genome in a Bottle benchmark. To demonstrate the filtering ability of AquilaDeepFilter, we used the SV calls from three upstream SV detection tools, Aquila, Aquila_stLFR and Delly, as the baseline. We showed that AquilaDeepFilter increased precision while preserving the recall of all three tools. The overall F1-score improved by an average of 20% on linked-read data and by an average of 15% on short-read data. AquilaDeepFilter also compared favorably with existing deep learning-based methods for SV filtering, such as DeepSVFilter. AquilaDeepFilter is thus an effective SV refinement framework that can improve SV calling for both linked-read and short-read data.
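The abstract does not spell out how sequencing data are turned into an image, so the following is only a minimal sketch of one plausible encoding, not AquilaDeepFilter's actual implementation. It assumes each alignment signal (read depth, split reads, discordant read pairs) is given as a list of half-open intervals over the candidate region, and it writes one row of pixel intensities per signal. The function names, the interval representation and the `max_depth` clipping value are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: half-open [start, end) intervals on one chromosome.
Interval = Tuple[int, int]


def coverage_track(intervals: List[Interval], region_start: int, region_end: int) -> List[int]:
    """Count how many intervals overlap each base of [region_start, region_end)."""
    track = [0] * (region_end - region_start)
    for start, end in intervals:
        # Clip each interval to the candidate region before counting.
        lo = max(start, region_start)
        hi = min(end, region_end)
        for pos in range(lo, hi):
            track[pos - region_start] += 1
    return track


def scale_to_pixels(track: List[int], max_value: int) -> List[int]:
    """Clip counts at max_value and map them linearly onto 0..255."""
    return [min(value, max_value) * 255 // max_value for value in track]


def encode_candidate(
    reads: List[Interval],
    split_reads: List[Interval],
    discordant_pairs: List[Interval],
    region_start: int,
    region_end: int,
    max_depth: int = 60,  # assumed clipping value, not taken from the paper
) -> List[List[int]]:
    """Return a 3-row 'image': read depth, split reads, discordant pairs."""
    signals = (reads, split_reads, discordant_pairs)
    return [
        scale_to_pixels(coverage_track(signal, region_start, region_end), max_depth)
        for signal in signals
    ]


if __name__ == "__main__":
    # Toy candidate region of 6 bases with two reads and one split read.
    image = encode_candidate(
        reads=[(0, 4), (2, 6)],
        split_reads=[(3, 5)],
        discordant_pairs=[],
        region_start=0,
        region_end=6,
        max_depth=2,
    )
    for row in image:
        print(row)
```

In a pipeline of the kind the abstract describes, an array like this would be resized and passed to a convolutional classifier; that step is omitted here because the abstract does not describe the network architecture.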
Keywords: Convolutional neural networks; Deep learning; Ensemble method; Linked-reads; Short reads; Structural variants.
Copyright © 2022 The Author(s). Published by Elsevier Inc. All rights reserved.
Conflict of interest statement
Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.