SVDF: enhancing structural variation detect from long-read sequencing via automatic filtering strategies

Brief Bioinform. 2024 May 23;25(4):bbae336. doi: 10.1093/bib/bbae336.

Heng Hu¹, Runtian Gao¹, Wentao Gao¹, Bo Gao², Zhongjun Jiang¹, Murong Zhou¹, Guohua Wang³,⁴, Tao Jiang⁵

Affiliations

¹ College of Life Sciences, Northeast Forestry University, Harbin 150000, China.
² Department of Radiology, The Second Affiliated Hospital of Harbin Medical University, Harbin 150000, China.
³ College of Computer and Control Engineering, Northeast Forestry University, Harbin 150000, China.
⁴ State Key Laboratory of Tree Genetics and Breeding, Harbin 150000, China.
⁵ School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150000, China.

PMID: 38980375
PMCID: PMC11232458
DOI: 10.1093/bib/bbae336

Heng Hu et al. Brief Bioinform. 2024.

Abstract

Structural variation (SV) is an important form of genomic variation that influences gene function and expression by altering the structure of the genome. Although long-read data have been shown to characterize SVs better, SV calls from noisy long-read data still include a considerable proportion of false positives. To detect SVs in long-read data accurately, we present SVDF, a method that combines a learning-based noise-filtering strategy with an SV signature-adaptive clustering algorithm to reduce false-positive calls. Benchmarking results from multiple orthogonal experiments show that, across different sequencing platforms and depths, SVDF achieves higher calling accuracy on each sample than several existing general-purpose SV callers. We believe that, with its precise and sensitive SV detection, SVDF can open new opportunities for cutting-edge genomic research.

Keywords: deep learning; false-positives; long-read sequencing; structural variation detection.

Figures

**Figure 1**
The overall workflow of SVDF. (1) SV signature collection: SVDF comprehensively collects and records various types of SV signatures from intra- and inter-alignments. (2) Signature clustering: a two-step adaptive clustering algorithm is employed to enhance the signals of true SV events. (3) SV filtering: a convolutional neural network (CNN) filters false-positive SVs from the clustering results. (4) SV calling and genotyping: SVDF generates the SV call set and assigns genotypes.
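
To make the clustering and read-support ideas in this workflow concrete, here is a minimal Python sketch under stated assumptions. It is not SVDF's two-step adaptive clustering algorithm, and it does not reproduce the CNN filter: it swaps in a plain greedy distance-based grouping of same-type signatures on the same chromosome, followed by a fixed minimum-supporting-read cutoff. The `Signature` record, `cluster_signatures`, `call_candidates`, and the `max_dist` and `min_support` parameters are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from statistics import median_low
from typing import List, Tuple


@dataclass
class Signature:
    """Hypothetical record for one SV signature extracted from a read alignment."""
    chrom: str
    pos: int        # breakpoint position on the reference
    length: int     # SV length implied by this read
    sv_type: str    # e.g. "DEL" or "INS"
    read_name: str


def cluster_signatures(sigs: List[Signature], max_dist: int = 500) -> List[List[Signature]]:
    """Greedy grouping (illustrative only, not SVDF's algorithm): sort by type,
    chromosome and position, and open a new cluster whenever the type or chromosome
    changes or the gap to the previous signature exceeds max_dist."""
    clusters: List[List[Signature]] = []
    for sig in sorted(sigs, key=lambda s: (s.sv_type, s.chrom, s.pos)):
        prev = clusters[-1][-1] if clusters else None
        if (prev is not None and prev.sv_type == sig.sv_type
                and prev.chrom == sig.chrom and sig.pos - prev.pos <= max_dist):
            clusters[-1].append(sig)
        else:
            clusters.append([sig])
    return clusters


def call_candidates(clusters: List[List[Signature]],
                    min_support: int = 3) -> List[Tuple[str, int, int, str, int]]:
    """Keep clusters backed by at least min_support distinct reads and summarise each
    as (chrom, median position, median length, type, support)."""
    calls = []
    for cluster in clusters:
        support = len({s.read_name for s in cluster})  # count reads, not signatures
        if support < min_support:
            continue
        pos = median_low([s.pos for s in cluster])
        length = median_low([s.length for s in cluster])
        calls.append((cluster[0].chrom, pos, length, cluster[0].sv_type, support))
    return calls


if __name__ == "__main__":
    sigs = [
        Signature("chr1", 1000, 50, "DEL", "r1"),
        Signature("chr1", 1010, 52, "DEL", "r2"),
        Signature("chr1", 1020, 49, "DEL", "r3"),
        Signature("chr1", 9000, 300, "INS", "r4"),
    ]
    print(call_candidates(cluster_signatures(sigs), min_support=3))
    # prints [('chr1', 1010, 50, 'DEL', 3)]
```

In this toy input the three nearby deletions collapse into one candidate supported by three reads, while the isolated insertion falls below the cutoff. The fixed cutoff is only a stand-in; step (3) of the workflow uses a CNN to filter false-positive SVs.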

**Figure 2**
SV detection in the simulated sample. (a) and (b) F1 scores of all tools for calling each SV type, with CLR data on the left and ONT data on the right. (c) Comparative performance of all tools under different minimum supporting read thresholds. (d) Trends in the precision and recall of SV calling by SVDF under different minimum supporting read thresholds.
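
Panels (c) and (d) evaluate calls at several minimum supporting read thresholds. The Python sketch below shows, under simple assumptions, how precision, recall and F1 respond to such a threshold. Each candidate call is reduced to its supporting read count and a flag saying whether it matched a benchmark SV, and each matching call is assumed to correspond to a distinct benchmark SV. The function names and input format are hypothetical; a real evaluation first matches calls to the truth set by position, size and type.

```python
from typing import Dict, List, Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Standard definitions; return 0.0 wherever a denominator would be zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def sweep_min_support(calls: List[Tuple[int, bool]], n_truth: int,
                      thresholds: List[int]) -> Dict[int, Tuple[float, float, float]]:
    """calls: (supporting_reads, matches_benchmark) per candidate SV (hypothetical format).
    n_truth: number of SVs in the benchmark set.
    Assumes every matching call hits a different benchmark SV."""
    results = {}
    for t in thresholds:
        kept = [is_true for support, is_true in calls if support >= t]
        tp = sum(kept)          # each True counts as 1
        fp = len(kept) - tp
        fn = n_truth - tp       # benchmark SVs left unmatched
        results[t] = precision_recall_f1(tp, fp, fn)
    return results


if __name__ == "__main__":
    toy_calls = [(10, True), (8, True), (5, True), (3, False),
                 (2, True), (2, False), (1, False)]
    for t, (p, r, f) in sweep_min_support(toy_calls, n_truth=5, thresholds=[1, 3, 5]).items():
        print(f"min_support={t}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

On this invented toy input, raising the threshold from 1 to 5 lifts precision from 0.57 to 1.00 while recall drops from 0.80 to 0.60. This is the kind of trade-off a threshold sweep is meant to expose; the numbers are not SVDF results.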

**Figure 3**
SV detection in the HG002 sample. (a) and (b) Comparison of overall F1 scores against the GIAB Tier 1 genome-wide SV benchmark (left) and the CMRG benchmark (right). (c) Radar chart of MDR results for the Ashkenazi trio, including MDR values overall and for each SV type. (d) and (e) Line plots show the F1 scores of all tools across SV size ranges in CLR and ONT data, while bar plots show the number of SVs called by SVDF in each size range. (f–h) F1 scores on down-sampled datasets, from left to right for CLR, CCS (HiFi), and ONT data.

**Figure 4**
SV detection in the CHM13 sample. (a) and (b) Precision–recall curves of all tools for SV calling and genotyping in the CHM13 sample, compared with the assembly-based method (SVIM is omitted from the genotyping panel because of its low performance). (c) Distribution of breakpoint deviation for SVs successfully called by all tools in the CHM13 sample; the horizontal axis shows the number of SV calls and the vertical axis shows the breakpoint deviation range.
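
Panel (c) summarises breakpoint deviation, assumed here to mean the absolute distance in base pairs between a called breakpoint and the corresponding breakpoint in the comparison set. A minimal Python sketch of that tabulation follows. It assumes calls have already been matched one-to-one to the comparison set; the function name, input format and bin edges are hypothetical and are not the ranges used in the figure.

```python
from bisect import bisect_left
from collections import Counter
from typing import Iterable, List, Tuple


def deviation_histogram(pairs: Iterable[Tuple[int, int]], edges: List[int]) -> Counter:
    """pairs: (called_pos, reference_pos) for already-matched SVs (hypothetical input).
    edges: ascending, inclusive upper bounds in bp; larger deviations go to an overflow bin."""
    labels = [f"<={e}bp" for e in edges] + [f">{edges[-1]}bp"]
    hist: Counter = Counter()
    for called, ref in pairs:
        dev = abs(called - ref)
        # bisect_left gives the index of the first edge >= dev, i.e. the smallest bin holding it;
        # it returns len(edges) when dev exceeds every edge, which selects the overflow label
        hist[labels[bisect_left(edges, dev)]] += 1
    return hist


if __name__ == "__main__":
    pairs = [(100, 100), (205, 200), (330, 300), (1500, 1000)]
    print(deviation_histogram(pairs, edges=[0, 10, 50, 100]))
```

With these toy pairs, the deviations of 0, 5, 30 and 500 bp each land in a different bin, and the 500 bp offset is counted in the overflow bin.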

**Figure 5**
SV detection in the HCC1395 cancer sample. (a) Number of SVs of each type successfully called by all tools, compared with the HCC1395 cancer-specific SV benchmark set. (b) Venn diagram showing the overlap of validated SVs called in the HCC1395 cancer sample by SVDF and the comparison tools; numbers give the count of SVs in each overlap.
