Front Genet. 2024 Jul 9:15:1435087.
doi: 10.3389/fgene.2024.1435087. eCollection 2024.

HapKled: a haplotype-aware structural variant calling approach for Oxford nanopore sequencing data

Zhendong Zhang et al. Front Genet. 2024.

Abstract

Introduction: Structural variants (SVs) are a type of genomic variation that can significantly influence phenotypes and cause disease, so accurate SV detection is a vital part of modern genetic analysis. Long-read sequencing has enabled more accurate and comprehensive SV calling, and many tools have been developed to call SVs from long-read data. Haplotype-tagging assigns haplotype information to reads and can thus potentially improve SV detection; nevertheless, few methods make use of this information. In this article, we introduce HapKled, a new SV detection tool that accurately detects SVs from Oxford Nanopore Technologies (ONT) long-read alignment data.

Methods: HapKled exploits the haplotype information underlying the alignment data by haplotype-tagging the reads with WhatsHap. It improves detection performance through three calling mechanisms: altering clustering conditions according to the haplotype information of signatures, distinguishing similar SVs based on haplotype information, and adjusting filtering conditions according to haplotype-tagging quality.

Results: In our evaluations, HapKled outperformed state-of-the-art tools, delivering better SV detection results on both simulated and real sequencing data. The code and experiments of HapKled can be obtained from https://github.com/CoREse/HapKled.

Discussion: Given its SV detection performance, HapKled could be useful in bioinformatics research, clinical diagnosis, and medical research and development.
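To make the first calling mechanism in Methods more concrete, the following is a minimal Python sketch of clustering SV signatures under haplotype-dependent conditions. It is an illustration only and not HapKled's implementation: the `Signature` fields, the `compatible` rule, and both distance thresholds are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Signature:
    pos: int             # reference start of the SV signature
    length: int          # SV length implied by the signature
    hap: Optional[int]   # 1 or 2 if the read is haplotype-tagged, else None


def compatible(a: Signature, b: Signature,
               max_dist: int, max_dist_cross_hap: int) -> bool:
    """Hypothetical rule: apply a stricter distance limit when both
    signatures are tagged and come from different haplotypes."""
    different = a.hap is not None and b.hap is not None and a.hap != b.hap
    limit = max_dist_cross_hap if different else max_dist
    return abs(a.pos - b.pos) <= limit


def cluster(signatures: List[Signature],
            max_dist: int = 500,
            max_dist_cross_hap: int = 50) -> List[List[Signature]]:
    """Greedy single-pass clustering over position-sorted signatures.
    Both thresholds are illustrative values, not HapKled defaults."""
    clusters: List[List[Signature]] = []
    for sig in sorted(signatures, key=lambda s: s.pos):
        if clusters and compatible(clusters[-1][-1], sig,
                                   max_dist, max_dist_cross_hap):
            clusters[-1].append(sig)
        else:
            clusters.append([sig])
    return clusters


# Made-up signatures: two on haplotype 1, one on haplotype 2, one untagged.
example = [
    Signature(1000, 300, 1),
    Signature(1100, 310, 1),
    Signature(1200, 305, 2),
    Signature(1230, 300, None),
]
cluster_sizes = [len(c) for c in cluster(example)]
```

With these made-up inputs, the signature at position 1200 starts a new cluster because it is tagged with a different haplotype than its predecessor and lies more than `max_dist_cross_hap` away, whereas the same signatures without tags would all fall into one cluster.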

Keywords: Oxford nanopore sequencing; haplotype-tagging; long-read sequencing; structural variant; variant calling.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Overview of HapKled procedures. Part 1: the input alignment file is first used to call SNVs using Clair3, and then HapKled uses the detection result to haplotype-tag the alignment file using WhatsHap. Part 2: with the haplotype-tagged reads generated in Part 1, HapKled uses a haplotype-aware version of kled with three improvements, i.e., applying different conditions when clustering, distinguishing similar nearby SVs based on per-haplotype statistics, and adjusting filtering parameters based on haplotype-tagging quality, to generate the final VCF.
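As a rough illustration of what haplotype-tagged reads look like downstream of Part 1, the sketch below reads the `HP:i:` optional field that WhatsHap's haplotagging writes into SAM/BAM records. It parses plain SAM text with the Python standard library only; the function names and the example records are made up, and using the tagged fraction as a proxy for haplotype-tagging quality is an assumption, not the measure HapKled uses.

```python
from typing import Iterable, Optional


def haplotype_of(sam_line: str) -> Optional[int]:
    """Return the HP tag value of one SAM record, or None if untagged.
    Optional fields start after the 11 mandatory SAM columns."""
    fields = sam_line.rstrip("\n").split("\t")
    for tag in fields[11:]:
        if tag.startswith("HP:i:"):
            return int(tag[len("HP:i:"):])
    return None


def tagged_fraction(sam_lines: Iterable[str]) -> float:
    """Fraction of alignment records carrying an HP tag (header lines skipped).
    Hypothetical proxy for tagging quality, for illustration only."""
    total = 0
    tagged = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue
        total += 1
        if haplotype_of(line) is not None:
            tagged += 1
    return tagged / total if total else 0.0


# Made-up SAM records: two tagged reads and one untagged read.
core = "\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\t*"
records = [
    "@HD\tVN:1.6",
    "read1" + core + "\tHP:i:1\tPS:i:100",
    "read2" + core + "\tPS:i:100\tHP:i:2",
    "read3" + core,
]
haplotypes = [haplotype_of(r) for r in records[1:]]
fraction = tagged_fraction(records)
```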
FIGURE 2
Benchmark experiment results on the simulated dataset. The vertical axes denote the F1 scores for presence or genotype. The subfigures include (A) the overall comparisons of presence F1 and GT-F1 of the tools and the comparisons of presence F1 and GT-F1 for (B) deletion, (C) insertion, (D) duplication, and (E) inversion.
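For readers unfamiliar with the metric on the vertical axes, the F1 score is the harmonic mean of precision and recall. The sketch below computes it from true-positive, false-positive, and false-negative counts. The counts are invented for illustration and are not results from the paper; the sketch also assumes that presence F1 and GT-F1 differ only in what counts as a true positive (a matching SV call versus a matching call with the correct genotype).

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2 * precision * recall / (precision + recall).
    Returns 0.0 when precision and recall are both undefined or zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented counts: 90 correct calls, 10 spurious calls, 30 missed SVs.
# Precision is 0.9, recall is 0.75.
example_f1 = f1_score(tp=90, fp=10, fn=30)
```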
FIGURE 3
Benchmark experiment results on the HG002 ONT data. The vertical axes denote the F1 scores for presence or genotype. The subfigures include (A) the overall comparisons of presence F1; the comparisons of presence F1 for (B) deletion and (C) insertion; (D) the overall comparisons of GT-F1; the comparisons of GT-F1 for (E) deletion and (F) insertion.

Grants and funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This work has been supported by the National Natural Science Foundation of China (Grant numbers: 62331012, 32000467), Heilongjiang Province Science and Technology Plan Project (Grant number: 2022ZX02C20), and Natural Science Foundation of Heilongjiang Province (Grant number: LH2023F014).