2021 Nov 18:12:761791.
doi: 10.3389/fgene.2021.761791. eCollection 2021.

Evaluation of Germline Structural Variant Calling Methods for Nanopore Sequencing Data

Davide Bolognini et al. Front Genet.

Abstract

Structural variants (SVs) are genomic rearrangements that involve at least 50 nucleotides and are known to have a serious impact on human health. While earlier short-read sequencing technologies have often proved inadequate for a comprehensive assessment of structural variation, more recent long reads from Oxford Nanopore Technologies have already proven invaluable for the discovery of large SVs and hold the potential to facilitate the resolution of the full SV spectrum. With many long-read sequencing studies to follow, it is crucial to assess the factors affecting current SV calling pipelines for nanopore sequencing data. In this brief research report, we evaluate and compare the performance of five long-read SV callers across four long-read aligners using both real and synthetic nanopore datasets. In particular, we focus on the effects of read alignment, sequencing coverage, and variant allele depth on the detection and genotyping of SVs of different types and size ranges, and provide insights into the precision and recall of SV callsets generated by integrating the various long-read aligners and SV callers. The computational pipeline we propose is publicly available at https://github.com/davidebolo1993/EViNCe and can be adjusted to evaluate future nanopore sequencing datasets.

Keywords: bioinformatics; genomics; long reads; nanopore sequencing; structural variation.
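
The abstract and the figure captions report precision, recall and F-score for each callset. As a minimal sketch of how these metrics relate, assuming that "F-score" means the F1 score (the harmonic mean of precision and recall) and that true positives, false positives and false negatives have already been counted against a truth set, the function name and the counts below are hypothetical and are not taken from the paper:

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from benchmark counts.

    tp: calls that match a truth-set SV
    fp: calls with no matching truth-set SV
    fn: truth-set SVs that were not called
    """
    # Guard against empty callsets or empty truth sets.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is assumed here; it is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only.
p, r, f = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

A callset can therefore score high precision while missing many true SVs, which is why the figures plot both axes together with F-score contours.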

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Precision (y axis), recall (x axis) and F-score (dashed lines) of the high-quality SV callsets from Sniffles, SVIM, cuteSV, npInv and pbsv (hue palette) after minimap2 (top panels), NGMLR (mid panels) and lra (bottom panels) alignments. Results for both SV calling (left panels) and genotyping (right panels) in the NA24385 (circle symbol) and SI00001 (triangle symbol) datasets are shown.
FIGURE 2
Precision (y axis), recall (x axis) and F-score (dashed lines) of the SV callers Sniffles (square symbol), SVIM (cross symbol), cuteSV (circle symbol) and pbsv (triangle symbol) after minimap2 (top panels), NGMLR (mid panels) and lra (bottom panels) alignments. Results for both SV calling (left panels) and genotyping (right panels) are reported. The plot shows how average genome coverage influences SV caller performance after down-sampling the NA24385 alignments from the original coverage (total) to 5X, 10X, 15X, 20X, 25X and 35X (hue palette).
FIGURE 3
Precision (y axis), recall (x axis) and F-score (dashed lines) of the SV callers Sniffles (square symbol), SVIM (cross symbol), cuteSV (circle symbol) and pbsv (triangle symbol) after minimap2 (top panels), NGMLR (mid panels) and lra (bottom panels) alignments. Results for both SV calling (left panels) and genotyping (right panels) are reported. The plot shows how the minimum number of reads supporting an SV (2, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50; hue palette) influences SV caller performance on the NA24385 dataset.
FIGURE 4
Precision (y axis), recall (x axis) and F-score (dashed lines) of the combination of the SV callers Sniffles, SVIM, cuteSV and pbsv (hue palette) after minimap2, NGMLR and lra alignments as well as after consensus generation (top-to-bottom panels). Results for both SV calling (left panels) and genotyping (right panels) are reported. The plot shows the influence of the integration of multiple high-quality callsets on reducing false positive calls in the NA24385 dataset.
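
Figure 4 concerns integrating several callsets to reduce false positives. This page does not describe how the authors build their consensus, so the following is only an illustrative sketch of one common idea: keep an SV when at least a minimum number of callers report a call of the same type within a position tolerance. The `SV` record, the `consensus` function, and the `min_callers` and `max_dist` parameters are all hypothetical.

```python
from typing import NamedTuple


class SV(NamedTuple):
    chrom: str
    pos: int
    svtype: str


def _emit(cluster, min_callers, out):
    """Append one representative SV if enough distinct callers support the cluster."""
    if cluster and len({caller for _, _, _, caller in cluster}) >= min_callers:
        positions = sorted(pos for _, _, pos, _ in cluster)
        chrom, svtype = cluster[0][0], cluster[0][1]
        # Use the middle position of the cluster as its representative.
        out.append(SV(chrom, positions[len(positions) // 2], svtype))


def consensus(callsets: dict[str, list[SV]], min_callers: int = 2,
              max_dist: int = 500) -> list[SV]:
    """Single-linkage clustering of calls by chromosome, type and position.

    Assumption: two calls match when they share chromosome and SV type and
    their positions differ by at most max_dist bases.
    """
    tagged = sorted(
        (sv.chrom, sv.svtype, sv.pos, caller)
        for caller, calls in callsets.items()
        for sv in calls
    )
    out: list[SV] = []
    cluster: list[tuple[str, str, int, str]] = []
    for item in tagged:
        same_group = cluster and item[:2] == cluster[-1][:2]
        if same_group and item[2] - cluster[-1][2] <= max_dist:
            cluster.append(item)
        else:
            _emit(cluster, min_callers, out)
            cluster = [item]
    _emit(cluster, min_callers, out)
    return out


# Hypothetical calls from three callers, for illustration only.
calls = {
    "callerA": [SV("chr1", 1000, "DEL"), SV("chr2", 5000, "INS")],
    "callerB": [SV("chr1", 1100, "DEL")],
    "callerC": [SV("chr1", 1050, "DEL"), SV("chr2", 9000, "INS")],
}
print(consensus(calls))
```

Raising `min_callers` trades recall for precision: calls reported by a single caller are dropped, which removes many false positives but also any true SV that only one tool detects.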
