Nucleic Acids Res. 2014 Apr;42(6):e44.
doi: 10.1093/nar/gkt1381. Epub 2014 Jan 11.

An integrated framework for discovery and genotyping of genomic variants from high-throughput sequencing experiments

Jorge Duitama et al. Nucleic Acids Res. 2014 Apr.

Abstract

Recent advances in high-throughput sequencing (HTS) technologies and computing capacity have produced unprecedented amounts of genomic data that have unraveled the genetics of phenotypic variability in several species. However, operating and integrating current software tools for data analysis still require substantial investments in highly skilled personnel. Developing accurate, efficient and user-friendly software packages for HTS data analysis will lead to a more rapid discovery of genomic elements relevant to medical, agricultural and industrial applications. We therefore developed Next-Generation Sequencing Eclipse Plug-in (NGSEP), a new software tool for integrated, efficient and user-friendly detection of single nucleotide variants (SNVs), indels and copy number variants (CNVs). NGSEP includes modules for read alignment, sorting, merging, functional annotation of variants, filtering and quality statistics. Analysis of sequencing experiments in yeast, rice and human samples shows that NGSEP has superior accuracy and efficiency, compared with currently available packages for variant detection. We also show that only a comprehensive and accurate identification of repeat regions and CNVs allows researchers to properly separate SNVs from differences between copies of repeat elements. We expect that NGSEP will become a strong support tool to empower the analysis of sequencing data in a wide range of research projects on different species.

Figures

Figure 1.
Common interaction with NGSEP to call variants from aligned reads. (i) Right-click on a sorted SAM or BAM file, (ii) select the menu for NGSEP, (iii) select the option to call variants, (iv) select the reference genome (only the first time) and a prefix for the output files and (v) click on the find variants button.
Figure 2.
Sensitivity (left panels) and FDR (right panels) for genotyping of SNVs using NGSEP (blue), GATK (red) and SAMtools (yellow) as a function of the minimum quality score on the following benchmark data sets: (A and B) yeast unselected pool, (C and D) high-coverage human sample NA12878 and (E and F) low-coverage human sample NA12878. Continuous lines represent homozygous genotype calls, and broken lines represent heterozygous genotype calls.
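The two quantities plotted in Figure 2 can be illustrated with a minimal sketch. It assumes the usual definitions (sensitivity = correct calls / true variant sites; FDR = incorrect calls / all calls kept), which the caption itself does not spell out. The function name, the data layout and the example sites are hypothetical and are not taken from NGSEP, GATK or SAMtools. The sketch also does not split homozygous from heterozygous calls as the figure does.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]  # (sequence name, 1-based position); hypothetical layout


def sensitivity_and_fdr(
    calls: Dict[Site, Tuple[str, int]],
    truth: Dict[Site, str],
    min_quality: int,
) -> Tuple[float, float]:
    """Return (sensitivity, FDR) for calls with quality >= min_quality.

    calls maps a site to (genotype, quality score); truth maps a site to
    its gold-standard genotype. A call counts as correct only if both the
    site and the genotype match the truth set (an assumption of this sketch).
    """
    kept = {site: gt for site, (gt, qual) in calls.items() if qual >= min_quality}
    true_pos = sum(1 for site, gt in kept.items() if truth.get(site) == gt)
    false_pos = len(kept) - true_pos
    sensitivity = true_pos / len(truth) if truth else 0.0
    fdr = false_pos / len(kept) if kept else 0.0
    return sensitivity, fdr


# Illustrative (made-up) data: four true variant sites and four calls.
truth = {("chrI", 100): "1/1", ("chrI", 200): "0/1",
         ("chrI", 300): "1/1", ("chrI", 400): "0/1"}
calls = {("chrI", 100): ("1/1", 60), ("chrI", 200): ("0/1", 25),
         ("chrI", 300): ("0/1", 40), ("chrI", 500): ("1/1", 10)}

for threshold in (0, 30, 50):
    print(threshold, sensitivity_and_fdr(calls, truth, threshold))
```

Raising the threshold discards low-quality calls, which tends to lower FDR at the cost of sensitivity; that trade-off is what the curves in each panel trace.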
Figure 3.
(A) Sensitivity and (B) FDR for genotyping of small indels produced by NGSEP (blue), GATK (red) and SAMtools (yellow) using reads aligned with BWA, and NGSEP (green) using reads aligned with Bowtie 2 on the yeast unselected pool as a function of the minimum quality score. Continuous lines represent homozygous genotype calls, and broken lines represent heterozygous genotype calls.
Figure 4.
Quality assessment of the implementation of the CNVnator algorithm in NGSEP. Given the same GC-corrected intensities, the same genome size and the same RD distribution parameters, both implementations produce nearly the same (A) partition and (B) RD levels. Examples of repetitive regions in (C) the yeast parent ER7A and (D) the low-coverage human sample show how RD varies depending on the number of alignments counted for each read (blue: only the best alignment of each read counted; red: up to three alignments counted; yellow: all alignments found with Bowtie 2 with the -a option counted). Sensitivity of NGSEP, CNVnator and BreakDancer to identify (E) deletions and (F) duplications validated by Mills and collaborators (22) using reads from the low-coverage data set for NA12878. Default and K = 3 modes of Bowtie 2 are compared.
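The effect shown in panels C and D, where read depth (RD) changes with the number of alignments counted per read, can be sketched as follows. This is only an illustration of the counting idea, not the CNVnator algorithm or the NGSEP implementation. It assumes a single reference sequence, uses alignment start positions only, and assumes each read's alignments are listed best first. The function name, bin size and read data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def binned_read_depth(
    alignments_per_read: Dict[str, List[int]],
    bin_size: int,
    max_alignments: Optional[int] = None,
) -> Dict[int, int]:
    """Count alignments per fixed-size bin.

    alignments_per_read maps a read name to its 0-based alignment start
    positions, best alignment first. max_alignments limits how many
    alignments of each read are counted; None counts all of them.
    """
    depth: Counter = Counter()
    for positions in alignments_per_read.values():
        counted = positions if max_alignments is None else positions[:max_alignments]
        for pos in counted:
            depth[pos // bin_size] += 1
    return dict(depth)


# Illustrative (made-up) reads; r1 and r3 map to several repeat copies.
reads = {"r1": [10, 1010, 2010], "r2": [20], "r3": [1020, 30, 2020, 3020]}

print(binned_read_depth(reads, 1000, max_alignments=1))  # best alignment only
print(binned_read_depth(reads, 1000, max_alignments=3))  # up to three alignments
print(binned_read_depth(reads, 1000))                    # all alignments
```

With only the best alignment counted, multi-copy regions receive depth in one copy; counting more alignments spreads depth across every copy a read matches.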
Figure 5.
Venn diagrams comparing the variants discovered by NGSEP (blue), GATK (red) and SAMtools (yellow) on the data set of reads obtained after sequencing rice cultivar IR8. The upper diagram compares homozygous nonreference calls among the three methods. Smaller circles within each category represent sites that were called heterozygous by at least one method and homozygous nonreference by at least another method (for example, 2116 variants called homozygous nonreference by GATK and SAMtools were called heterozygous by NGSEP). The smaller diagram at the bottom compares heterozygous calls that were not called homozygous nonreference by any of the three methods.
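The region counts in a three-way Venn diagram like the one in Figure 5 can be derived with plain set operations. The sketch below assumes each caller's output has already been reduced to a set of (sequence, position) sites for one genotype class; the function name and the example sites are hypothetical, and the nested heterozygous/homozygous comparison described in the caption is not reproduced.

```python
from typing import Dict, Set, Tuple

Site = Tuple[str, int]  # (sequence name, 1-based position); hypothetical layout


def venn_regions(
    ngsep: Set[Site], gatk: Set[Site], samtools: Set[Site]
) -> Dict[str, Set[Site]]:
    """Split three call sets into the seven disjoint regions of a Venn diagram."""
    return {
        "ngsep_only": ngsep - gatk - samtools,
        "gatk_only": gatk - ngsep - samtools,
        "samtools_only": samtools - ngsep - gatk,
        "ngsep_gatk": (ngsep & gatk) - samtools,
        "ngsep_samtools": (ngsep & samtools) - gatk,
        "gatk_samtools": (gatk & samtools) - ngsep,
        "all_three": ngsep & gatk & samtools,
    }


# Illustrative (made-up) call sets.
ngsep_calls = {("chr1", 100), ("chr1", 200), ("chr1", 300)}
gatk_calls = {("chr1", 100), ("chr1", 200), ("chr1", 400)}
samtools_calls = {("chr1", 100), ("chr1", 400), ("chr1", 500)}

for region, sites in venn_regions(ngsep_calls, gatk_calls, samtools_calls).items():
    print(region, len(sites))
```

Because the seven regions are disjoint, their sizes sum to the size of the union of the three call sets, which is a quick sanity check on any such comparison.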

References

    1. The 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56–65.
    2. Zhang J, Wu G, Miller CP, Tatevossian RG, Dalton JD, Tang B, Orisme W, Punchihewa C, Parker M, Qaddoumi I, et al. Whole-genome sequencing identifies genetic alterations in pediatric low-grade gliomas. Nat. Genet. 2013;45:602–612.
    3. Xu X, Liu X, Ge S, Jensen JD, Hu F, Li X, Dong Y, Gutenkunst RN, Fang L, Huang L, et al. Resequencing 50 accessions of cultivated and wild rice yields markers for identifying agronomically important genes. Nat. Biotechnol. 2012;30:105–111.
    4. Hubmann G, Foulquié-Moreno MR, Nevoigt E, Duitama J, Meurens N, Pais TM, Mathé L, Saerens S, Nguyen HT, Swinnen S, et al. Quantitative trait analysis of yeast biodiversity yields novel gene tools for metabolic engineering. Metab. Eng. 2013;17:68–81.
    5. Pabinger S, Dander A, Fisher M, Snajder R, Sperk M, Efremova M, Krabichler B, Speicher MR, Zschocke J, Trajanoski Z. A survey of tools for variant analysis of next-generation genome sequencing data. Brief. Bioinform. 2013 (Epub ahead of print.)
