Nucleic Acids Res. 2013 Feb 1;41(3):e46.
doi: 10.1093/nar/gks1143. Epub 2012 Dec 5.

A population model for genotyping indels from next-generation sequence data

Haojing Shao et al. Nucleic Acids Res. 2013.

Abstract

Insertion and deletion polymorphisms (indels) are an important source of genomic variation in plant and animal genomes, but accurate genotyping from low-coverage and exome next-generation sequence data remains challenging. We introduce an efficient population clustering algorithm for diploids and polyploids which was tested on a dataset of 2000 exomes. Compared with existing methods, we report a 4-fold reduction in overall indel genotype error rates with a 9-fold reduction in low coverage regions.

Figures

Figure 1.
Illustration of the population clustering method on real data. (A–D) Clustering at different putative indel sites with different depths of coverage and site-specific error rates. Each point represents one individual in the population: the total number of aligned reads (X-axis) against the number of indel-supporting aligned reads (Y-axis). Shapes indicate the genotype called by SOAP-popIndel: squares, circles and triangles indicate homozygous reference, heterozygous and homozygous indel, respectively. (A and C) Low-to-medium depth of coverage, low error rate. (B) Medium-to-high depth of coverage, low error rate. (D) Low-to-medium depth of coverage, high error rate.
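The clustering idea in Figure 1 can be illustrated with a minimal sketch. This is not the SOAP-popIndel implementation, whose details are not given here. It assumes a three-component binomial mixture over diploid genotypes in which the expected fraction of indel reads is `err`, 0.5 and `1 - err` for homozygous reference, heterozygous and homozygous indel, and it fits the genotype frequencies and a site-specific error rate by EM. The function names, the starting error rate and the clamping bounds are all hypothetical.

```python
import math


def binom_logpmf(k, n, p):
    """Log of the Binomial(n, p) probability of observing k successes."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def genotype_posteriors(samples, n_iter=50, err=0.01):
    """Cluster one site across a population (assumed model, see lead-in).

    samples: list of (total_reads, indel_reads), one pair per individual.
    Returns (posteriors, weights, err); each posterior is a list of three
    probabilities for hom-ref, het and hom-indel.
    """
    weights = [1 / 3, 1 / 3, 1 / 3]  # population genotype frequencies
    post = []
    for _ in range(n_iter):
        fracs = [err, 0.5, 1 - err]  # expected indel-read fraction per genotype
        # E-step: posterior genotype probabilities for each individual
        post = []
        for n, k in samples:
            logs = [math.log(w) + binom_logpmf(k, n, f)
                    for w, f in zip(weights, fracs)]
            top = max(logs)
            unnorm = [math.exp(v - top) for v in logs]
            total = sum(unnorm)
            post.append([u / total for u in unnorm])
        # M-step: update genotype frequencies, floored to keep logs finite
        weights = [max(sum(p[g] for p in post) / len(samples), 1e-6)
                   for g in range(3)]
        norm = sum(weights)
        weights = [w / norm for w in weights]
        # M-step: error rate from reads that disagree with a homozygous call
        num = sum(p[0] * k + p[2] * (n - k) for p, (n, k) in zip(post, samples))
        den = sum((p[0] + p[2]) * n for p, (n, k) in zip(post, samples))
        if den > 0:
            err = min(max(num / den, 1e-4), 0.2)
    return post, weights, err
```

Because the genotype frequencies and error rate are shared across individuals, a low-coverage sample borrows information from the rest of the population, which is the intuition behind clustering rather than calling each sample independently.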
Figure 2.
Genotyping accuracy and missing rates. Dashed line, solid line, circles and diamonds represent SOAP-popIndel, Dindel, SAMTools and piCALL, respectively. Black: real exome data; red: 4× simulation; green: 20× simulation; blue: 40× simulation. Lines for Dindel and SOAP-popIndel are based on posterior probability thresholds between 0.90 and 0.99. SAMTools and piCALL do not report probability of assignment, so each is represented by a single point. (A) Results on 44 Sequenom validated sites. (B) Restricted to sites within samples that had <5× coverage. (C) Results on simulated data.
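The trade-off traced by the lines in Figure 2 can be sketched as follows: a genotype is called only when its posterior probability reaches the threshold, and is otherwise counted as missing. This is an illustrative sketch, not the evaluation code used for the figure; the function name and the encoding of genotypes as indices 0, 1, 2 are assumptions.

```python
def accuracy_and_missing(posteriors, truth, threshold):
    """Return (accuracy among called genotypes, missing rate).

    posteriors: one list of genotype probabilities per sample.
    truth: index of the validated genotype for each sample.
    threshold: minimum posterior required to emit a call.
    """
    if not truth:
        raise ValueError("at least one sample is required")
    called = 0
    correct = 0
    for post, true_gt in zip(posteriors, truth):
        best = max(range(len(post)), key=lambda g: post[g])
        if post[best] < threshold:
            continue  # below threshold: genotype left uncalled (missing)
        called += 1
        if best == true_gt:
            correct += 1
    missing_rate = 1 - called / len(truth)
    accuracy = correct / called if called else float("nan")
    return accuracy, missing_rate
```

Sweeping `threshold` from 0.90 to 0.99 and plotting the resulting pairs gives a curve of the kind shown for the two methods that report posterior probabilities; a method without posteriors yields a single point.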
