Abstract
SpeedSeq is an open-source genome analysis platform that accomplishes alignment, variant detection and functional annotation of a 50× human genome in 13 h on a low-cost server and alleviates a bioinformatics bottleneck that typically demands weeks of computation with extensive hands-on expert involvement. SpeedSeq offers performance competitive with or superior to current methods for detecting germline and somatic single-nucleotide variants, structural variants, insertions and deletions, and it includes novel functionality for streamlined interpretation.
Acknowledgements
The authors thank A. Abyzov for helpful discussions about CNVnator. This work was supported by US National Institutes of Health (NIH) training grant T32 GM007267 (C.C.), NIH NHGRI grant R01HG006693 (A.R.Q.), and NIH NHGRI center grant U54 HG003079, NIH New Innovator Award DP2OD006493-01 and a Burroughs Wellcome Fund Career Award (I.M.H.).
Author information
Contributions
C.C. wrote SpeedSeq and analyzed the data. R.M.L. advised on LUMPY implementation, G.G.F. contributed SAMBLASTER features, M.R.L. assisted with cloud implementation and D.B.R. parallelized CNVnator. E.P.G. and G.T.M. advised on implementing FreeBayes. A.R.Q. contributed GEMINI features and advised on software design. I.M.H. conceived and designed the study. C.C. and I.M.H. wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Germline SNV detection performance
Receiver operating characteristic curves comparing the performance of three variant callers over the Omni microarray truth set (N=689,788).
Supplementary Figure 2 Somatic SNV detection performance of low frequency variants in a simulated tumor-normal pair
(a) Somatic variants in the simulated 50X tumor dataset (a mixture of 11 grandchildren from the CEPH 1463 pedigree) exhibit a range of variant allele frequencies in accordance with the expected binomial distribution. (b) Sensitivity and (c) precision over the range of variant allele frequencies at the quality thresholds of open circles in Fig. 2d.
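The binomial expectation mentioned in panel (a) can be illustrated with a minimal sketch. It assumes, purely for illustration, that the 11 genomes are mixed in equal proportions and that a variant is heterozygous in each carrier; neither detail is stated in the caption, and the function names are hypothetical.

```python
from math import comb


def expected_vaf(carriers: int, n_samples: int = 11) -> float:
    """Expected variant allele frequency in an equal-proportion mixture.

    Assumption: each carrier is heterozygous, so it contributes one
    variant allele out of its two copies.
    """
    return carriers / (2 * n_samples)


def alt_read_pmf(alt_reads: int, depth: int, vaf: float) -> float:
    """Binomial probability of seeing `alt_reads` variant reads at `depth`."""
    return comb(depth, alt_reads) * vaf**alt_reads * (1 - vaf) ** (depth - alt_reads)


if __name__ == "__main__":
    depth = 50  # nominal coverage of the simulated tumor dataset
    vaf = expected_vaf(carriers=1)  # variant private to one genome in the mixture
    for k in range(6):
        print(k, round(alt_read_pmf(k, depth, vaf), 4))
```

Under these assumptions a variant carried by a single genome has an expected allele frequency of 1/22, so at 50X coverage only a few supporting reads are expected.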
Supplementary Figure 3 Structural variant validation by long reads and 1000 Genomes Project data
SpeedSeq reported 6,696 structural variants (SVs) in the 50X NA12878 human dataset. The subsets of SVs with read-depth support from CNVnator (red) and with both paired-end and split-read support from LUMPY (blue) are displayed alongside the full set of reported variants (black) in each plot. Gray hashed lines denote the validation rate of 100 random permutations of the data. (a) Validation rate using deep (30X) long-read data from Pacific Biosciences or Illumina Moleculo technologies at different quality thresholds. (b) Validation rate of the subset of 3,438 deletions reported by SpeedSeq against deletions reported in the Pilot or Phase 1 callsets of the 1000 Genomes Project. (c) The number of SVs meeting each quality threshold and evidence type.
Supplementary Figure 4 CEPH 1463 family pedigree
Structure of the three-generation CEPH 1463 family pedigree used in evaluations of somatic variant detection, de novo variant detection, and structural variant detection.
Supplementary Figure 5 Construction of depth-based excluded regions and parallelization strategy
(a) Histogram of the aggregate coverage depth over the mappable genome for 17 whole-genome datasets and one replicate sample from the Illumina Platinum Genomes project, with a red vertical line denoting the coverage depth threshold for exclusion from SpeedSeq analysis. (b) The depth-based binning strategy whereby 34,123 static regions containing approximately equal numbers of reads are processed with a parallel implementation of FreeBayes. A horizontal red line shows the maximum coverage depth for a region to be processed by SpeedSeq.
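The binning idea in panel (b) can be sketched as follows. This is an illustrative approximation rather than SpeedSeq's actual implementation: the function name, the fixed-width window representation, and the greedy cut rule are all assumptions made for this example.

```python
from typing import List, Tuple


def equal_read_regions(
    counts: List[int], target: int, max_count: int
) -> List[Tuple[int, int]]:
    """Greedily group consecutive windows into regions of about `target` reads.

    `counts[i]` is the read count of window i. Windows above `max_count`
    are treated as excluded: they are skipped and end any open region.
    Returns half-open (start, end) window-index ranges.
    """
    regions: List[Tuple[int, int]] = []
    start = None  # index of the first window in the open region, if any
    running = 0   # reads accumulated in the open region
    for i, c in enumerate(counts):
        if c > max_count:
            # Excluded window: close the open region just before it.
            if start is not None:
                regions.append((start, i))
                start, running = None, 0
            continue
        if start is None:
            start = i
        running += c
        if running >= target:
            regions.append((start, i + 1))
            start, running = None, 0
    if start is not None:
        regions.append((start, len(counts)))
    return regions


if __name__ == "__main__":
    window_counts = [10, 10, 10, 500, 5, 5, 20, 10]
    print(equal_read_regions(window_counts, target=20, max_count=100))
```

Each resulting region could then be passed to a separate variant-calling job, so that workers receive comparable amounts of data and no job is dominated by an abnormally deep region.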
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–5, Supplementary Tables 1–5 and Supplementary Notes 1–4 (PDF 440 kb)
Supplementary Software
SpeedSeq v0.0.3a (ZIP 16441 kb)
About this article
Cite this article
Chiang, C., Layer, R., Faust, G. et al. SpeedSeq: ultra-fast personal genome analysis and interpretation. Nat Methods 12, 966–968 (2015). https://doi.org/10.1038/nmeth.3505
DOI: https://doi.org/10.1038/nmeth.3505