Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding
- PMID: 19546169
- PMCID: PMC2752135
- DOI: 10.1101/gr.091868.109
Abstract
We describe the genome sequencing of an anonymous individual of African origin using a novel ligation-based sequencing assay that enables a unique form of error correction that improves the raw accuracy of the aligned reads to >99.9%, allowing us to accurately call SNPs with as few as two reads per allele. We collected several billion mate-paired reads yielding approximately 18× haploid coverage of aligned sequence and close to 300× clone coverage. Over 98% of the reference genome is covered with at least one uniquely placed read, and 99.65% is spanned by at least one uniquely placed mate-paired clone. We identify over 3.8 million SNPs, 19% of which are novel. Mate-paired data are used to physically resolve haplotype phases of nearly two-thirds of the genotypes obtained and produce phased segments of up to 215 kb. We detect 226,529 intra-read indels, 5590 indels between mate-paired reads, 91 inversions, and four gene fusions. We use a novel approach for detecting indels between mate-paired reads that are smaller than the standard deviation of the insert size of the library and discover deletions in common with those detected with our intra-read approach. Dozens of mutations previously described in OMIM and hundreds of nonsynonymous single-nucleotide and structural variants in genes previously implicated in disease are identified in this individual. There is more genetic variation in the human genome still to be uncovered, and we provide guidance for future surveys in populations and cancer biopsies.
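The abstract does not spell out how two-base encoding supports error correction, so the following is only an illustrative sketch, not the authors' pipeline. It assumes the commonly described color-space scheme in which each color is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3). Under that assumption, a true single-base substitution changes two adjacent colors in a constrained way, while an isolated color mismatch is more likely a measurement error. The function names (`encode_colors`, `decode_colors`, `classify_mismatches`) and the example sequences are hypothetical.

```python
# Assumed 2-bit base codes; color = XOR of the codes of two adjacent bases.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_colors(seq):
    """Return one color (0-3) per pair of adjacent bases in seq."""
    codes = [BASE_TO_BITS[b] for b in seq]
    return [a ^ b for a, b in zip(codes, codes[1:])]


def decode_colors(first_base, colors):
    """Rebuild the base sequence from a known first base and its colors."""
    code = BASE_TO_BITS[first_base]
    bases = [first_base]
    for color in colors:
        code ^= color  # each color maps the previous base to the next one
        bases.append(BITS_TO_BASE[code])
    return "".join(bases)


def classify_mismatches(ref_colors, read_colors):
    """Label each mismatching color position.

    Two adjacent mismatches whose XOR equals the XOR of the reference
    colors leave the flanking bases unchanged, which is what a single-base
    substitution produces; anything else is flagged as a likely error.
    """
    diffs = [i for i, (r, q) in enumerate(zip(ref_colors, read_colors)) if r != q]
    calls = {}
    i = 0
    while i < len(diffs):
        pos = diffs[i]
        nxt = diffs[i + 1] if i + 1 < len(diffs) else None
        if nxt == pos + 1 and (
            ref_colors[pos] ^ ref_colors[nxt] == read_colors[pos] ^ read_colors[nxt]
        ):
            calls[pos] = calls[nxt] = "snp_candidate"
            i += 2
        else:
            calls[pos] = "likely_error"
            i += 1
    return calls


# Hypothetical reference and a read carrying one substitution (G -> T at index 3).
ref_colors = encode_colors("ACGGTCA")
snp_colors = encode_colors("ACGTTCA")
# The same reference colors with a single color altered at position 4.
err_colors = [1, 3, 0, 1, 3, 1]

snp_calls = classify_mismatches(ref_colors, snp_colors)
err_calls = classify_mismatches(ref_colors, err_colors)
```

In this toy case the substitution shows up as two adjacent, mutually consistent color changes, whereas the single altered color is isolated. Decoding the erroneous colors directly would corrupt every base downstream of the error, which is one reason to compare reads to the reference in color space before translating to bases.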