Nat Commun. 2017 Nov 3;8(1):1293.
doi: 10.1038/s41467-017-01389-4.

Dense and accurate whole-chromosome haplotyping of individual genomes



David Porubsky et al. Nat Commun. 2017.

Abstract

The diploid nature of the human genome is neglected in many of today's analyses, in which a genome is treated as a set of unphased variants relative to a reference genome. This lack of haplotype-level analyses can be explained by the absence of methods that produce dense and accurate chromosome-length haplotypes at reasonable cost. Here we introduce an integrative phasing strategy that combines global but sparse haplotypes obtained from strand-specific single-cell sequencing (Strand-seq) with dense, yet local, haplotype information available through long-read or linked-read sequencing. We provide comprehensive guidance on the required sequencing depths and reliably assign more than 95% of alleles (NA12878) to their parental haplotypes using as few as 10 Strand-seq libraries in combination with 10-fold coverage PacBio data or, alternatively, 10X Genomics linked-read sequencing data. We conclude that the combination of Strand-seq with different technologies represents an attractive solution to chart the genetic variation of diploid genomes.
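The core idea of the integrative strategy, using sparse chromosome-wide haplotypes to fix the relative phase of dense local blocks, can be illustrated with a deliberately simplified sketch. This is not the paper's method (which solves a weighted minimum error correction problem in WhatsHap); it assumes a hypothetical representation in which each local block maps SNV positions to alleles (0/1) on one haplotype, and it orients each block by majority vote against the Strand-seq haplotype at shared sites.

```python
from typing import Dict, List

# Hypothetical representation: a block maps SNV position -> allele (0 or 1)
# on one of its two haplotypes; the other haplotype is the complement.
Block = Dict[int, int]


def orient_blocks(blocks: List[Block], global_hap: Dict[int, int]) -> Dict[int, int]:
    """Merge local blocks into one chromosome-length haplotype.

    Each block is kept or flipped by majority vote against the sparse global
    (e.g. Strand-seq) haplotype at shared positions; ties keep the block as is.
    Blocks sharing no position with the global haplotype cannot be oriented
    and are left out.
    """
    merged: Dict[int, int] = {}
    for block in blocks:
        shared = [pos for pos in block if pos in global_hap]
        if not shared:
            continue  # phase relative to the rest of the chromosome is unknown
        agree = sum(block[pos] == global_hap[pos] for pos in shared)
        flip = agree < len(shared) - agree  # most shared sites disagree
        for pos, allele in block.items():
            merged[pos] = 1 - allele if flip else allele
    return merged


blocks = [{100: 0, 150: 1, 200: 1}, {300: 1, 350: 0, 400: 0}, {500: 1}]
strand_seq_hap = {150: 1, 350: 1}  # sparse, but chromosome-wide
print(orient_blocks(blocks, strand_seq_hap))
# {100: 0, 150: 1, 200: 1, 300: 0, 350: 1, 400: 1}
```

The second block is flipped because its only shared site disagrees with the global haplotype, and the third block is left out because no Strand-seq allele anchors it, which mirrors how very sparse global haplotypes can leave some local blocks unconnected.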


Conflict of interest statement

The authors declare no competing financial interests.

Figures

Fig. 1
Phasing efficacy of read-based and experimental phasing approaches, using chromosome 1 as an example. a Two homologous chromosomes are shown (blue and black). Experimental phasing approaches such as Strand-seq can connect heterozygous alleles along whole chromosomes, but at a higher cost (time and labor) and with a lower density of captured alleles. In contrast, read-based phasing can deliver high-density haplotypes, but it assembles only short haplotype segments whose phase relative to one another is unknown. b Bar plot showing, for each sequencing technology, the percentage of phased variants out of the total number of reference SNVs (Illumina Platinum haplotypes). c Graphical summary of phased haplotype segments for Illumina, PacBio, 10X Genomics, and Strand-seq phasing on chromosome 1. Each haplotype segment is shown in a different color, with the longest segment in red. The side bar graph reports the percentage of SNVs phased in the longest haplotype segment. d Accuracy of each independent phasing approach, measured as the percentage of switch errors relative to the benchmark haplotypes.
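Panel d reports switch errors. The sketch below shows one common way to compute a switch error rate for a single phased block. It assumes both haplotypes are given as position-to-allele maps (0/1) and normalizes by the number of adjacent site pairs compared; other normalizations exist, and the function name is hypothetical.

```python
from typing import Dict


def switch_error_rate(predicted: Dict[int, int], truth: Dict[int, int]) -> float:
    """Switch errors per adjacent pair of sites shared with the benchmark.

    predicted and truth map SNV position -> allele (0 or 1) on one haplotype;
    predicted is assumed to be a single phased block.
    """
    shared = sorted(set(predicted) & set(truth))
    if len(shared) < 2:
        return 0.0  # no adjacent pair to compare
    # 0 where the prediction matches truth at a site, 1 where it is inverted.
    inverted = [predicted[p] ^ truth[p] for p in shared]
    # A switch is a change of orientation between consecutive shared sites.
    switches = sum(a != b for a, b in zip(inverted, inverted[1:]))
    return switches / (len(shared) - 1)


truth = {1: 0, 2: 1, 3: 0, 4: 1, 5: 0}
predicted = {1: 0, 2: 1, 3: 1, 4: 0, 5: 1}  # phase flips after site 2
print(switch_error_rate(predicted, truth))  # 0.25
```

Because only changes in orientation are counted, a block that is entirely inverted relative to the benchmark has no switch errors, which matches the arbitrary labeling of the two haplotypes within a block.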
Fig. 2
Integration of global and local haplotypes by the WhatsHap algorithm. An example solution of the weighted minimum error correction (wMEC) problem using the WhatsHap algorithm is shown. For simplicity, the base qualities used as weights are omitted from the picture (for details on wMEC, see Patterson et al. 2015). a The columns of the matrix represent 34 heterozygous variants (SNVs). Continuous stretches of zeros and ones indicate alleles supported by the respective reads (0: reference allele; 1: alternative allele). The first two rows of the wMEC matrix are Strand-seq haplotypes, each illustrated as one “super read” connecting alleles along the whole length of the chromosome (first row: haplotype 1 alleles; second row: haplotype 2 alleles). The subsequent rows are reads that map to the reference assembly in short, overlapping segments. Sequencing errors (shown in red in reads 2 and 7) are corrected by flipping alleles such that the total cost is minimized. b Reads are then partitioned into two haplotype groups (haplotype 1: dark blue; haplotype 2: light blue) such that a minimal number of alleles is corrected (in red). To illustrate the long-range haplotype contiguity provided by Strand-seq “super reads,” we depict two non-overlapping groups of reads (gray rectangles) that can be stitched together by Strand-seq (dashed lines). c Final haplotypes are exported for both groups of optimally partitioned reads.
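The wMEC formulation in panels a and b can be made concrete with a brute-force solver for a toy instance. This is only an illustration: it enumerates every bipartition of the rows and is exponential in the number of reads, whereas WhatsHap uses dynamic programming over the variant columns. The two Strand-seq “super reads” are simply the first two rows here, and the uniform weights are an assumption standing in for base qualities; the function name is hypothetical.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

# One matrix row: allele per variant column (0 = ref, 1 = alt, None = not covered).
Row = Sequence[Optional[int]]


def wmec_bruteforce(rows: List[Row], weights: List[Sequence[float]]) -> Tuple[float, List[int]]:
    """Solve a tiny wMEC instance by trying every bipartition of the rows.

    In each part and column, the lighter of the two alleles is flipped, so its
    total weight is added to the cost. Returns (minimum cost, part of each row).
    """
    n_cols = len(rows[0])
    best_cost, best_labels = float("inf"), []
    # Row 0 is pinned to part 0 so mirror-image partitions are not tried twice.
    for tail in product((0, 1), repeat=len(rows) - 1):
        labels = [0, *tail]
        cost = 0.0
        for col in range(n_cols):
            for part in (0, 1):
                support = [0.0, 0.0]  # total weight behind allele 0 and allele 1
                for r, row in enumerate(rows):
                    if labels[r] == part and row[col] is not None:
                        support[row[col]] += weights[r][col]
                cost += min(support)
        if cost < best_cost:
            best_cost, best_labels = cost, labels
    return best_cost, best_labels


N = None
rows = [
    [0, 0, 0, 0, 0, 0],  # Strand-seq "super read", haplotype 1
    [1, 1, 1, 1, 1, 1],  # Strand-seq "super read", haplotype 2
    [0, 0, 1, N, N, N],  # read with a sequencing error in column 2
    [1, 1, 1, N, N, N],
    [N, N, N, 0, 0, 0],  # shares no column with the two reads above
    [N, N, N, 1, 1, 1],
]
weights = [[1.0] * 6 for _ in rows]  # uniform weights stand in for base qualities
print(wmec_bruteforce(rows, weights))  # (1.0, [0, 1, 0, 1, 0, 1])
```

The optimum groups the haplotype 1 super read with rows 2 and 4, at the cost of flipping the erroneous allele in row 2. Without the super reads (`rows[2:]`) the minimum cost drops to 0, because nothing contradicts the error, and the relative phase of the two non-overlapping read pairs becomes arbitrary.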
Fig. 3
Various combinations of Strand-seq and read-based phasing, using chromosome 1 as an example. Plots show haplotype quality measures for various numbers of Strand-seq cells (5, 10, 20, 40, 60, 80, 100, 120, 134) combined with selected coverage depths of Illumina or PacBio sequencing data (2, 3, 4, 5, 10, 15, 25, 30, >30-fold) or with 10X Genomics haplotypes. a Completeness of the largest haplotype segment, measured as the percentage of phased SNVs. b Contiguity of the largest haplotype segment, measured as its length. Each phased haplotype segment is shown in a different color, with the largest segment in red. Black asterisks mark the recommended depth of coverage of a given technology in combination with Strand-seq. c Accuracy of the largest haplotype segment, measured as the level of agreement with the “reference” standard. Black arrowheads highlight the PacBio sequencing depth beyond which the accuracy of the final haplotypes does not substantially improve.
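Panels a and b summarize the largest haplotype segment by completeness and contiguity. A minimal sketch of both measures, assuming each block is a list of phased SNV positions, that the “largest” block is the one containing the most SNVs, and that contiguity is the distance between its first and last phased SNV (the function name is hypothetical):

```python
from typing import Dict, List


def largest_block_stats(blocks: List[List[int]], total_snvs: int) -> Dict[str, float]:
    """Completeness and contiguity of the largest phased block.

    blocks: each block is a list of phased SNV positions (bp) on one chromosome.
    total_snvs: number of heterozygous SNVs in the benchmark for that chromosome.
    """
    largest = max(blocks, key=len)  # "largest" = most phased SNVs (assumption)
    return {
        "pct_snvs_in_largest": 100.0 * len(largest) / total_snvs,
        "span_bp": max(largest) - min(largest),  # first to last phased SNV
    }


blocks = [[100, 200, 300, 400], [1_000, 5_000]]
print(largest_block_stats(blocks, total_snvs=8))
# {'pct_snvs_in_largest': 50.0, 'span_bp': 300}
```

Note that the two definitions of “largest” can disagree: here the second block spans more base pairs but contains fewer SNVs.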
Fig. 4
Recommended settings for phasing different numbers of individuals. a Genome-wide phasing of NA12878 using a combination of 40 Strand-seq libraries with 30× Illumina short reads, 10 Strand-seq libraries with 10-fold PacBio long reads, or 10 Strand-seq libraries with 10X Genomics data. Plots show quality measures such as the percentage of phased SNV pairs, the switch error rate, and the Hamming error rate for phased autosomes. b Diagram of the recommended number of Strand-seq libraries to combine with the recommended minimum coverage (10-fold PacBio and 30× Illumina) in order to obtain global and accurate haplotypes for the depicted numbers of individual diploid genomes.
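The Hamming error rate in panel a complements the switch error rate. A minimal sketch, assuming position-to-allele maps (0/1) for one block and taking whichever block orientation matches the benchmark better; the function name is hypothetical:

```python
from typing import Dict


def hamming_error_rate(predicted: Dict[int, int], truth: Dict[int, int]) -> float:
    """Fraction of shared sites that disagree with the benchmark haplotype.

    A block's phase is arbitrary, so both the predicted haplotype and its
    complement are compared and the smaller mismatch count is kept.
    """
    shared = [p for p in predicted if p in truth]
    if not shared:
        return 0.0
    mismatches = sum(predicted[p] != truth[p] for p in shared)
    return min(mismatches, len(shared) - mismatches) / len(shared)


truth = {1: 0, 2: 1, 3: 0, 4: 1, 5: 0}
one_switch = {1: 0, 2: 1, 3: 1, 4: 0, 5: 1}  # phase flips after site 2
one_flip = {1: 0, 2: 1, 3: 1, 4: 1, 5: 0}    # single wrong allele at site 3
print(hamming_error_rate(one_switch, truth))  # 0.4
print(hamming_error_rate(one_flip, truth))    # 0.2
```

A single switch in the middle of a block counts as one switch but produces a large Hamming error, whereas an isolated wrong allele counts as two switches (into and out of the error) but only one Hamming error, which is why reporting both measures is informative.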


