2015 Apr 16:6:6822.
doi: 10.1038/ncomms7822.

Calibrating genomic and allelic coverage bias in single-cell sequencing

Cheng-Zhong Zhang et al. Nat Commun.

Abstract

Artifacts introduced in whole-genome amplification (WGA) make it difficult to derive accurate genomic information from single-cell genomes and require different analytical strategies from bulk genome analysis. Here, we describe statistical methods to quantitatively assess the amplification bias resulting from whole-genome amplification of single-cell genomic DNA. Analysis of single-cell DNA libraries generated by different technologies revealed universal features of the genome coverage bias predominantly generated at the amplicon level (1–10 kb). The magnitude of coverage bias can be accurately calibrated from low-pass sequencing (~0.1×) to predict the depth-of-coverage yield of single-cell DNA libraries sequenced at arbitrary depths. We further provide a benchmark comparison of single-cell libraries generated by multi-strand displacement amplification (MDA) and multiple annealing and looping-based amplification cycles (MALBAC). Finally, we develop statistical models to calibrate allelic bias in single-cell whole-genome amplification and demonstrate a census-based strategy for efficient and accurate variant detection from low-input biopsy samples.

Figures

Figure 1. Non-uniformity in genome coverage and its impact on the sequencing yield
(a) Dependence of the information yield on the sequencing depth. Deeper sequencing of bulk libraries yields information on a larger population of cells; deeper sequencing of whole-genome amplified single-cell libraries reveals information on a larger fraction of the genome (thick lines).
(b) Genome coverage bias at different levels. ‘Amplification bias’ (top): whole-genome amplification generates coverage bias at the amplicon level, which is ~10–50 kb for multi-strand displacement amplification. ‘Sequencing bias’ (bottom): non-uniformity in the selection of sequencing fragments can be caused by multiple sources of bias, including whole-genome amplification; the variation in sequencing coverage can be observed from 100 bp to multiple megabases.
(c) Schematic representations of recurrent and random amplification bias from multiple independent amplifications of the same DNA material.
Figure 2. Statistical analysis of whole-genome amplification bias and coverage uniformity
(a) Autocorrelation in the genome coverage of a two-cell RPE-1 DNA library (RPE#1) amplified by multi-strand displacement amplification (MDA). The same library, independently sequenced to 0.1× (open triangles) and to 9× (solid triangles), exhibits a correlation above 1 kb that is invariant at intermediate depths (shaded triangles) obtained by downsampling the 9× sequencing data. The black dashed curve represents an exponential fit of the autocorrelation in the 1–100 kb range as $2 + 0.17\,e^{-\Delta/l_c}$ with a correlation length $l_c$ = 33 kb (95% confidence interval: 27–42 kb). This correlation is absent in the bulk library sequenced to different depths. Both the bulk and the MDA-generated libraries show a sequencing-fragment-level correlation ($l_c$ = 100 bp) that decays with the sequencing depth.
(b) The identical normalized cumulative coverage at a bin size of $l_c/2$, evaluated from the 9× (solid) and from the 0.1× sequencing (dashed), reflects the same amplicon-level variation due to MDA. The agreement between bin-level (dashed and solid lines) and base-level (red dots) depth-of-coverage curves further suggests that the bin-level variation contributes the dominant amplification bias. See Supplementary Figs 2 and 4–8 for more examples of the correlation (a) and coverage (b) analysis of single-cell sequencing data from different studies.
(c) Relationship between genome coverage (% covered at 1× mean sequencing depth) and amplification bias (measured by the amplitude of the amplicon-level correlation) of single-cell libraries from different studies. Coverage is evaluated at Chr. 1 for both haploid sperms and diploid cells, as well as the SW480 tumour cells (disomic in Chr. 1), and at Chr. 10 (monosomic), Chr. 12 (disomic) and Chr. 13 (disomic) for glioblastoma nuclei. The inverse dependence is fitted with an empirical formula, $y = 0.86/(1.2 + x)$ ($R^2$ = 0.98).
(d) Comparison of the cumulative coverage in the most uniform single-cell library from each study. Data were directly evaluated from high-depth sequencing of all samples except the neuron library, for which the curve was interpolated from 0.5× sequencing as in b.
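The two quantities in panels (a) and (b) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the normalization of the autocorrelation by the squared mean coverage, and the definition of normalized cumulative coverage as the fraction of bins at or above a multiple of the mean are all assumptions made for illustration. The exponential model takes its offset, amplitude and correlation length as arguments rather than hard-coding the fitted values from the caption.

```python
import numpy as np

def coverage_autocorrelation(coverage, max_lag):
    """Autocorrelation of binned coverage at lags 1..max_lag (in bins).

    Assumed normalization: <c(x) c(x + lag)> / <c>^2, so a perfectly
    uniform library gives 1.0 at every lag.
    """
    c = np.asarray(coverage, dtype=float)
    mean_sq = c.mean() ** 2
    lags = np.arange(1, max_lag + 1)
    acf = np.array([np.mean(c[:-k] * c[k:]) / mean_sq for k in lags])
    return lags, acf

def exponential_model(lag, offset, amplitude, corr_length):
    """Exponential decay offset + amplitude * exp(-lag / corr_length)."""
    return offset + amplitude * np.exp(-np.asarray(lag, dtype=float) / corr_length)

def normalized_cumulative_coverage(coverage, thresholds):
    """Fraction of bins whose coverage is at least t times the mean,
    for each t in thresholds (an assumed definition of the curve)."""
    c = np.asarray(coverage, dtype=float)
    relative = c / c.mean()
    return np.array([(relative >= t).mean() for t in thresholds])
```

With real data, `coverage` would be per-bin read counts along one chromosome, and the lag axis would be converted to base pairs by multiplying by the bin size before fitting `exponential_model`.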
Figure 3. Amplification bias of homologous chromosomes
(a) Schematic illustration of the ‘mixed template model’ and the ‘segregated template model’ reflecting different allele-level contributions to the same locus-level coverage (Methods, Supplementary Fig. 10).
(b) Comparison of the allele coverage predictions (‘Pre.’) from 1× sequencing depth with the observed coverage at heterozygous sites (‘Obs.’) at 9× sequencing depth in three single glioblastoma libraries. The combined coverage of reference and alternate bases (red dots) at 9× sequencing validates the prediction from 1× sequencing (dashed curve). The allele coverage (reference or alternate) is then predicted from the combined coverage assuming mixed templates (MTM, blue dotted lines) or segregated templates (STM, green dotted lines) and compared with the coverage of reference (blue triangles) or alternate (green triangles) bases at heterozygous sites. The predictions were made from the sequence coverage in disomic Chr. 12, but the agreement with observations in different disomic chromosomes demonstrates that amplification bias is consistent in all chromosomes.
Figure 4. Variant detection in single-cell genomes
(a) Census-based variant calling requires that acceptable variants be observed in at least two independent single-cell libraries.
(b) Estimates of the census-based detection sensitivity for a population of independently amplified single-cell libraries, all assumed to have amplification bias similar to that of GBM #4 (Supplementary Fig. 11). Optimal detection sensitivity is achieved at roughly 0.5× depth-per-library regardless of the sub-clonal fraction or the total sequencing depth.
(c) Optimal depth-per-library for census-based variant detection in a population of independently amplified single-cell libraries assumed to have similar coverage bias. The range of the optimal depths is calculated on the basis of the amplification bias observed in single glioblastoma libraries in Fig. 2b. For libraries with more bias or for the detection of variants with lower clonal fractions, it is optimal to sequence more libraries at modest depths (0.1–0.5×).
(d) Observed coverage of reference and alternate bases at heterozygous SNP sites in disomic Chr. 5 as an estimate of the census-based detection sensitivity for clonal variants. A varying number of single glioblastoma nuclei (59, 22 and 2) were sequenced to the same total depth (20×) and genotyped at germline heterozygous SNP sites. Group (A) included two cells with the best uniformity and group (B) included two cells with average uniformity. For either heterozygous coverage or the detection of alternate bases, the larger pools offer better sensitivity than the two groups of two cells.
(e) Comparison between somatic non-synonymous variants detected in different-sized pools of single cells sequenced to the same total depth (20×). The truth set (48 variants in total) included 43 variants that were detected in both 30× whole-genome and 120× whole-exome sequencing of bulk tumour DNA, plus five additional variants detected in bulk whole-genome and single-cell sequencing. At the same overall sequencing depth, census-based detection from a population of cells (59 and 22) offers higher sensitivity and better specificity than deep sequencing of two libraries. A larger number of private/false positive mutations are observed when individual samples are sequenced to higher depths, and these private calls often arise from sporadic sequencing errors that coincide with amplification errors.
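The census rule in panel (a) can be sketched in a few lines of Python. This is an illustrative simplification, not the authors' pipeline: the function names and variant identifiers are hypothetical, and the sensitivity function assumes every library detects a given variant independently with the same probability, whereas the estimates in panels (b) and (c) are calibrated from the measured amplification bias and sequencing depth.

```python
from collections import Counter
from math import comb

def census_call(calls_per_library, min_libraries=2):
    """Keep variants observed in at least min_libraries independent libraries.

    calls_per_library: iterable of collections of hashable variant
    identifiers, one collection per single-cell library (hypothetical format).
    """
    support = Counter()
    for calls in calls_per_library:
        support.update(set(calls))  # count each library at most once per variant
    return {variant for variant, n in support.items() if n >= min_libraries}

def census_sensitivity(p_detect, n_libraries, min_libraries=2):
    """Probability that a variant is seen in at least min_libraries of
    n_libraries, assuming independent, identical per-library detection
    probability p_detect (a simplifying binomial assumption)."""
    p_fewer = sum(
        comb(n_libraries, k) * p_detect**k * (1.0 - p_detect) ** (n_libraries - k)
        for k in range(min_libraries)
    )
    return 1.0 - p_fewer

# Hypothetical example: three libraries with toy variant labels.
libraries = [{"varA", "varB"}, {"varB", "varC"}, {"varA", "varB"}]
shared = census_call(libraries)  # varC is private to one library and is dropped
```

Under this toy model, spreading a fixed sequencing budget over more libraries lowers `p_detect` per library but raises `n_libraries`, which is the trade-off that panels (b) and (c) quantify with the calibrated bias.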
