Nat Commun. 2015 Oct 22;6:8687.
doi: 10.1038/ncomms9687.

Characterizing noise structure in single-cell RNA-seq distinguishes genuine from technical stochastic allelic expression


Jong Kyoung Kim et al. Nat Commun.

Erratum in

Abstract

Single-cell RNA-sequencing (scRNA-seq) facilitates identification of new cell types and gene regulatory networks as well as dissection of the kinetics of gene expression and patterns of allele-specific expression. However, to facilitate such analyses, separating biological variability from the high level of technical noise that affects scRNA-seq protocols is vital. Here we describe and validate a generative statistical model that accurately quantifies technical noise with the help of external RNA spike-ins. Applying our approach to investigate stochastic allele-specific expression in individual cells, we demonstrate that a large fraction of stochastic allele-specific expression can be explained by technical noise, especially for lowly and moderately expressed genes: we predict that only 17.8% of stochastic allele-specific expression patterns are attributable to biological noise with the remainder due to technical noise.
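To make the role of spike-ins concrete, here is a minimal Python sketch under a simplified model assumed for illustration only (it is not the paper's full generative model): every cell receives the same number M of spike-in molecules, and the total spike-in read count S_c of cell c is Poisson with mean κ_c·M, where κ_c = θ_c·γ_c is that cell's combined capture and sequencing efficiency. The per-cell ratio S_c/M is then unbiased for κ_c, while its across-cell variance exceeds Var(κ) by the shot-noise term E[κ]/M, which the sketch subtracts. The function name and example counts are hypothetical. Spike-in totals alone identify only the product θγ; separating θ from γ requires further modelling assumptions.

```python
import statistics

def spike_in_efficiency_moments(spike_totals, molecules_added):
    """Moment estimates of the combined efficiency kappa = theta * gamma.

    Hypothetical model: spike_totals[c] ~ Poisson(kappa_c * molecules_added),
    with the same number of spike-in molecules added to every cell's lysate.
    Needs at least two cells.
    """
    # Per-cell efficiency estimate: spike-in reads per spike-in molecule added.
    kappa_hat = [s / molecules_added for s in spike_totals]
    mean_kappa = statistics.fmean(kappa_hat)
    # Var(kappa_hat) = Var(kappa) + E[kappa] / M, so remove the Poisson
    # shot-noise term and clip at zero.
    shot = mean_kappa / molecules_added
    var_kappa = max(statistics.variance(kappa_hat) - shot, 0.0)
    return mean_kappa, var_kappa
```

For example, `spike_in_efficiency_moments([120, 95, 140, 80], 1000)` gives a mean efficiency of about 0.109 and an efficiency variance of about 6.0e-4 after the shot-noise correction.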


Conflict of interest statement

The authors declare no competing financial interests.

Figures

Figure 1. Schematic representation of the noise decomposition method.
With the help of external RNA spike-in molecules, added in the same quantity to each cell’s lysate, we first estimate four parameters capturing technical variability: the expectation and variance of the capture (θ) and sequencing (γ) efficiencies. Then, by the general variance decomposition formula (the law of total variance), the total observed variance of read counts can be decomposed into technical (blue) and biological (green) variance terms. The biological variance is estimated by subtracting the technical variance terms from the total observed variance. Shot noise (or Poisson noise) is the cell-to-cell variability that can be modelled by a Poisson process.
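The subtraction step can be made concrete with the law of total variance. As an assumption for illustration (a simplified stand-in, not necessarily the paper's exact likelihood), take a gene's true molecule count X with mean μ and biological variance σ², and let the observed read count be Y | X, θ, γ ~ Poisson(γθX), with θ, γ and X mutually independent. Then E[Y] = E[θ]E[γ]μ and Var[Y] = E[Y] + E[θ²]E[γ²](σ² + μ²) − E[Y]², where the first term is shot noise. Solving for σ² gives the sketch below; the function name is hypothetical.

```python
import math

def decompose_variance(mean_obs, var_obs, e_theta, var_theta, e_gamma, var_gamma):
    """Estimate the biological variance of a gene from its read-count moments.

    Hypothetical helper for the simplified model
    Y | X, theta, gamma ~ Poisson(gamma * theta * X), theta, gamma, X independent.
    Returns (mu, sigma2, cv_bio) on the molecule scale.
    """
    # Second moments of the efficiencies: E[t^2] = Var[t] + E[t]^2.
    e_theta2 = var_theta + e_theta ** 2
    e_gamma2 = var_gamma + e_gamma ** 2
    # Undo the expected efficiency to get the mean on the molecule scale.
    mu = mean_obs / (e_theta * e_gamma)
    # Remove shot noise (mean_obs) and the efficiency terms, then rescale.
    sigma2 = (var_obs - mean_obs + mean_obs ** 2) / (e_theta2 * e_gamma2) - mu ** 2
    # Sampling noise in the moments can push the estimate below zero.
    sigma2 = max(sigma2, 0.0)
    cv_bio = math.sqrt(sigma2) / mu
    return mu, sigma2, cv_bio
```

With the hypothetical values E[θ] = 0.1, Var[θ] = 0.0025, E[γ] = 1 and Var[γ] = 0, an observed mean of 20 reads and an observed variance of 125 give μ = 200 molecules, σ² = 400 and a biological CV of 0.1.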
Figure 2. Validation of estimated biological noise of 2i-grown mESCs by single-molecule FISH.
(a) Bar plot of the measured coefficient of variation (CV) (y axis) of selected genes for each method: total noise by scRNA-seq (Cell); Models II and III of Grün et al.; our noise decomposition method (Decomposition); and single-molecule FISH (smFISH). The genes, chosen by Grün et al. to cover a dynamic range of gene expression, are sorted by expression level: lowly expressed genes (Sohlh2, Notch1, Gli2 and Stag3), a moderately expressed gene (Tpx2) and highly expressed genes (Pou5f1, Sox2, Pcna2 and Klf4). Notch1 is not available in serum-grown mESCs of Grün et al. Error bars represent standard deviation (s.d.): bootstrap s.d. for our predictions; for the other methods, s.d. derived from the estimated standard errors of the parameters of a negative binomial distribution. (b) Comparison of methods by the deviation of model estimates of CV from smFISH estimates of CV, measured as z-scores for lowly expressed genes. To compare the accuracy of the model estimates of CV, we performed a one-tailed paired t-test between two paired sets of z-scores (one set per method) for each group of genes. Here the z-score is defined as z_i = |x_i - μ_i|/σ_i, where x_i is the model estimate of CV of gene i, μ_i is the smFISH estimate of CV of gene i and σ_i is the standard deviation of the model estimate of CV of gene i. Because a lower z-score means a more accurate estimate of biological CV relative to smFISH measurements, the alternative hypothesis states that the z-scores of our method are lower than those of the other methods. For lowly expressed genes, our method outperforms the deconvolution-based methods (P=0.0166 between Model II and ours; P=0.0385 between Model III and ours). Error bars represent 95% confidence intervals. (c) Comparison of biological estimates of CV between Model III of Grün et al. and our noise decomposition method.
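The accuracy comparison in panel (b) can be sketched in a few lines of Python. The z-score follows the caption's definition, and the paired t statistic is the standard one computed on per-gene differences. The helper names and any input values are hypothetical. For the one-tailed P-value itself, SciPy's `scipy.stats.ttest_rel(z_ours, z_other, alternative="less")` (SciPy 1.6 or later) performs the same paired test.

```python
import math
import statistics

def z_scores(model_cv, smfish_cv, model_sd):
    """z_i = |x_i - mu_i| / sigma_i for each gene i."""
    return [abs(x - mu) / s for x, mu, s in zip(model_cv, smfish_cv, model_sd)]

def paired_t_statistic(z_ours, z_other):
    """Paired t statistic on the per-gene differences z_ours - z_other.

    A strongly negative value supports the one-tailed alternative that our
    z-scores are lower (closer to smFISH). Requires at least two genes whose
    differences are not all identical (otherwise the s.d. is zero).
    """
    diffs = [a - b for a, b in zip(z_ours, z_other)]
    n = len(diffs)
    return statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
```

The t statistic is compared against a t distribution with n - 1 degrees of freedom, using only its lower tail because the hypothesis is directional.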
Figure 3. Stochastic monoallelic expression.
(a) Experimental design of allele-specific scRNA-seq. (b) Examples of genes showing stochastic monoallelic, monoallelic and correlated allelic expression. In each scatter plot, black dots represent the normalized read counts of the maternal (red in an oval) and paternal (blue in an oval) alleles of a single gene in individual cells (each cell depicted as an oval). In contrast to monoallelic expression, where the most expressed allele is the same across cells, in stochastic monoallelic expression the most expressed allele differs between cells. (c) Mean fraction of the most expressed allele for SNPs binned by expression level, assuming both alleles are expressed at the same level, simulated from the ‘T’ model (solid blue line) or the ‘T+B’ model (solid red line). Error bars denote 95% confidence intervals by bootstrap (100 bootstrap samples). Mouse images reproduced with permission from the Jackson Laboratory.
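Panel (c) rests on the observation that technical sampling alone can make one allele look dominant in a cell even when both alleles are expressed equally, and more so at low expression. The sketch below illustrates this under a deliberately simplified technical model assumed here: each allele's read count is Poisson with the same mean, with no variation in capture or sequencing efficiency, so it is only a stand-in for the paper's ‘T’ model. The function names and means are hypothetical.

```python
import math
import random
import statistics

def poisson(lam, rng):
    """Draw a Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def mean_top_allele_fraction(mean_per_allele, n_cells=2000, seed=0):
    """Average per-cell fraction of reads from the more expressed allele.

    Both alleles share the SAME expected count, so any departure from 0.5
    comes purely from Poisson sampling noise. Cells with zero reads are skipped.
    """
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_cells):
        mat = poisson(mean_per_allele, rng)
        pat = poisson(mean_per_allele, rng)
        if mat + pat > 0:
            fractions.append(max(mat, pat) / (mat + pat))
    return statistics.fmean(fractions)

if __name__ == "__main__":
    # Lower expression should push the fraction further above 0.5.
    for lam in (0.2, 2.0, 20.0):
        print(lam, round(mean_top_allele_fraction(lam), 3))
```

At very low means most detected cells carry a single read, so the most expressed allele accounts for nearly all reads even though the alleles are balanced by construction.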
Figure 4. Testing stochastic allele-specific expression.
In the upper panel, each dot in the three scatter plots (for SNPs assigned to Trim25, Amacr, or Hspa8) represents the expression value of a single cell, where the x axis represents the normalized read count of the paternal allele and the y axis represents the normalized read count of the maternal allele of a single gene. In the middle panel highlighted by the green box, each scatter plot shows the allelic ratios (y axis) of cells (x axis, cell index from 1 to 96) of a single gene. The horizontal dashed red line represents the cutoff of 0.95 to define monoallelic expression. All three genes display monoallelic expression in at least one cell by this criterion. In the lower panel highlighted by the blue box, we simulated 54 pseudo cells assuming only technical noise and calculated the average allelic ratio across the pseudo cells. We repeated this process 10,000 times and computed the empirical P-value of the observed average allelic ratio (vertical dashed red line) based on a null distribution of simulated ratios (black). Of the three genes, only Trim25 has a higher average allelic ratio than expected by chance, suggesting it displays stochastic allele-specific expression. NS, not significant.
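The lower panel's test can be sketched as follows, again under simplified assumptions: the allelic ratio of a cell is taken to be the fraction of reads from its more expressed allele, the technical-only null draws both alleles of each pseudo cell from a Poisson distribution with a common mean, and the empirical P-value uses the (1 + hits)/(1 + replicates) convention so it is never exactly zero. The helper names are hypothetical, and the defaults of 54 pseudo cells and 10,000 replicates simply mirror the caption.

```python
import math
import random
import statistics

def poisson(lam, rng):
    """Draw a Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def top_allele_ratio(mat, pat):
    """Fraction of reads from the more expressed allele, or None if the cell has no reads."""
    total = mat + pat
    return max(mat, pat) / total if total > 0 else None

def average_ratio(pairs):
    """Mean top-allele ratio over cells with at least one read (NaN if none)."""
    ratios = [r for r in (top_allele_ratio(m, p) for m, p in pairs) if r is not None]
    return statistics.fmean(ratios) if ratios else float("nan")

def empirical_p_value(observed_pairs, mean_per_allele, n_pseudo=54, n_rep=10_000, seed=0):
    """One-sided empirical P-value of the observed average allelic ratio."""
    rng = random.Random(seed)
    observed = average_ratio(observed_pairs)
    hits = 0
    for _ in range(n_rep):
        # Technical-only null: both alleles have the same expected count.
        pseudo = [(poisson(mean_per_allele, rng), poisson(mean_per_allele, rng))
                  for _ in range(n_pseudo)]
        # NaN (no reads at all) compares False, so it never counts as a hit.
        if average_ratio(pseudo) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_rep)
```

If every observed cell is monoallelic, for instance `[(20, 0)] * 54` with `mean_per_allele=10`, essentially no null replicate reaches an average ratio of 1.0, so the P-value sits at its floor of 1/(1 + n_rep).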
Figure 5. Distinguishing genuine from technical allelic expression patterns.
(a) Mean fraction of SNPs showing stochastic ASE as a function of overall gene expression. Colours indicate different approaches for calling stochastic ASE. Numbers in parentheses give the number of genes identified by each approach. (b) Mean fraction of the most expressed allele for SNPs binned by expression level. The mean expression levels of the two alleles were estimated separately. Colours indicate how the mean fraction was computed: from scRNA-seq measurements (mESCs), the ‘T’ model or the ‘T+B’ model. Error bars denote 95% confidence intervals by bootstrap (100 bootstrap samples).
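The binned means with bootstrap confidence intervals used in Figures 3c and 5b (100 bootstrap samples) can be sketched with a percentile bootstrap of the mean within each expression bin. The percentile method, the half-open bin edges and the helper names are assumptions made for illustration; the captions do not specify which bootstrap interval was used.

```python
import random
import statistics

def bootstrap_mean_ci(values, n_boot=100, level=0.95, seed=0):
    """Percentile bootstrap CI for the mean of `values` (resampling with replacement)."""
    rng = random.Random(seed)
    n = len(values)
    boot_means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    alpha = (1.0 - level) / 2.0
    # Nearest-rank percentiles of the sorted bootstrap means.
    lo_idx = round(alpha * (n_boot - 1))
    hi_idx = round((1.0 - alpha) * (n_boot - 1))
    return boot_means[lo_idx], boot_means[hi_idx]

def binned_mean_with_ci(expression, values, edges, n_boot=100, seed=0):
    """For each bin [edges[i], edges[i+1]), return (low, high, mean, ci_low, ci_high).

    `values` could be, e.g., each SNP's fraction of the most expressed allele;
    empty bins are skipped.
    """
    rows = []
    for low, high in zip(edges[:-1], edges[1:]):
        in_bin = [v for e, v in zip(expression, values) if low <= e < high]
        if in_bin:
            ci_low, ci_high = bootstrap_mean_ci(in_bin, n_boot=n_boot, seed=seed)
            rows.append((low, high, statistics.fmean(in_bin), ci_low, ci_high))
    return rows
```

With only 100 resamples the interval endpoints are themselves noisy; a fixed seed keeps them reproducible between runs.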


