Nat Biotechnol. 2015 Mar;33(3):285-289.
doi: 10.1038/nbt.3129. Epub 2015 Jan 19.

Integrated genome and transcriptome sequencing of the same cell

Siddharth S Dey et al. Nat Biotechnol. 2015 Mar.

Abstract

Single-cell genomics and single-cell transcriptomics have emerged as powerful tools to study the biology of single cells at a genome-wide scale. However, a major challenge is to sequence both genomic DNA and mRNA from the same cell, which would allow direct comparison of genomic variation and transcriptome heterogeneity. We describe a quasilinear amplification strategy to quantify genomic DNA and mRNA from the same cell without physically separating the nucleic acids before amplification. We show that the efficiency of our integrated approach is similar to existing methods for single-cell sequencing of either genomic DNA or mRNA. Further, we find that genes with high cell-to-cell variability in transcript numbers generally have lower genomic copy numbers, and vice versa, suggesting that copy number variations may drive variability in gene expression among individual cells. Applications of our integrated sequencing approach could range from gaining insights into cancer evolution and heterogeneity to understanding the transcriptional consequences of copy number variations in healthy and diseased tissues.

Figures

Figure 1
Schematic of DR-Seq for sequencing gDNA and mRNA from the same cell. (a) gDNA and mRNA/cDNA are shown in red and green, respectively. Following single-cell lysis and RT using adapter Ad-1x (purple), gDNA and single-stranded cDNA are amplified by Ad-2 (blue) using a quasilinear amplification strategy. The majority of the short amplicons contain Ad-2 at both ends, and cDNA-derived amplicons contain Ad-2 at one end and Ad-1x at the other end. The sample is then split into two halves and processed separately to amplify and sequence gDNA or cDNA. (b) Distribution of reads within 100 nucleotides of the gene Dppa5a for two single cells (red and black) as a function of the random priming location by adapter Ad-2. The unique length-based identifiers found in the two cells can be used to count the original number of cDNA molecules within each cell and minimize amplification biases. The figure shows that distinct positions are randomly primed within each cell, with high-affinity binding sites being preferentially primed. The size of the dots indicates the binding propensity of each location. For most genes such as Dppa5a, the number of theoretical binding sites far exceeds the number of length-based identifiers detected, thereby enabling length-based identifiers to accurately estimate the original number of cDNA molecules.
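
The counting idea in panel (b) can be illustrated with a minimal sketch. It assumes, purely for illustration, that a length-based identifier is the distinct random-priming position of a read within a gene, so that reads sharing a position in the same cell are treated as amplification copies of one original cDNA molecule. The function name `count_length_identifiers`, the tuple layout, and the toy reads are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def count_length_identifiers(reads):
    """Count distinct priming positions per (cell, gene).

    `reads` is an iterable of (cell, gene, priming_position) tuples.
    Assumption: each distinct position stands for one original cDNA
    molecule, so duplicate positions are collapsed.
    """
    seen = defaultdict(set)
    for cell, gene, position in reads:
        seen[(cell, gene)].add(position)
    return {key: len(positions) for key, positions in seen.items()}


# Toy data (made up): repeated positions mimic amplification duplicates.
reads = [
    ("cell_A", "Dppa5a", 12),
    ("cell_A", "Dppa5a", 12),
    ("cell_A", "Dppa5a", 47),
    ("cell_B", "Dppa5a", 12),
    ("cell_B", "Dppa5a", 80),
    ("cell_B", "Dppa5a", 80),
    ("cell_B", "Dppa5a", 95),
]
counts = count_length_identifiers(reads)
```

In this toy input, the raw read counts (3 and 4) collapse to 2 and 3 identifiers. Such a count can only approximate the molecule number when the number of possible priming sites is much larger than the number of identifiers observed, which is the condition the caption states for most genes.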
Figure 2
Development of computational techniques to reduce technical noise in single-cell DR-Seq sequencing data and comparison of DR-Seq to existing single-cell gDNA or mRNA sequencing methods in the mouse embryonic stem cell line E14. (a) Comparison of the coefficient of variation showed that cell-to-cell variability in gene expression decreased after correcting the raw read-based data using length-based identifiers, implying a reduction in technical noise in the single-cell transcriptome data of DR-Seq (also see Supplementary Fig. 6). (b) Coefficient of variation versus mean expression of genes for the read-based data. Because each cell contains the same number of spike-in molecules, they are expected to display the lowest noise for a given mean level of expression. The data show that read-based data contain a significant amount of technical noise that obscures biological variability between single cells. (c) After correcting the DR-Seq data using length-based identifiers, spike-in molecules typically display the least noise over the entire range of mean expressions (also see Supplementary Fig. 7). Endogenous genes and spike-in molecules are indicated using gray and red dots, respectively. (d) Comparison of mRNA sequencing results between DR-Seq and CEL-Seq showed that both methods perform similarly in detecting genes above different expression thresholds obtained from bulk mRNA sequencing data. (Inset) Overall, both methods detect a similar number of genes (also see Supplementary Fig. 10). (e) Detection of ERCC spike-in molecules in both methods increased monotonically with the expected number of molecules per cell. The figure shows spike-ins that were found in at least 2 single cells. (f) Box plot comparing bin-to-bin variability in gDNA read counts using two different methods for 3 single cells amplified by DR-Seq. The coverage-based method displays an approximately two-fold reduction in technical noise compared to the read-based method. The box plots show the coefficient of variation of read distribution over all the autosomes in the mouse genome. (g) Lorenz plots were used to compare single-cell gDNA sequencing results between DR-Seq and MALBAC. Lorenz curves were used to assess the uniformity of genome coverage by plotting the cumulative increase in read depth versus the cumulative fraction of genome covered, ordered by increasing coverage. The green line indicates the theoretical limit with reads distributed uniformly across the whole genome. Based on the Lorenz plots, bulk sequencing achieves a read distribution close to the theoretical limit. The 6 single cells processed with either DR-Seq or MALBAC display a similar distribution of reads across the genome. (h) Power spectra of read distribution over different genomic length scales are shown for bulk sequencing and single cells processed by DR-Seq and MALBAC. The power spectrum reveals biases in read depth distribution over different ranges of genomic length scales. Bulk sequencing shows the least bias in read distribution, with both DR-Seq and MALBAC performing similarly. (i) Read distribution for regions of the genome with different GC content shows that both methods deviate from the expected normalized count of 1 for regions with high and low GC content. This GC bias is corrected prior to estimating copy numbers in single cells.
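
The Lorenz curve described in panel (g) can be sketched as follows. This is an illustrative construction only: it assumes the genome has already been divided into equal-sized bins with one read count per bin, and the function name `lorenz_curve` and the toy counts are hypothetical rather than the authors' pipeline.

```python
def lorenz_curve(bin_counts):
    """Return (x, y) points of a Lorenz curve for per-bin read counts.

    x: cumulative fraction of genome bins, ordered by increasing coverage.
    y: cumulative fraction of total reads contained in those bins.
    Perfectly uniform coverage gives the diagonal y = x.
    """
    counts = sorted(bin_counts)
    total = sum(counts)
    n_bins = len(counts)
    if n_bins == 0 or total == 0:
        raise ValueError("need at least one bin with a nonzero read count")
    xs, ys = [0.0], [0.0]
    running = 0
    for i, count in enumerate(counts, start=1):
        running += count
        xs.append(i / n_bins)
        ys.append(running / total)
    return xs, ys


# Toy data (made up): uniform coverage versus strongly skewed coverage.
uniform_x, uniform_y = lorenz_curve([10, 10, 10, 10])
skewed_x, skewed_y = lorenz_curve([1, 2, 7, 30])
```

Because the bins are sorted by increasing coverage, the curve for any non-uniform sample lies on or below the diagonal, and the gap between the curve and the diagonal grows as coverage becomes less even.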
Figure 3
Applying DR-Seq to the SK-BR-3 cell line to understand how copy number variations affect gene expression in single cells. (a) Top panel shows raw gDNA data (dots) and different copy numbers (red line) identified using the CBS algorithm for Chr 8 in bulk sequencing data. The middle panel shows raw data (dots) and median read counts (red line) identified using CBS for one single cell (SC13). Visual comparison of the top and middle panels shows that most breakpoints are reliably detected in single cells and that patterns of level changes between bulk and single-cell gDNA sequencing are well correlated. The median read depths for each segment in single cells and the bulk copy numbers are used to estimate copy number variations in single cells (Supplementary Note). For each median level identified from the single-cell gDNA data (middle panel), mean expression of genes within each level was calculated (black lines in lower panel). The lower panel shows that the mean expression of genes within each segment correlates well with the median gDNA levels. (b) Genome-wide quantification of mean expression of genes within different copy number regions shows a monotonic increase in average expression with increasing copy number for 3 single cells (also see Supplementary Fig. 25). (c) For a large range of mean expressions (5-400 RPM), genes exhibiting the highest and lowest noise (quantified as coefficient of variation, or CV) were identified. The x-axis shows the percentage of most noisy and least noisy genes that were considered in the analysis. The data show that the noisiest genes are associated with low copy number regions and vice versa (also see Supplementary Fig. 27). Error bars represent standard error in estimating the mean obtained by bootstrapping the data.
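
The two statistics named in panel (c), the coefficient of variation and a bootstrapped standard error of the mean, can be sketched as below. This is a generic illustration, not the authors' code: it assumes the population standard deviation for the CV and a simple resampling-with-replacement bootstrap, and the function names, the number of resamples, and the toy values are all hypothetical.

```python
import random
import statistics


def coefficient_of_variation(values):
    """CV = standard deviation / mean (population SD assumed here)."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.pstdev(values) / mean


def bootstrap_sem(values, n_resamples=1000, seed=0):
    """Estimate the standard error of the mean by bootstrapping.

    Resamples `values` with replacement, recomputes the mean each time,
    and returns the standard deviation of those resampled means.
    """
    rng = random.Random(seed)
    values = list(values)
    means = [
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    ]
    return statistics.stdev(means)


# Toy expression values for one gene across eight cells (made up).
expression = [2, 4, 4, 4, 5, 5, 7, 9]
cv = coefficient_of_variation(expression)
sem = bootstrap_sem(expression)
```

For these toy values the mean is 5 and the population standard deviation is 2, so the CV is 0.4. The bootstrap estimate depends on the seed and the number of resamples, so no exact value is quoted for it.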
