Nature. 2015 Jul 23;523(7561):486-90.
doi: 10.1038/nature14590. Epub 2015 Jun 17.

Single-cell chromatin accessibility reveals principles of regulatory variation

Jason D Buenrostro et al. Nature. 2015.

Abstract

Cell-to-cell variation is a universal feature of life that affects a wide range of biological phenomena, from developmental plasticity to tumour heterogeneity. Although recent advances have improved our ability to document cellular phenotypic variation, the fundamental mechanisms that generate variability from identical DNA sequences remain elusive. Here we reveal the landscape and principles of mammalian DNA regulatory variation by developing a robust method for mapping the accessible genome of individual cells by assay for transposase-accessible chromatin using sequencing (ATAC-seq) integrated into a programmable microfluidics platform. Single-cell ATAC-seq (scATAC-seq) maps from hundreds of single cells in aggregate closely resemble accessibility profiles from tens of millions of cells and provide insights into cell-to-cell variation. Accessibility variance is systematically associated with specific trans-factors and cis-elements, and we discover combinations of trans-factors associated with either induction or suppression of cell-to-cell variability. We further identify sets of trans-factors associated with cell-type-specific accessibility variance across eight cell types. Targeted perturbations of cell cycle or transcription factor signalling evoke stimulus-specific changes in this observed variability. The pattern of accessibility variation in cis across the genome recapitulates chromosome compartments de novo, linking single-cell accessibility variation to three-dimensional genome organization. Single-cell analysis of DNA accessibility provides new insight into cellular variation of the 'regulome'.

Conflict of interest statement

Competing Interest.

Stanford University has filed a provisional patent application on the methods described, and J.D.B., H.Y.C., and W.J.G. are named as inventors. D.R. and M.L.G. declare competing financial interests as employees of Fluidigm Corp.

Figures

Extended Data Figure 1. Methods development for assaying single epigenomes
(a) scATAC-seq workflow for steps performed both on and off Fluidigm’s integrated fluidics circuit (IFC). (b–c) The development of an efficient Tn5 release protocol designed to permit downstream enzymatic reactions without DNA purification. (b) An in vitro electrophoretic mobility gel shift assay using a fluorescently labeled PCR product (lane 1), showing a stable Tn5-DNA complex (lane 2) dissociated with 50 mM EDTA (lane 3) or 0.1% SDS (lane 4). (c) Workflow and associated table of conditions used to optimize release protocol, showing conditions that markedly improve fragment yield over no release conditions or purifying DNA (Qiagen MinElute). Fragments released represents the fold gain in library diversity, as measured by quantitative PCR (qPCR). (d) qPCR fluorescence traces of 96 libraries generated using scATAC-seq. For all subsequent libraries we used a total of 14 PCR cycles (dotted line). (e,f) A bar plot of per-cell library (e) sequencing depth and (f) fraction of duplicate reads, showing each library was sequenced to varying depths to a similar fraction of duplicate reads.
Extended Data Figure 2. scATAC-seq data recapitulate bulk ATAC-seq characteristics
(a) Reads observed in open chromatin peaks identified from aggregate scATAC-seq data (N = 384 libraries) are highly correlated with reads observed from bulk ATAC-seq. (b) Histogram of aggregated read starts around all TSSs (in K562 cells) comparing ensemble approaches, including 500 cell ATAC-seq reported in a previous publication, to scATAC-seq shows high enrichment above background level of reads. (c) DNA fragment size distribution of ATAC-seq fragments from single cells (grey) and the average of all single cells (red) display characteristic nucleosome-associated periodicity. (d) Phase-contrast (left) and epifluorescence images (right) of captured cell #4 displaying characteristic live cell stain (Calcein) and exclusion of EtBr. (e) Histogram of read starts around TSSs for cell #4 shows high enrichment. (f) DNA fragment size distribution for cell #4 showing nucleosomal periodicity. (g) Images similar to (d) showing staining of cell #83, suggesting low viability due to EtBr staining. (h) Histogram of read starts around TSSs shows lower enrichment than cell #4. (i) DNA fragment size distribution for cell #83. (j) Images similar to (d) showing staining of cell #33 suggesting viability. (k) Histogram of read starts around TSSs of this cell shows low levels of enrichment. (l) DNA fragment size distribution showing no nucleosome-associated periodicity.
Extended Data Figure 3. Fragment recovery metrics within scATAC-seq libraries
(a) Accessibility across all peaks (n=50,000) in GM12878 cells. (b) Accessibility across all annotated promoters in GM12878 cells. Typical promoters used for subsequent analysis are boxed with dotted lines. Recovery of the typical promoters shown in (a) within single cells, in (c) observed data and (d) data extrapolated using measures of predicted library complexity.
Extended Data Figure 4. scATAC-seq data analysis pipeline and validation of bias normalization
Standard deviation of log fold change in reads across cells within peaks binned by deciles of (a) peak intensity, (b) Tn5 bias and (c) GC bias. Variability scores (incorporating bias normalization) within the same peaks shown in (a–c), with peaks binned by deciles of (d) peak intensity, (e) Tn5 bias and (f) GC bias. Log fold change versus deviation scores across single K562 cells for (g) GATA1 ChIP-seq target sites and (h) peaks containing a Nanog motif. Variability scores for factors (purple) and the permuted background (grey) ranked by (i) number of peak associations and (j) the mean accessibility per annotated peak. K562 single-cell data sets show the effect of downsampling fragments on variability scores. Fidelity after downsampling is measured with (k) correlation and (l) dynamic range relative to the complete data set.
Extended Data Figure 5. Biological replicates and measurement error analysis
(a–c) Observed changes in variability comparing the merged set of replicates (K562) to each individual biological replicate. Error bars represent 1 standard deviation of the variability scores after bootstrapping cells from each replicate. (d–f) Correlation of errors computed using three distinct approaches.
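The bootstrapped error bars described in this caption can be illustrated with a minimal sketch. It assumes a hypothetical list of per-cell values and a caller-supplied statistic; the function name `bootstrap_sd`, the number of resamples and the seed are illustrative choices, not the authors' pipeline.

```python
import random
import statistics

def bootstrap_sd(values, statistic, n_boot=1000, seed=0):
    """Standard deviation of `statistic` over bootstrap resamples of `values`.

    Cells are resampled with replacement, keeping the sample size fixed.
    All parameter names here are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    estimates = []
    for _ in range(n_boot):
        resample = [values[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)

# Hypothetical per-cell scores; the spread of the resampled means gives the error bar.
scores = [1.0, 2.0, 3.0, 4.0, 5.0]
error_bar = bootstrap_sd(scores, statistics.mean, n_boot=200)
```

The statistic passed in must accept any resample of the input, including one in which a single cell is drawn repeatedly.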
Extended Data Figure 6. Characterization of high-variance trans-factors in K562 cells
(a–d) Distribution of (a) GATA1, (b) GATA2, (c) actin and (d) CTCF fluorescence observed by flow cytometry. Distributions in grey depict isotype controls. (e) Bi-clustered heat map of single-cell deviations as observed within K562 cells (N=239). Labels on right identify co-clustering of related factors. (f) Bi-clustered heat map of single-cell deviations observed from permuted data. (g) Projection of factor loadings onto principal component 1 versus 5 from principal component (PC) analysis of heatmap from Fig. 2d. Factor loadings do not vary along PC5, while peaks associated with regions with different replication timings (RepliSeq) have strong variation along this axis. Venn diagrams showing variability of (h) GATA1 and/or GATA2, (i) CJUN and/or GATA2 and CEBPB and/or GATA2 (co-) occurring ChIP-seq sites. (j) -log10(p-values) of calculated changes in co-occurring ChIP-seq sites shown in Figure 2e. (k) Distribution of accessibility among GATA1 only, GATA2 only, and shared sites. (l) Mean accessibility from GATA1 only, GATA2 only, and shared sites in (k); error bars represent 1 standard deviation generated by bootstrapping ChIP-seq peaks.
Extended Data Figure 7. Drug treatments modulate factor variability
(a–b) Change in variability of untreated K562 cells versus cells treated with (a) Imatinib and (b) JUN inhibitor shows an increase in variability of factors associated with the cell cycle or S phase and of JUN factors, respectively. (c–f) Flow cytometry data depicting DNA content, using DAPI or PI, in (c) control K562 cells or cells showing altered cell-cycle status after treatment with (d) cell-cycle inhibitor, (e) Imatinib and (f) JUN inhibitor.
Extended Data Figure 8. TF motif correlation and variability across chromatin state
(a) Hierarchical bi-clustering of high-variance TF motif annotations using Pearson correlation. Variability of regions associated with (b) chromatin states, as identified by Ernst et al., and (c) histone modifications.
Extended Data Figure 9. Cis variability analysis within single-cells
(a) Interchromosomal chromosome 1 co-correlations of deviation scores within single cells calculated for bins of 25 peaks within GM12878 cells. (b) Distribution, using density estimation, of correlation values shown in (a). (c–g) Analysis of cis-correlation (identical to Fig. 4) for representative chromosomes 7, 11, 12, 17, and 20. Correlation between scATAC-seq cis-correlation and chromosome conformation capture methods for each chromosome in (h) GM12878 and (i) K562 cells.
Extended Data Figure 10. Measurements of individual peaks within single-cells
(a) The distribution of GATA1 deviation scores for single K562 cells. Volcano plots of (b) non-GATA1 peaks and (c) GATA1 peaks in K562 cells; p-values were calculated using a binomial test. (d) The distribution of NF-κB deviation scores for single GM12878 cells. Volcano plots of (e) non-NF-κB peaks and (f) NF-κB peaks in GM12878 cells; p-values were calculated using a binomial test. Inset numbers show the number of points in the upper left or upper right quadrants of the panel. (g) Accessibility at a genomic locus, showing (top) aggregate NF-κB low (blue) and NF-κB high (red) profiles, (middle) single GM12878 cells ranked by NF-κB deviation scores and (bottom) unranked single cells.
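The caption states only that p-values came from a binomial test, without specifying the counts or null proportion. As a generic illustration, here is a minimal sketch of an exact two-sided binomial test. The setup is an assumption: `k` successes out of `n` trials (for example, cells in one group with a fragment in a given peak) are compared with a null proportion `p`. The function names are hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_test_two_sided(k, n, p):
    """Exact two-sided p-value: total probability of all outcomes that are
    no more likely than the observed one under the null proportion p."""
    observed = binomial_pmf(k, n, p)
    tolerance = 1 + 1e-7  # guards against floating-point ties
    total = sum(
        pm
        for pm in (binomial_pmf(i, n, p) for i in range(n + 1))
        if pm <= observed * tolerance
    )
    return min(1.0, total)

# Hypothetical counts: 0 of 10 cells accessible, null proportion 0.5.
p_value = binomial_test_two_sided(0, 10, 0.5)
```

Summing the probabilities of outcomes no more likely than the observed one is one common definition of a two-sided exact test; other conventions (such as doubling the one-sided tail) give different values for asymmetric nulls.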
Figure 1. Single-cell ATAC-seq provides an accurate measure of chromatin accessibility genome-wide
(a) Workflow for measuring single epigenomes using scATAC-seq on a microfluidic device (Fluidigm). (b) Aggregate single-cell accessibility profiles closely recapitulate profiles of DNase-seq and ATAC-seq. (c) Genome-wide accessibility patterns observed by scATAC-seq are correlated with DNase-seq data (R = 0.80). (d) Library size versus percentage of fragments in open chromatin peaks (filtered as described in methods) within K562 cells (N=288). Dotted lines (15% and 10,000) represent cutoffs used for downstream analysis.
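The two cutoffs in panel (d) amount to a per-library filter. The sketch below is a minimal illustration that assumes libraries at or above both thresholds are kept; the record type, field names and inclusive comparison are assumptions for illustration, not taken from the paper's methods.

```python
from typing import NamedTuple

class Library(NamedTuple):
    """Hypothetical per-cell summary record."""
    cell_id: str
    library_size: int      # estimated number of unique fragments
    frac_in_peaks: float   # fraction of fragments in open chromatin peaks

MIN_LIBRARY_SIZE = 10_000   # cutoff quoted in the caption
MIN_FRAC_IN_PEAKS = 0.15    # 15% cutoff quoted in the caption

def passes_qc(lib):
    """Keep a library only if it meets both cutoffs (inclusive, by assumption)."""
    return (lib.library_size >= MIN_LIBRARY_SIZE
            and lib.frac_in_peaks >= MIN_FRAC_IN_PEAKS)

def filter_libraries(libs):
    """Return the libraries retained for downstream analysis."""
    return [lib for lib in libs if passes_qc(lib)]

# Made-up example values, not data from the paper.
example = [
    Library("cell_a", 20_000, 0.30),
    Library("cell_b", 5_000, 0.30),
    Library("cell_c", 20_000, 0.10),
]
kept = filter_libraries(example)
```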
Figure 2. Trans-factors are associated with single-cell epigenomic variability
(a) Schematic showing two cellular states (TF high and TF low) leading to differential chromatin accessibility. (b) Analysis infrastructure, which uses a calculated background signal (BS; see Supplemental Methods section 3.2) to calculate TF deviations and variability from scATAC-seq data. The TF value is calculated by subtracting the number of expected fragments from the observed fragments per cell (see Supplemental Methods section 3.1). (c) Observed cell-to-cell variability within sets of genomic features associated with ChIP-seq peaks, transcription factor motifs, and replication timing (error estimates shown in grey, see Methods for details). Variability measured from permuted background (see Methods) is shown in grey dots. (d) Distribution of normalized deviations from expected accessibility signal for GATA1 sites in individual cells, histogram of cells shown in grey, density profile shown in purple (see Methods). (e) Immunostaining of GATA1 (green) and GATA2 (red) shows protein expression in K562 cells. (f) Principal components ranked by fraction of variance explained from observed data (purple) and permuted data (orange). Bar plot of observed data shown in grey. (g) Calculated changes in associated variability of factors when present together versus independently, depicting a context-specific trans-factor variability landscape (see Methods). Venn diagrams show variability associated with GATA1 and/or GATA2 and CTCF and/or SMC3 (co-) occurring ChIP-seq sites.
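The observed-minus-expected calculation in panel (b) can be sketched as follows. The caption does not define the expected count, so this sketch makes an assumption: a cell's expected fragments in a peak set equal its total fragments multiplied by the share of all fragments, pooled over cells, that fall in that set. It also omits the background-signal normalization the caption mentions. The function and argument names are hypothetical.

```python
def raw_deviations(counts, peak_set):
    """Observed minus expected fragments per cell for one set of peaks.

    counts   : list of per-cell lists, counts[c][j] = fragments of cell c in peak j
    peak_set : indices of the peaks annotated with the factor of interest
    The definition of "expected" used here is an assumption (see lead-in).
    """
    n_peaks = len(counts[0])
    total_per_peak = [sum(cell[j] for cell in counts) for j in range(n_peaks)]
    grand_total = sum(total_per_peak)
    # Share of all fragments, pooled over cells, that land in the peak set.
    set_fraction = sum(total_per_peak[j] for j in peak_set) / grand_total

    deviations = []
    for cell in counts:
        observed = sum(cell[j] for j in peak_set)
        expected = sum(cell) * set_fraction
        deviations.append(observed - expected)
    return deviations

# Toy matrix: two cells, four peaks; peaks 0 and 3 carry the hypothetical factor.
toy_counts = [[4, 0, 0, 4],
              [0, 4, 4, 0]]
toy_deviations = raw_deviations(toy_counts, [0, 3])
```

In this toy example the first cell has more fragments in the factor's peaks than expected and the second has fewer, which corresponds to the "TF high" and "TF low" states drawn in panel (a).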
Figure 3. Cell type specific epigenomic variability
Change of cellular variability due to chemical perturbations using (a) CDK4/6 cell-cycle inhibitor (K562) or (b) TNF-alpha stimulation (GM12878); error bars (shown in grey) represent 1 standard deviation of bootstrapped cells across the two conditions. (c) Heat map of deviations from expected accessibility signal across trans-factors (rows) and of single cells (columns) from 3 cell types. Bottom color map represents assignment classification from hierarchical clustering. (d) Variability associated with trans-factor motifs across 7 cell types. Each row is normalized to the maximum variability for that motif across cell types (shown left).
Figure 4. Structured cis variability across single epigenomes
(a) Per-cell deviations of expected fragments across a region within chromosome 1 (see Methods). For display, only large deviation cells are shown (N=186 cells). (b) Pearson correlation coefficient representing topological domain signal (see Methods) of interaction frequency from a chromatin conformation capture assay (left, data from Kalhor et al.) or doubly correlated normalized deviations of scATAC-seq (right) from chromosome 1 (see Methods). White areas represent regions masked because they are highly repetitive. (c) Permuted cis-correlation map for chromosome 1 (analyzed identically to (b)). (d) Box highlights a representative region depicting long-range covariability.
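One reading of "doubly correlated" in panel (b) is a correlation of correlations. The minimal sketch below assumes that reading: genomic bins are first correlated across cells, and the rows of the resulting matrix are then correlated with each other. This interpretation, the input layout (one row of per-cell deviations per bin) and the function names are assumptions, not the procedure from the paper's Methods.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(rows):
    """All pairwise Pearson correlations between the given rows."""
    return [[pearson(r1, r2) for r2 in rows] for r1 in rows]

def doubly_correlated(deviations):
    """Correlate bins across cells, then correlate the rows of that matrix."""
    first_pass = correlation_matrix(deviations)
    return correlation_matrix(first_pass)

# Toy input: three bins (rows) by four cells (columns), made-up values.
toy_bins = [[1, 2, 3, 4],
            [1, 3, 2, 4],
            [4, 3, 2, 1]]
toy_map = doubly_correlated(toy_bins)
```

The second pass compares each bin's whole correlation profile rather than its raw signal, so bins that co-vary with the same set of other bins score as similar even when their direct correlation is noisy.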
