Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2019 Jul;37(7):773-782.
doi: 10.1038/s41587-019-0114-2. Epub 2019 May 6.

Determining cell type abundance and expression from bulk tissues with digital cytometry

Affiliations

Determining cell type abundance and expression from bulk tissues with digital cytometry

Aaron M Newman et al. Nat Biotechnol. 2019 Jul.

Abstract

Single-cell RNA-sequencing has emerged as a powerful technique for characterizing cellular heterogeneity, but it is currently impractical on large sample cohorts and cannot be applied to fixed specimens collected as part of routine clinical care. We previously developed an approach for digital cytometry, called CIBERSORT, that enables estimation of cell type abundances from bulk tissue transcriptomes. We now introduce CIBERSORTx, a machine learning method that extends this framework to infer cell-type-specific gene expression profiles without physical cell isolation. By minimizing platform-specific variation, CIBERSORTx also allows the use of single-cell RNA-sequencing data for large-scale tissue dissection. We evaluated the utility of CIBERSORTx in multiple tumor types, including melanoma, where single-cell reference profiles were used to dissect bulk clinical specimens, revealing cell-type-specific phenotypic states linked to distinct driver mutations and response to immune checkpoint blockade. We anticipate that digital cytometry will augment single-cell profiling efforts, enabling cost-effective, high-throughput tissue characterization without the need for antibodies, disaggregation or viable cells.

PubMed Disclaimer

Conflict of interest statement

Competing Interests

A.M.N. has patent filings related to expression deconvolution and cancer biomarkers and has served as a consultant for Roche, Merck, and CiberMed. A.A.A. has patent filings related to expression deconvolution and cancer biomarkers and has served as a consultant or advisor for Roche, Genentech, Janssen, CiberMed, Pharmacyclics, Gilead, and Celgene. M.D. has patent filings related to cancer biomarkers and has served as a consultant for Roche, Novartis, CiberMed, and Quanticel Pharmaceuticals. No potential conflicts of interest were disclosed by the other authors.

Figures

Figure 1.
Figure 1.
Framework for in silico cell enumeration and purification. A typical CIBERSORTx workflow involves a serial approach, in which molecular profiles of cell subsets are first obtained from a small collection of tissue samples and then repeatedly used to perform systematic analyses of cellular abundance and gene expression signatures from bulk tissue transcriptomes. This process involves: (1) transcriptome profiling of single cells or sorted cell subpopulations to define a “signature matrix” consisting of barcode genes that can discriminate each cell subset of interest in a given tissue type; (2) applying the signature matrix to bulk tissue RNA profiles in order to infer cell type proportions and (3) representative cell type expression signatures; and (4) purifying multiple transcriptomes for each cell type from a cohort of related tissue samples. Using metastatic melanomas as an example, Figure 6 illustrates the application of each step.
Figure 2.
Figure 2.
Bulk tissue deconvolution with single-cell reference profiles. (a) Left: t-SNE projection of scRNA-seq data from the peripheral blood of a patient with NSCLC, with six major leukocyte populations indicated. Right: Heat map of signature matrix genes distinguishing these six subsets. (b) Enumeration of leukocyte frequencies in RNA-seq profiles of whole blood (n = 12 healthy adults) using the PBMC signature matrix from panel a, shown before and after cross-platform normalization. Performance gains are shown as Pearson correlations and assessed for each sample across five cell types quantified by flow cytometry and automated complete blood counts (CBCs): B cells, NK cells, CD8 T cells, CD4 T cells, and monocytes. Statistical significance was determined using a two-sided Wilcoxon signed-rank test. Data are presented as boxplots (n = 12 per group; center line, median; box limits, upper and lower quartiles; whiskers, maximum and minimum values). (c) t-SNE projection of scRNA-seq data from 23 head and neck squamous cell carcinoma (HNSCC) tumors (left) and training/validation approach for assessing single-cell deconvolution performance (right). (d) Concordance between cell type proportions measured by scRNA-seq and CIBERSORTx deconvolution for 20 held-out HNSCC tumors for validation from panel c. All tumor GEPs were reconstructed from single-cell data. (e) Analysis of cell subset enumeration across diverse signature matrices, tissues, and platforms. Deconvolution was run with batch correction (Methods). Ground truth cell proportions were determined by scRNA-seq (HNSCC, melanoma) or by flow cytometry and automated hematology leukocyte differential counter (blood). Cell subsets within the gray band are significantly concordant with ground truth by Pearson correlation (P < 0.05). Signature matrices are provided in Supplementary Table 2, with the exception of LM2215. Data are presented as medians ± interquartile range. Additional details are provided in Supplementary Note 1. (f) Left: t-SNE plot of pancreatic islet subsets from ten human subjects, five of which were profiled by scRNA-seq, bulk RNA-seq, and IHC (also see Supplementary Fig. 3f-i). Right: Scatterplots depicting concordance between the frequencies of four major islet subsets quantitated by IHC versus scRNA-seq (top) and CIBERSORTx deconvolution of bulk RNA-seq (bottom), as determined by linear regression. The significance of the result was assessed by a two-sided t test. Cell subset abbreviations: Mono, monocytes; Macs, macrophages; DCs, dendritic cells; Mast, mast cells; CAFs, cancer-associated fibroblasts; Endo, endothelial cells; Myo, myocytes.
Figure 3.
Figure 3.
Purification of representative cell type-specific transcriptome profiles from a group of specimens. (a) Approach for in silico purification and validation of group-mode cell type-specific GEPs. (b) Scatterplots comparing genome-wide expression profiles of mathematically purified (x-axis) and FACS-purified (y-axis) B cells and CD8 T cells from FL lymph nodes. (c) Scatterplots showing the predicted expression level of each gene in panel b (y-axis) as a function of its uncertainty, as captured by the geometric coefficient of variation (c.v.) (x-axis). (d) Same as b, but after applying an adaptive noise filter based on the transcriptome-wide distribution of c.v. values for each cell type. Concordance in b,d was determined by applying Pearson correlation (r), Spearman correlation (ρ), and linear regression (diagonal line) to genes with detectable expression on both platforms. (e) Analysis of the impact of sample size on group-mode gene expression purification for three FL tumor cell subsets, as assessed by Spearman correlation against corresponding FACS-purified GEPs (“ground truth”) in log2 space. For each sample size, tumors were subsampled without replacement from a larger cohort (n = 302) 10 times, and the results are shown with and without adaptive noise filtration. Data are presented as boxplots (center line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range; points, outliers). (f) Heat map comparing imputed and ground truth expression profiles for three FL immune subsets, with LM22 genes as rows and immune cell types as columns. Genes that were not predicted to be expressed or that were removed by adaptive noise filtration are colored navy blue. (g) Same as f, but for immune (CD45+), epithelial/cancer (EpCAM+), and stromal (CD10+ or CD31+) subpopulations imputed from bulk RNA-seq profiles of 26 NSCLC tumors. Ground truth GEPs (y-axis) were obtained from FACS-purified populations.
Figure 4.
Figure 4.
High-resolution purification of cell type-specific expression from synthetic mixtures. (a) Schematic illustrating a hypothetical example of unobserved (left) and observed (right) expression data for a series of four randomly admixed immune subsets, each with one or two sets of ground truth DEGs (indicated by colored block-like patterns). (b) Results of in silico purification applied to the bulk tissue expression matrix in panel a. (c) t-SNE representation of purified expression matrices in b (n = 400 samples), performed separately for each cell type and color-coded by the class corresponding to each block pattern in b (red = leftmost block, blue = rightmost block). (d) Left: Synthetic GEP matrix of four randomly admixed immune subsets, one of which contains DEGs in the shape of a bullseye (monocytes). Right: Results of in silico purification. (e) Analysis of the sensitivity and specificity of DEG recovery in synthetic mixtures (n = 50) across 30 conditions (six CD8 T cell proportions and five DEG fold changes) when sample classes are known. Left: DEGs with a given fold change were added into the reference profile of CD8 T cells, which were spiked at a predetermined proportion into random mixtures of three other immune subsets. A colon cancer cell line GEP was added into each mixture to simulate ~50% unknown content. After in silico purification, known CD8 T cell DEGs were assessed in each purified cell type by ranking genes by fold change with respect to known DEG classes. The area under the curve (AUC) for DEG recovery was calculated for each combination of fold change and cell type spike-in fraction, and shown as a heat map. Data in panels a-b,d were log2 adjusted and median-centered for each gene prior to rendering the heat maps.
Figure 5.
Figure 5.
High-resolution expression profiling of bulk tumor biopsies. (a–c) Analysis of CREBBP mutation-associated expression changes in B cells from follicular lymphoma (FL) tumors. (a) Schema outlining the application of high-resolution purification to identify CREBBP mutation-associated DEGs in follicular lymphoma B cells. (b) Heat map confirming loss of MHC class II expression in CREBBP-mutant FL B cell GEPs inferred by CIBERSORTx. Expression values were median-centered prior to plotting. (c) Analysis of published gene sets associated with lower (left) or higher (right) B cell expression in CREBBP-mutant FL tumors. Scatter plots show the corresponding log2 expression of each gene in digitally sorted B cell GEPs, after median centering and averaging by CREBBP mutation status. Group comparisons in b and c were assessed by a two-sided Wilcoxon signed-rank test. Data are presented as means ± s.d. WT, wildtype (n = 10); MT, mutant (n = 14). (df) High-resolution expression profiling of tumor cell subpopulations from non-small cell lung cancer (NSCLC) tumors. (d) Schema for profiling and validating expression signatures of epithelial and cancer cells (Epith., EpCAM+), immune cells (Imm., CD45+), fibroblasts (Fib., CD10+), and endothelial cells (Endo., CD31+) in 1,022 NSCLC tumor and 110 adjacent normal GEPs from TCGA. (e) Left: tSNE plots showing population-specific transcriptional diversity imputed from 1,132 bulk NSCLC GEPs, color-coded to denote tumor histological subtype (LUAD vs. LUSC) and adjacent normal tissues. Plots were created with perplexity set to 10. Right: PCA plots of 21 GEPs from corresponding FACS-purified populations, color-coded by tumor histological subtype. Histological differences are highlighted by ovals. (f) Heat maps showing DEGs identified in epithelial/cancer, immune, and fibroblast populations identified in NSCLC tumors from TCGA (left, n = 1,022 tumor samples) and RNA-seq profiles of corresponding FACS-purified populations from 21 NSCLC patients with LUAD or LUSC (right; Supplementary Table 1). The same genes are shown in both heat maps and are ordered identically. For clarity, a maximum of 300 over- and under-expressed genes are shown for each cell type (all DEGs are provided in Supplementary Table 4). Median centering was applied to each cell population separately. To quantify DEG concordance between CIBERSORTx and the bulk-sorted validation profiles in panel f, we used a Monte Carlo strategy described in Supplementary Note 1. LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma.
Figure 6.
Figure 6.
Cellular signatures of melanoma driver mutation status and immunotherapy response. (a,b) High-resolution expression profiling of mutation-associated phenotypic states in distinct tumor cell types. (a) Top: tSNE plot showing 8 major tumor subpopulations profiled from 19 melanomas by scRNA-seq. Bottom: schema for characterizing and validating context-dependent expression variation in 342 bulk melanoma tumors from TCGA. (b) Heat maps depicting cell type-specific DEGs identified by CIBERSORTx that are associated with BRAF or NRAS driver mutations in melanoma. Expression values were averaged across TCGA tumor samples (CIBERSORTx, left) and single-cell GEPs (scRNA-seq, right) for clarity. Underlying data are provided in Supplementary Fig. 16 and Supplementary Table 4. Only cell types with DEGs detected by CIBERSORTx and with available single-cell validation data are shown. For clarity, a maximum of 300 over- and under-expressed genes are shown for each cell type. CAFs, cancer associated fibroblasts; MT, mutant; WT, wildtype. (c) Left: Heat map showing the absolute difference in gene expression, expressed as ranks, between a normal CD8 T cell reference profile and the following three melanoma CD8 TIL GEPs: ‘Imputed (FF)’, a group-mode CD8 TIL GEP imputed from 473 bulk RNA-seq profiles of fresh/frozen (FF) tumors generated by TCGA; ‘Imputed (FFPE)’, a group-mode CD8 TIL profile predicted from 42 FFPE tumors; ‘scRNA-seq’, a representative CD8 TIL profile derived from aggregated single-cell data of 19 melanoma patients. The heat map is ordered by the difference in ranks between the “Imputed (FF)” vector and the normal CD8 T cell GEP. All imputed profiles were processed by adaptive noise filtration. Genes predicted in the FF but not FFPE cohorts (owing to noise filtration) are colored gray in the latter. Selected T cell exhaustion genes are indicated. Right: Gene set enrichment analysis (GSEA) of a previously published melanoma CD8 TIL-associated gene set, assessed relative to the ordering of genes in the heat map. (d) Framework for characterizing the clinical relevance of PDCD1+CTLA4+ CD8 T cells in bulk melanoma tumors using single-cell reference profiles. (e) Estimated levels of PDCD1+CTLA4+ CD8 T cells in bulk tumor expression profiles from melanoma patients who received immunotherapy, stratified by clinical response. VA, Van Allen and colleagues (n = 37); N, Nathanson and colleagues (n = 24); C, Chen and colleagues (n = 26). Statistical comparisons were performed using a two-sided Mann-Whitney test. Data are expressed as boxplots (center line, median; box limits, upper and lower quartiles; whiskers, maximum and minimum values). (f) Kaplan-Meier plot showing differences in overall survival between melanoma patients with high and low estimated levels of PDCD1+CTLA4+ CD8 T cells in biopsies preceding immune checkpoint blockade (pre-treatment). Patients were split by the median estimated fractional abundance of PDCD1+CTLA4+ CD8 T cells into high and low groups, and subsequently pooled across two studies with available overall survival data,. Statistical significance was calculated by a two-sided log-rank test. HR, hazard ratio. 95% HR confidence intervals are shown in brackets.

Comment in

  • Expanded CIBERSORTx.
    Rusk N. Rusk N. Nat Methods. 2019 Jul;16(7):577. doi: 10.1038/s41592-019-0486-8. Nat Methods. 2019. PMID: 31249418 No abstract available.

Similar articles

Cited by

References

    1. Wagner A, Regev A & Yosef N Revealing the vectors of cellular identity with single-cell genomics. Nat Biotech 34, 1145–1160 (2016). - PMC - PubMed
    1. Shen-Orr SS & Gaujoux R Computational deconvolution: extracting cell type-specific information from heterogeneous samples. Curr Opin Immunol 25, 571–578 (2013). - PMC - PubMed
    1. Newman AM & Alizadeh AA High-throughput genomic profiling of tumor-infiltrating leukocytes. Curr Opin Immunol 41, 77–84 (2016). - PMC - PubMed
    1. Aran D, Hu Z & Butte AJ xCell: digitally portraying the tissue cellular heterogeneity landscape. Genome Biology 18, 220 (2017). - PMC - PubMed
    1. Racle J, de Jonge K, Baumgaertner P, Speiser DE & Gfeller D Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data. eLife 6, e26476 (2017). - PMC - PubMed

Publication types