eLife. 2017 Nov 13;6:e26476.
doi: 10.7554/eLife.26476

Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data

Julien Racle et al. eLife. 2017.

Abstract

Immune cells infiltrating tumors can have an important impact on tumor progression and response to therapy. We present an efficient algorithm to simultaneously estimate the fraction of cancer and immune cell types from bulk tumor gene expression data. Our method integrates novel gene expression profiles from each major non-malignant cell type found in tumors, renormalization based on cell-type-specific mRNA content, and the ability to consider uncharacterized and possibly highly variable cell types. Feasibility is demonstrated by validation with flow cytometry, immunohistochemistry and single-cell RNA-Seq analyses of human melanoma and colorectal tumor specimens. Altogether, our work not only improves accuracy but also broadens the scope of absolute cell fraction predictions from tumor gene expression data, and provides a unique novel experimental benchmark for immunogenomics analyses in cancer research (http://epic.gfellerlab.org).
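
As a rough illustration of the kind of estimation the abstract describes, the sketch below fits non-negative proportions, summing to at most one, to a bulk expression profile and assigns the remainder to uncharacterized cells. It is a minimal sketch under stated assumptions, not the EPIC implementation: the function names, the projected-gradient solver and the toy numbers are all hypothetical, and it assumes that uncharacterized cells express none of the signature genes.

```python
def project_capped_simplex(p):
    """Euclidean projection of p onto {x : x >= 0, sum(x) <= 1}."""
    clipped = [max(x, 0.0) for x in p]
    if sum(clipped) <= 1.0:
        return clipped
    # Constraint sum(x) <= 1 is active: project onto the simplex sum(x) == 1
    # using the standard sort-based method.
    cumulative, theta = 0.0, 0.0
    for k, value in enumerate(sorted(p, reverse=True), start=1):
        cumulative += value
        t = (cumulative - 1.0) / k
        if value - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in p]


def estimate_mrna_proportions(bulk, reference, weights=None, n_iter=2000):
    """Weighted least-squares fit of bulk[i] ~ sum_j reference[i][j] * p[j].

    bulk:      expression of each signature gene in the bulk sample
    reference: reference[i][j] = expression of gene i in cell type j
    weights:   optional per-gene weights (hypothetical, default 1.0)
    Returns (p, other), where other = 1 - sum(p) is the share attributed
    to uncharacterized cells.
    """
    n_genes, n_types = len(reference), len(reference[0])
    w = weights if weights is not None else [1.0] * n_genes
    # Conservative step size from a trace bound on the Lipschitz constant.
    lipschitz = 2.0 * sum(w[i] * sum(c * c for c in reference[i])
                          for i in range(n_genes))
    step = 1.0 / lipschitz
    p = [0.0] * n_types
    for _ in range(n_iter):
        residual = [bulk[i] - sum(reference[i][j] * p[j] for j in range(n_types))
                    for i in range(n_genes)]
        grad = [-2.0 * sum(w[i] * reference[i][j] * residual[i]
                           for i in range(n_genes))
                for j in range(n_types)]
        p = project_capped_simplex([p[j] - step * grad[j] for j in range(n_types)])
    return p, 1.0 - sum(p)


# Toy data (made up): three signature genes, two reference cell types.
toy_reference = [[10.0, 0.0], [0.0, 8.0], [2.0, 3.0]]
toy_bulk = [3.0, 1.6, 1.2]
toy_p, toy_other = estimate_mrna_proportions(toy_bulk, toy_reference)
```

The optional `weights` argument is one place where per-gene weights, for instance ones reflecting how variable each gene is across reference samples, could enter such a fit.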

Keywords: cancer biology; cell fraction predictions; computational biology; gene expression; human; systems biology; tumor immune microenvironment.

Conflict of interest statement

No competing interests declared.

Figures

Figure 1. Estimating the proportion of immune and cancer cells.
(A) Schematic description of our method. (B) Matrix formulation of our algorithm, including the uncharacterized cell types (red box) with no or very low expression of signature genes (green box). (C) Low dimensionality representation (PCA based on the 1000 most variable genes) of the samples used to build the reference gene expression profiles from circulating immune cells (study 1 [Hoek et al., 2015], study 2 [Linsley et al., 2014], study 3 [Pabst et al., 2016]). (D) Low dimensionality representation (PCA based on the 1000 most variable genes) of the tumor-infiltrating cell gene expression profiles from different patients. Each point corresponds to the cell-type average per patient of the single-cell RNA-Seq data of Tirosh et al. (2016) (requiring at least 3 cells of a given cell type per patient). Only samples from primary tumors and non-lymphoid tissue metastases were considered. Projection of the original single-cell RNA-Seq data can be found in Figure 1—figure supplement 1.
Figure 1—figure supplement 1. Low dimensionality representation of the tumor-infiltrating cell samples.
Principal component analysis of the samples used to build the reference gene expression profiles from tumor-infiltrating immune cells, based on the data from Tirosh et al. (2016), considering only the primary tumor and non-lymphoid tissue metastasis samples.
Figure 1—figure supplement 2. Cell type mRNA content.
(A) mRNA content per cell type obtained for cell types sorted from blood. Values for B, NK, T cells and monocytes were obtained as described in Materials and methods. Values for Neutrophils are from Subrahmanyam et al. (2001). (B) Width of the forward scatter values for the different immune and cancer cells from flow cytometry data of melanoma metastatic lymph nodes. Data were first normalized by the mean FSC-W for each donor. Error bars represent the standard deviation from data of 4 patients.
Figure 2. Predicting cell fractions in blood samples.
(A) Predicted vs. measured immune cell proportions in PBMC (dataset 1 (Zimmermann et al., 2016), dataset 2 (Hoek et al., 2015)) and whole blood (dataset 3 (Linsley et al., 2014)); predictions are based on the reference profiles from circulating immune cells. (B) Performance comparison with other methods. Significant correlations are indicated above each bar (*p<0.05; **p<0.01; ***p<0.001). (C) Predicted immune cells' mRNA proportions (i.e., without mRNA renormalization step) vs. measured values in the same datasets. Correlations are based on Pearson correlation; RMSE: root mean squared error. Proportions of cells observed experimentally are given in Supplementary file 3B-D.
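
The two accuracy measures named in this caption, Pearson correlation and root mean squared error, can be computed as in the short sketch below. The function names and the example values are hypothetical and are not taken from the study's data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rmse(predicted, measured):
    """Root mean squared error between predicted and measured fractions."""
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up predicted and measured cell fractions for one sample.
example_predicted = [0.10, 0.25, 0.40, 0.25]
example_measured = [0.12, 0.20, 0.45, 0.23]
example_r = pearson_r(example_predicted, example_measured)
example_rmse = rmse(example_predicted, example_measured)
```

Correlation captures whether predictions rank and scale with the measurements, while RMSE also penalizes a systematic offset, which is why the two are reported side by side.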
Figure 2—figure supplement 1. Comparison of multiple cell fraction prediction methods in blood datasets.
Heatmaps show (A) the Pearson R correlation and (B) the root mean squared error between the cell fractions predicted by each method and the experimentally measured fractions (dataset 1 [Zimmermann et al., 2016], dataset 2 [Hoek et al., 2015], dataset 3 [Linsley et al., 2014]). Results are based either on all cell types together (noted as ‘All cells’) or on each individual cell type measured experimentally. NA's indicate cases where the cell type could not be predicted by a method. The ‘All cells’ boxes are hatched for TIMER because it does not predict the proportions of all the cell types, so the values computed there correspond to fewer cell types than for the other methods. For dataset 2, as data are available from only two donors, the results are presented only with all cells together (eight data points). In (A), the significance of the Pearson correlation is indicated by stars: *p<0.05, **p<0.01, ***p<0.001, while results with p-values above 0.1 are inside parentheses.
Figure 2—figure supplement 2. Effect of including an mRNA renormalization step for multiple cell fraction prediction methods.
Pearson R correlations are shown as in Figure 2—figure supplement 1A, showing here for each method its original result and the result if the predicted proportions are then renormalized by the mRNA per cell values as is done in EPIC.
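
One plausible reading of this renormalization step is sketched below: each predicted mRNA proportion is divided by the mRNA content per cell of its cell type, and the results are rescaled to sum to one. The function name and the mRNA-per-cell values are hypothetical and serve only to illustrate the arithmetic.

```python
def renormalize_by_mrna(mrna_proportions, mrna_per_cell):
    """Convert mRNA proportions into cell proportions.

    mrna_proportions: {cell_type: share of total mRNA}
    mrna_per_cell:    {cell_type: relative mRNA content of one cell}
    """
    scaled = {ct: mrna_proportions[ct] / mrna_per_cell[ct]
              for ct in mrna_proportions}
    total = sum(scaled.values())
    return {ct: value / total for ct, value in scaled.items()}


# Made-up example: cell type B carries half as much mRNA per cell as A,
# so an equal mRNA share implies twice as many B cells.
example_cells = renormalize_by_mrna({"A": 0.5, "B": 0.5}, {"A": 1.0, "B": 0.5})
```

Without such a step, cell types with a high mRNA content per cell would tend to be overestimated when predictions are read as cell counts.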
Figure 2—figure supplement 3. Effect of the various steps in EPIC on the prediction accuracy.
Comparison of the predictions as done in Figure 2—figure supplement 1A, for different variations from EPIC: (1) full EPIC method; (2) EPIC if the gene expression reference profiles are scaled a priori by the mRNA per cell values instead of doing the mRNA normalization step a posteriori; (3) EPIC results without the mRNA normalization step at all; (4) EPIC results when the optimization does not include any weights based on the gene expression variability from the reference profiles.
Figure 2—figure supplement 4. Results with or without known reference profiles for T cells for the cell fraction predictions from various methods.
Results are shown as in Figure 2—figure supplement 1A. Here, we present for various cell fraction prediction methods the results considering all the immune cell types in the gene expression reference profiles, followed by the results obtained when removing all references to T cells (and their subsets) from these reference profiles. Only the predictions for immune cells other than T cells are shown. The effect of removing T cells from MCP-counter and TIMER could not be tested because the R code for these methods does not allow selecting the reference profiles or cell types to use as input.
Figure 3. Predicting cell fractions in solid tumors with reference profiles from circulating cells.
(A) Comparison of EPIC predictions with our flow cytometry data of lymph nodes from metastatic melanoma patients. (B) Comparison with immunohistochemistry data from colon cancer primary tumors (Becht et al., 2016). (C) Comparison with single-cell RNA-Seq data (Tirosh et al., 2016) from melanoma samples either from lymphoid tissues or primary and non-lymphoid metastatic tumors. Correlations are based on Pearson correlation. Proportions of cells observed experimentally are given in Supplementary file 3A,E.
Figure 3—figure supplement 1. Sketch of the experiment designed to validate EPIC predictions starting from in vivo tumor samples.
Figure 4. Predictions with reference profiles from tumor-infiltrating cells.
Same as Figure 3 but based on reference profiles built from the single-cell RNA-Seq data of primary tumor and non-lymphoid metastatic melanoma samples from Tirosh et al. (2016). (A) Comparison with flow cytometry data of lymph nodes from metastatic melanoma patients. (B) Comparison with IHC from colon cancer primary tumors (Becht et al., 2016). (C) Comparison with single-cell RNA-Seq data (Tirosh et al., 2016). For primary tumor and non-lymphoid metastasis samples, a leave-one-out procedure was used (see Materials and methods). Proportions of cells observed experimentally are given in Supplementary file 3A,E.
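
The leave-one-out idea mentioned in this caption can be sketched as follows: when predicting for one patient, reference profiles are built only from the other patients. This is a minimal sketch under assumptions, since the actual procedure is defined in the paper's Materials and methods; the function name, the data layout and the use of a simple mean across patients are hypothetical.

```python
def leave_one_out_profiles(profiles_by_patient, held_out):
    """Average per-cell-type profiles over all patients except `held_out`.

    profiles_by_patient: {patient: {cell_type: [expression per gene]}}
    Returns {cell_type: [mean expression per gene]}. A cell type seen only
    in the held-out patient gets no profile at all.
    """
    collected = {}
    for patient, by_type in profiles_by_patient.items():
        if patient == held_out:
            continue  # never use the test patient's own cells
        for cell_type, expression in by_type.items():
            collected.setdefault(cell_type, []).append(expression)
    return {cell_type: [sum(values) / len(values) for values in zip(*rows)]
            for cell_type, rows in collected.items()}


# Made-up data: only patient P1 has endothelial cells.
example_data = {
    "P1": {"T": [1.0, 2.0], "Endothelial": [5.0, 5.0]},
    "P2": {"T": [3.0, 4.0]},
    "P3": {"T": [5.0, 6.0]},
}
example_profiles = leave_one_out_profiles(example_data, held_out="P1")
```

Holding out the test patient avoids evaluating predictions against profiles derived from the same cells. It also means that a cell type present in a single patient cannot be given a profile when that patient is held out.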
Figure 4—figure supplement 1. Comparison of EPIC results per cell type for gene expression reference profiles from circulating or tumor-infiltrating immune cells.
(A) Pearson R correlation and (B) RMSE between the predicted cell fractions and the experimentally measured fractions (from flow cytometry of lymph nodes from metastatic melanoma patients (this study), colorectal cancer IHC from primary tumors (Becht et al., 2016) and single-cell RNA-Seq data from melanoma (Tirosh et al., 2016)). NA’s indicate cases where the cell type could not be predicted by a method. #: No predictions for endothelial cells were made in the primary tumors from single-cell RNA-Seq data, as only one patient had such cells and no profiles could be built through the leave-one-out procedure used for this dataset. ‘Cancer + other cells’ corresponds to cancer cells and other stromal and endothelial cells. No RMSE value can be computed for the IHC data in (B), as the measured values are not for all cells and do not reflect cell proportions. In (A), the significance of the Pearson correlation is indicated by stars: *p<0.05, **p<0.01, ***p<0.001, while results with p-values above 0.1 are inside parentheses.
Figure 5. Performance comparison with other methods in tumor samples.
(A) Pearson correlation R-values between the cell proportions predicted by EPIC and ISOpure and the observed proportions measured by flow cytometry or single-cell RNA-Seq (Tirosh et al., 2016), considering all cell types together (i.e., B, CAFs, CD4 T, CD8 T, endothelial, NK, macrophages and cancer cells). (B) Same analysis as in Figure 5A but considering only immune cell types (i.e., B, CD4 T, CD8 T, NK and macrophages) in order to include more methods in the comparison. (C) Analysis of ESTIMATE predictions in the single-cell RNA-Seq dataset for the sum of all immune cells, the proportion of stromal cells (cancer-associated fibroblasts) and the proportion of cancer cells (cells identified as melanoma cells in Tirosh et al.). (D) Same as Figure 5C but for EPIC predictions of immune, stromal and cancer cells. Significant correlations in (A–B) are indicated above each bar (*p<0.05; **p<0.01; ***p<0.001).
Figure 5—figure supplement 1. Comparison of multiple cell fraction prediction methods in tumor datasets.
(A) Pearson R correlation and (B) root mean squared error between the cell fractions predicted by each method and the experimentally measured fractions (from flow cytometry (this study), colorectal cancer immunohistochemistry (Becht et al., 2016) and single-cell RNA-Seq data (Tirosh et al., 2016)). Results are based either on cell types grouped together (noted as ‘All cells’, including the immune, endothelial, stromal and cancer cells, or ‘All immune cells’, including only the immune cell types) or on each individual cell type that had been measured experimentally. NA’s indicate cases where the cell type could not be predicted by a method. #: No predictions for endothelial cells were made with EPIC in the primary tumors from single-cell RNA-Seq data, as only one patient had such cells and no profiles could be built through the leave-one-out procedure used for this dataset. ‘Cancer + other cells’ corresponds to cancer cells and other stromal and endothelial cells. In (A), the significance of the Pearson correlation is indicated by stars: *p<0.05, **p<0.01, ***p<0.001, while results with p-values above 0.1 are inside parentheses.
Figure 5—figure supplement 2. Comparison of cell fraction prediction methods with flow cytometry data of melanoma tumors.
(A) Direct comparison of all cell types together. When a cell type could not be predicted by a given method, it is absent from the subfigure. (B) Comparison per cell type for MCP-counter, as its predictions are not comparable across different cell types. CD4 T cell and melanoma cell proportions are not predicted by MCP-counter. Correlation and RMSE values are available in Figure 5—figure supplement 1.
Figure 5—figure supplement 3. Comparison of cell fraction prediction methods with immunohistochemistry data in colon cancer data (Becht et al., 2016) for T cell, CD8 T cell and macrophage infiltration values.
Observed values are given as number of cells/mm². Correlation values are available in Figure 5—figure supplement 1.
Figure 5—figure supplement 4. Comparison of cell fraction prediction methods with single-cell RNA-Seq data from melanoma tumors (Tirosh et al., 2016).
(A) Direct comparison of all cell types together. When a cell type could not be predicted by a given method, it is absent from the subfigure. (B) Results for MCP-counter, split by cell type, as its predictions are not comparable across different cell types. CD4 T cell and melanoma cell proportions are not predicted by MCP-counter. Correlation and RMSE values are available in Figure 5—figure supplement 1.
Figure 5—figure supplement 5. Comparison between ESTIMATE scores (A) and EPIC predictions (B) in our new flow cytometry dataset.
The predictions are compared to the observed cell proportions. ESTIMATE returns a score of global immune infiltration, so the sum of all observed immune cells was used for the comparison. The observed cancer cells correspond to the Melan-A+ cells. Correlations between observed fractions and predictions are Spearman correlations.
Figure 5—figure supplement 6. Predicting Thelper and Treg cell fractions in tumors.
The proportions of Thelper and Treg cells predicted by EPIC and CIBERSORT are compared to the proportions observed in the bulk samples reconstructed from the single-cell RNA-seq data from melanoma tumors (Tirosh et al., 2016). Pearson correlations and RMSE are indicated on the figures.
Author response image 1. Comparison between EPIC predictions and measured cell fractions in PBMC dataset from Zimmermann et al. 2016.
Author response image 2. Comparison between the experimentally measured cell fractions and EPIC predictions, including additional cell types in: (A) our expanded flow cytometry analysis of melanoma; (B) lymph node metastasis and primary tumor melanoma data from Tirosh et al., 2016.
Author response image 3. Comparison of the prediction accuracies for EPIC, ISOpure based on all genes and ISOpure based on the subset of signature genes we derived for EPIC.
(A) For all immune cell types in the blood datasets (dataset 1: Zimmermann et al. 2016; dataset 2: Hoek et al. 2015; dataset 3: Linsley et al. 2014). (B) and (C) In the tumor datasets, based on all cell types, including immune, stromal and cancer cells (B), or based only on the immune cell types (C) (flow cytometry: our new experiment; single-cell RNA-Seq: data from Tirosh et al. 2016). The stars above each bar indicate whether the Pearson correlation was significant (*p<0.05; **p<0.01; ***p<0.001). These figures are the same as Figures 2B and 5A-B in our manuscript, but compare different ISOpure results with the EPIC ones.

References

    1. Ahn J, Yuan Y, Parmigiani G, Suraokar MB, Diao L, Wistuba II, Wang W. DeMix: deconvolution for mixed cancer transcriptomes using raw measured data. Bioinformatics. 2013;29:1865–1871. doi: 10.1093/bioinformatics/btt301.
    2. Angelova M, Charoentong P, Hackl H, Fischer ML, Snajder R, Krogsdam AM, Waldner MJ, Bindea G, Mlecnik B, Galon J, Trajanoski Z. Characterization of the immunophenotypes and antigenomes of colorectal cancers reveals distinct tumor escape mechanisms and novel targets for immunotherapy. Genome Biology. 2015;16:64. doi: 10.1186/s13059-015-0620-6.
    3. Anghel CV, Quon G, Haider S, Nguyen F, Deshwar AG, Morris QD, Boutros PC. ISOpureR: an R implementation of a computational purification algorithm of mixed tumour profiles. BMC Bioinformatics. 2015;16:156. doi: 10.1186/s12859-015-0597-x.
    4. Balch CM, Riley LB, Bae YJ, Salmeron MA, Platsoucas CD, von Eschenbach A, Itoh K. Patterns of human tumor-infiltrating lymphocytes in 120 human cancers. Archives of Surgery. 1990;125:200–205. doi: 10.1001/archsurg.1990.01410140078012.
    5. Baron M, Veres A, Wolock SL, Faust AL, Gaujoux R, Vetere A, Ryu JH, Wagner BK, Shen-Orr SS, Klein AM, Melton DA, Yanai I. A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure. Cell Systems. 2016;3:346–360. doi: 10.1016/j.cels.2016.08.011.

Grants and funding

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.