Proc Natl Acad Sci U S A. 2021 Jun 1;118(22):e2100293118.
doi: 10.1073/pnas.2100293118.

Detection of differentially abundant cell subpopulations in scRNA-seq data

Jun Zhao et al. Proc Natl Acad Sci U S A. 2021.

Abstract

Comprehensive and accurate comparisons of transcriptomic distributions of cells from samples taken from two different biological states, such as healthy versus diseased individuals, are an emerging challenge in single-cell RNA sequencing (scRNA-seq) analysis. Current methods for detecting differentially abundant (DA) subpopulations between samples rely heavily on initial clustering of all cells in both samples. Often, this clustering step is inadequate since the DA subpopulations may not align with a clear cluster structure, and important differences between the two biological states can be missed. Here, we introduce DA-seq, a targeted approach for identifying DA subpopulations not restricted to clusters. DA-seq is a multiscale method that quantifies a local DA measure for each cell, which is computed from its k nearest neighboring cells across a range of k values. Based on this measure, DA-seq delineates contiguous significant DA subpopulations in the transcriptomic space. We apply DA-seq to several scRNA-seq datasets and highlight its improved ability to detect differences between distinct phenotypes in severe versus mildly ill COVID-19 patients, melanomas subjected to immune checkpoint therapy comparing responders to nonresponders, embryonic development at two time points, and young versus aging brain tissue. DA-seq enabled us to detect differences between these phenotypes. Importantly, we find that DA-seq not only recovers the DA cell types as discovered in the original studies but also reveals additional DA subpopulations that were not described before. Analysis of these subpopulations yields biological insights that would otherwise be undetected using conventional computational approaches.
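The multiscale neighborhood idea described in the abstract can be illustrated with a minimal sketch. It assumes a simplified score, namely the fraction of each cell's k nearest neighbors that come from the second biological state; the exact score used by DA-seq is not given in this text. The function name `multiscale_knn_scores`, the brute-force distance computation, and the toy data are all hypothetical.

```python
import numpy as np

def multiscale_knn_scores(X, labels, ks=(4, 8, 12)):
    """Return an (n_cells, len(ks)) array holding, for each cell and each k,
    the fraction of its k nearest neighbors that carry label 1.
    This is a simplified stand-in score, not the published definition."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Brute-force pairwise squared Euclidean distances (fine for small n).
    diff = X[:, None, :] - X[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)      # a cell is not its own neighbor
    order = np.argsort(d2, axis=1)    # neighbor indices, nearest first
    scores = np.empty((X.shape[0], len(ks)))
    for j, k in enumerate(ks):
        scores[:, j] = labels[order[:, :k]].mean(axis=1)
    return scores

# Toy data: two well-separated 2-D blobs, one per biological state.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (30, 2)), rng.normal(5.0, 0.3, (30, 2))])
labels = np.array([0] * 30 + [1] * 30)
scores = multiscale_knn_scores(X, labels)
```

On this toy input every cell's neighborhood is pure at all three scales. In real data the columns would differ across k, which is what makes the score vector multiscale.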

Keywords: RNA-seq; local differential abundance; single cell.

Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
Schematic demonstration of DA-seq. (A) Illustration of the DA-seq algorithm. DA-seq detects DA subpopulations by analyzing cells from two biological states. The input of the algorithm is the union of data from two states after initial dimension reduction. Step 1: Computing a multiscale score vector, based on the k-nearest neighbors (kNN) of each cell, for several values of k (e.g., k = 4, 8, 12). Step 2: Training a logistic classifier to predict the biological state of each cell based on the multiscale score to obtain a single DA measure. The algorithm retains only cells for which the DA measure is above a threshold τ_h or below a threshold τ_l, since only these cells may reside in DA subpopulations. Step 3: Clustering the cells retained in step 2 to obtain contiguous DA subpopulations above a predefined size. These subpopulations are denoted DA1, DA2, and DA3. The degree of their differential abundance is quantified by a DA score (SI Appendix, Note 1). Step 4: Detecting subsets of genes that characterize each of the DA subpopulations. For example, the genes G7 and G8 characterize DA3. (B) Standard clustering analysis vs. DA-seq. (Left) Cluster information obtained through standard clustering analysis. (Center) DA subpopulations identified through DA-seq. (Right) Normalized differential abundance of DA subpopulations and clusters, represented by DA score.
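Step 2 of the caption above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the logistic classifier is fitted with plain gradient descent, the DA measure is taken to be 2p - 1 (the predicted probability rescaled to [-1, 1]), and the default cutoffs of ±0.8 are borrowed from the |DA measure| > 0.8 rule quoted in the later figure captions. The names `fit_logistic`, `da_measure`, and `retain_cells` are hypothetical, and the clustering of step 3 is not shown.

```python
import numpy as np

def fit_logistic(S, y, lr=0.5, n_iter=2000):
    """Fit logistic regression weights (last entry is the bias) by
    batch gradient descent on the mean log-loss."""
    S = np.asarray(S, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.hstack([S, np.ones((S.shape[0], 1))])  # append bias column
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ w)))
        w -= lr * (A.T @ (p - y)) / len(y)
    return w

def da_measure(S, w):
    """Assumed DA measure: predicted probability rescaled to [-1, 1]."""
    S = np.asarray(S, dtype=float)
    A = np.hstack([S, np.ones((S.shape[0], 1))])
    p = 1.0 / (1.0 + np.exp(-(A @ w)))
    return 2.0 * p - 1.0

def retain_cells(measure, tau_h=0.8, tau_l=-0.8):
    """Indices of cells whose measure is above tau_h or below tau_l."""
    return np.flatnonzero((measure > tau_h) | (measure < tau_l))

# Toy multiscale scores: 20 cells in a state-1 neighborhood, 20 in a
# state-0 neighborhood, and 20 in a mixed neighborhood with balanced labels.
S = np.vstack([np.ones((20, 3)), np.zeros((20, 3)), np.full((20, 3), 0.5)])
y = np.array([1] * 20 + [0] * 20 + [0, 1] * 10)
w = fit_logistic(S, y)
measure = da_measure(S, w)
retained = retain_cells(measure)
```

In this toy case only the first 40 cells survive the thresholds, while the mixed-neighborhood cells are discarded because their measure stays near zero.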
Fig. 2.
Immune cells from responding and nonresponding melanoma patients treated with checkpoint therapy. (A–D) t-SNE embedding of 16,291 cells from ref. . (A) Cells colored by status of response to immune therapy. (B) Cells colored by cluster labels from ref. . (C) Cells colored by DA measure. Large (small) values indicate a high abundance of cells from the pool of nonresponder (responder) samples. (D) Five distinct DA subpopulations obtained by clustering cells with |DA measure|> 0.8. (E) DA score of DA subpopulations and predefined clusters. (F) Dot plot for markers characterizing the five selected DA subpopulations. The color intensity of each dot corresponds to the average gene expression across all cells in the DA subpopulation excluding the cells with zero expression values. The lowest row in the plot corresponds to the non-DA cells (cells not included in any DA subpopulations). (G) Dot plot for markers that distinguish DA4 and the complementary cells within G5.
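The dot color described for panel F is an average taken only over cells with nonzero expression. A small sketch of that calculation follows; the helper name `mean_nonzero_expression` and the toy values are hypothetical, and returning 0.0 when no cell expresses the gene is an assumption made here for convenience.

```python
def mean_nonzero_expression(values):
    """Average expression over cells with a nonzero value.
    Returns 0.0 if no cell expresses the gene (an assumed convention)."""
    nonzero = [v for v in values if v != 0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0

# Toy expression of one gene across four cells of a subpopulation.
toy_mean = mean_nonzero_expression([0.0, 2.0, 0.0, 4.0])
```

Here the two zero entries are ignored, so the average is taken over the values 2.0 and 4.0 only.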
Fig. 3.
Comparing embryonic mouse dermal cells in embryonic days E13.5 and E14.5. (A–E) Data from Gupta et al. (16). (A–D) t-SNE embedding of 15,325 cells. (A) Embryonic day of each cell. (B) Cells colored by DA measure. Large (small) values indicate a high abundance of cells from E14.5 (E13.5). (C) Distinct DA subpopulations obtained by clustering cells with |DA measure|> 0.8. (D) Normalized Sox2 gene expression. (E) Dot plot of several markers that characterize DA subpopulations. Details are as in Fig. 2F. (F) Validation on data from Fan et al. (27). Violin plots compare gene module scores between E15 and E13 samples in dermal cells of data from ref. . Gene modules are defined from DA subpopulations in C. Wilcoxon test is used to calculate P values. ***P < 0.001.
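The caption reports P values from a Wilcoxon test on gene module scores but does not say which variant was used. The sketch below assumes the two-sample rank-sum form (equivalent to the Mann-Whitney U test) with a normal approximation and no tie correction of the variance. The function name `rank_sum_test` and the toy scores are hypothetical and are not taken from the study.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).
    Returns (U statistic for x, approximate p-value)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values and give each the average rank.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[t] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u1, p

# Toy module scores for two groups of cells with no overlap.
early = list(range(10))
late = list(range(10, 20))
u_stat, p_value = rank_sum_test(early, late)
```

With completely separated groups of ten cells each, the U statistic is zero and the approximate P value falls below the 0.001 level that the caption marks with three asterisks.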
Fig. 4.
Comparing immune cells from patients with severe and moderate COVID-19. (A–F) Data from Chua et al. (5). (A–D) t-SNE embedding of 80,109 cells. (A) Cells colored by disease severity of COVID-19, critical or moderate. (B) Cells colored by cluster labels from ref. . CTL, cytotoxic T cell; MC, mast cell; moDC, monocyte-derived dendritic cell; MoD-Ma, monocyte-derived macrophage; Neu, neutrophil; NK, natural killer cell; NKT, natural killer T cell; NKT-p, proliferating NKT cell; nrMa, nonresident macrophage; pDC, plasmacytoid dendritic cell; rMa, resident macrophage; Treg, regulatory T cell. (C) Cells colored by DA measure. Large (small) values indicate a high abundance of cells from the pool of critical (moderate) cases. (D) Five distinct DA subpopulations obtained by clustering cells with |DA measure|> 0.8. (E) Dot plot for markers characterizing the selected DA subpopulations. Details are as in Fig. 2F. (F) Dot plots for markers of DA subpopulations, comparing each DA subpopulation to the complementary part in the corresponding cluster. (G) Validation on data from Liao et al. (6). Violin plots compare gene module scores between critical and moderate cases in matching cell types of data from ref. . Specifically, module scores of DA1, DA2, DA4, and DA5 are compared in neutrophils, macrophages, CD8 T cells, and neutrophils from Liao et al. (6), respectively. Of note, of the 7,101 immune cells analyzed for the moderate cases, only 4 were neutrophils. Gene modules are defined from DA subpopulations in D. Wilcoxon test is used to calculate P values. *P < 0.05, ***P < 0.001.

References

    1. Macosko E. Z., et al., Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214 (2015).
    2. Zheng G. X. Y., et al., Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).
    3. Burkhardt D. B., et al., Quantifying the effect of experimental perturbations at single-cell resolution. Nat. Biotechnol., 10.1038/s41587-020-00803-5 (2021).
    4. Laehnemann D., et al., Eleven grand challenges in single-cell data science. Genome Biol. 21, 31 (2020).
    5. Chua R. L., et al., Covid-19 severity correlates with airway epithelium–immune cell interactions identified by single-cell analysis. Nat. Biotechnol. 38, 970–979 (2020).
