Review
Entropy (Basel). 2022 Jul 18;24(7):995.
doi: 10.3390/e24070995.

Differential Expression Analysis of Single-Cell RNA-Seq Data: Current Statistical Approaches and Outstanding Challenges

Samarendra Das et al. Entropy (Basel).

Abstract

With the advent of single-cell RNA sequencing (scRNA-seq), it has become possible to measure gene expression dynamics at the level of individual cells. A single scRNA-seq experiment can generate expression data for thousands of genes across up to millions of cells. Differential expression (DE) analysis is the primary downstream analysis of such data: it identifies marker genes for cell type detection and provides inputs to other secondary analyses. Many statistical approaches for DE analysis have been reported in the literature. Here, we critically discuss the statistical principles underlying these approaches and divide them into six major classes: generalized linear models, generalized additive models, hurdle models, mixture models, two-class parametric approaches, and non-parametric approaches. We also discuss the limitations specific to each class and how subsequent classes of approaches address them. We identify a number of challenges that must be addressed to develop the next class of innovative approaches. Furthermore, we highlight the methodological challenges in DE analysis of scRNA-seq data that researchers must address to draw maximum benefit from this recent single-cell technology. This study will serve as a guide to help genome researchers and experimental biologists objectively select among the available options for their analyses.
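To make the marker-detection use case concrete, the following is a minimal sketch of one common non-parametric workflow, assuming a cells-by-genes matrix of already normalized expression and a vector of cluster labels. Each gene is tested with a Wilcoxon rank-sum (Mann-Whitney U) test comparing one cluster against all remaining cells, and the p-values are adjusted with the Benjamini-Hochberg procedure. The function names `one_vs_rest_markers` and `benjamini_hochberg` are hypothetical helpers written for illustration, not the API of any tool discussed in the review.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values (FDR q-values)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # p_(i) * m / i for the sorted p-values
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: q_(i) = min over j >= i of p_(j) * m / j
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def one_vs_rest_markers(expr, labels, target):
    """Hypothetical helper: Wilcoxon rank-sum test of each gene, `target` cells vs. the rest.

    expr   : (n_cells, n_genes) array of normalized expression
    labels : length-n_cells array of cluster labels
    Returns raw p-values and BH-adjusted q-values, one per gene.
    """
    labels = np.asarray(labels)
    in_group = labels == target
    pvals = np.array([
        mannwhitneyu(expr[in_group, g], expr[~in_group, g],
                     alternative="two-sided").pvalue
        for g in range(expr.shape[1])
    ])
    return pvals, benjamini_hochberg(pvals)
```

Genes with the smallest q-values are candidate markers for the target cluster; in practice, the fold change and the fraction of expressing cells are usually inspected alongside the q-value.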

Keywords: challenges; classification; differential expression analysis; scRNA-seq; statistical approaches.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Operational framework of differential expression (DE) analysis of scRNA-seq data. The main steps of single-cell studies are shown, including pre-processing and the individual steps of DE analysis, together with the potential uses and interpretation of the results.
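The pre-processing stage in this framework can be sketched as follows, assuming a raw cells-by-genes count matrix. The function `preprocess_counts` and its default thresholds (at least 200 detected genes per cell, at least 3 cells per gene, 10,000 counts per cell after scaling) are illustrative assumptions rather than values prescribed by the review; suitable cut-offs depend on the protocol and the data set.

```python
import numpy as np


def preprocess_counts(counts, min_genes_per_cell=200, min_cells_per_gene=3,
                      target_sum=1e4):
    """Hypothetical pre-processing sketch for a raw scRNA-seq count matrix.

    counts : (n_cells, n_genes) raw UMI or read counts
    Returns log-normalized expression plus boolean masks of retained cells and genes.
    """
    counts = np.asarray(counts, dtype=float)
    # 1. Cell QC: drop cells with too few detected genes (e.g. empty droplets, debris)
    cell_mask = (counts > 0).sum(axis=1) >= min_genes_per_cell
    counts = counts[cell_mask]
    # 2. Gene filter: drop genes detected in too few of the retained cells
    gene_mask = (counts > 0).sum(axis=0) >= min_cells_per_gene
    counts = counts[:, gene_mask]
    # 3. Library-size normalization: rescale each cell to `target_sum` total counts
    lib_size = counts.sum(axis=1, keepdims=True)
    normalized = np.divide(counts, lib_size, out=np.zeros_like(counts),
                           where=lib_size > 0) * target_sum
    # 4. Log transform to dampen the mean-variance dependence of counts
    return np.log1p(normalized), cell_mask, gene_mask
```

The returned masks keep track of which cells and genes survived filtering, so that DE results can be mapped back to the original gene identifiers.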
Figure 2
Classification of the statistical approaches and tools available for differential expression analysis (DEA) in single-cell studies. The approaches are classified according to their input data requirements, the assumed data distribution, and the statistical model, among other criteria. DE analysis tools belonging to each category are shown in pink boxes.
Figure 3
Operational outlines of the generalized linear model (GLM) and two-class comparison approaches for DE analysis in scRNA-seq studies. (A) Workflow of GLM-based DE approaches. (B) Workflow of two-class comparison approaches. In both classes, the framework can be divided into four major parts: (i) input, i.e., the data provided to the tools; (ii) pre-processing, which involves data cleaning, outlier removal, normalization, etc.; (iii) model fitting and computation of the DE test statistic, in which distributional or model assumptions (e.g., a GLM, a simple statistical distribution, or a distribution-free approach) are made about the expression data, the model parameters are estimated, and the DE test statistic(s) and corresponding p-values are computed for each gene; and (iv) assessment and interpretation of the DE results.
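Step (iii) of the GLM workflow in panel (A) can be illustrated with a deliberately simplified sketch: a per-gene negative binomial (NB) GLM with a log link and a single two-group indicator, tested with a likelihood-ratio test (LRT). The sketch assumes that counts are already comparable across cells (no size-factor offsets) and that a single NB dispersion is known; the default `dispersion=0.1` is an arbitrary illustrative value. GLM-based tools generally estimate dispersion and include normalization offsets, so this is a teaching example, not the implementation of any specific tool. Under these assumptions the fitted means have a closed form (the group sample means), which keeps the code short.

```python
import numpy as np
from scipy.stats import chi2, nbinom


def nb_loglik(y, mu, dispersion):
    """NB log-likelihood with mean mu and variance mu + dispersion * mu**2."""
    size = 1.0 / dispersion            # scipy's `n` parameter
    prob = size / (size + mu)          # scipy's `p` parameter, giving mean mu
    return nbinom.logpmf(y, size, prob).sum()


def nb_glm_lrt(counts, group, dispersion=0.1):
    """Hypothetical per-gene LRT for log(mu) = b0 + b1 * group under an NB model.

    counts     : (n_cells, n_genes) integer counts, assumed comparable across cells
    group      : length-n_cells array of 0/1 labels (both groups non-empty)
    dispersion : fixed, shared NB dispersion (an assumption made for brevity)
    Returns LRT statistics and chi-square (1 df) p-values, one per gene.
    """
    counts = np.asarray(counts)
    group = np.asarray(group).astype(bool)
    stats, pvals = [], []
    for y in counts.T:
        # With a fixed dispersion and a log link, the MLE of the mean is the
        # sample mean: overall for the null model, per group for the full model.
        mu_null = np.full(y.shape, y.mean())
        mu_full = np.where(group, y[group].mean(), y[~group].mean())
        lrt = 2.0 * (nb_loglik(y, mu_full, dispersion)
                     - nb_loglik(y, mu_null, dispersion))
        lrt = max(lrt, 0.0)            # guard against tiny negative rounding errors
        stats.append(lrt)
        pvals.append(chi2.sf(lrt, df=1))   # the full model adds one parameter, b1
    return np.array(stats), np.array(pvals)
```

The resulting p-values would then feed into step (iv), typically after multiple-testing correction across genes.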
Figure 4
Operational outlines of the generalized additive model (GAM), hurdle, and mixed-model classes of approaches for DE analysis in scRNA-seq studies. (A) Workflow of GAM-based DE approaches. (B) Workflow of hurdle and mixed-model-based approaches. In both classes, the framework can be divided into four major parts: (i) input, i.e., the data provided to the tools; (ii) pre-processing, which involves data cleaning, outlier removal, normalization, etc.; (iii) model fitting and computation of the DE test statistic, in which distributional or model assumptions (e.g., a GAM, hurdle, or mixture model) are made about the expression data, the model parameters are estimated, and the DE test statistic(s) and corresponding p-values are computed for each gene; and (iv) assessment and interpretation of the DE results.
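The two-part structure of a hurdle model in panel (B) can be sketched as follows, assuming log-normalized expression in which exact zeros mean "not detected" and two groups of cells. The discrete part models the detection rate (a logistic model with a group indicator, which for two groups reduces to comparing two binomial proportions); the continuous part models the positive values with a Gaussian linear model. The two LRT statistics and their degrees of freedom are summed into one chi-square test. The function `hurdle_lrt` is a hypothetical illustration of the general idea, not the exact procedure of any tool; covariates, shrinkage, and degenerate cases are handled only crudely here.

```python
import numpy as np
from scipy.special import xlogy
from scipy.stats import chi2


def _binom_loglik(k, n):
    """Maximized Bernoulli log-likelihood for k detections among n cells."""
    p = k / n
    return xlogy(k, p) + xlogy(n - k, 1.0 - p)   # xlogy treats 0 * log(0) as 0


def hurdle_lrt(expr, group):
    """Hypothetical two-part (hurdle) LRT per gene: detection rate + positive mean.

    expr  : (n_cells, n_genes) log-normalized expression; zeros = not detected
    group : length-n_cells array of 0/1 labels (both groups non-empty)
    Returns combined LRT statistics, degrees of freedom and p-values.
    """
    expr = np.asarray(expr, dtype=float)
    group = np.asarray(group).astype(bool)
    stats, dfs, pvals = [], [], []
    for y in expr.T:
        det = y > 0
        # Discrete part: separate detection rates per group vs. one shared rate
        k1, n1 = det[group].sum(), group.sum()
        k0, n0 = det[~group].sum(), (~group).sum()
        lrt = 2.0 * (_binom_loglik(k1, n1) + _binom_loglik(k0, n0)
                     - _binom_loglik(k1 + k0, n1 + n0))
        df = 1
        # Continuous part: Gaussian model of positive values, group means vs. one mean
        y1, y0 = y[det & group], y[det & ~group]
        if len(y1) >= 2 and len(y0) >= 2:
            pos = np.concatenate([y1, y0])
            rss_null = ((pos - pos.mean()) ** 2).sum()
            rss_full = ((y1 - y1.mean()) ** 2).sum() + ((y0 - y0.mean()) ** 2).sum()
            if rss_full > 0:           # skip the degenerate zero-variance case
                lrt += len(pos) * np.log(rss_null / rss_full)
                df += 1
        lrt = max(lrt, 0.0)
        stats.append(lrt)
        dfs.append(df)
        pvals.append(chi2.sf(lrt, df))
    return np.array(stats), np.array(dfs), np.array(pvals)
```

Separating the two parts lets a gene be called differentially expressed because it is detected in more cells, because it is expressed at a higher level when detected, or both.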
