Bioinformatics. 2021 May 17;37(7):963-967.
doi: 10.1093/bioinformatics/btaa751.

Systematic determination of the mitochondrial proportion in human and mice tissues for single-cell RNA-sequencing data quality control

Daniel Osorio et al. Bioinformatics.

Abstract

Motivation: Quality control (QC) is a critical step in single-cell RNA-seq (scRNA-seq) data analysis. Low-quality cells are removed during QC to avoid misinterpretation of the data. An important QC metric is the mitochondrial proportion (mtDNA%), which is used as a threshold to filter out low-quality cells. Early publications in the field established a threshold of 5%, which has since been used as a default in several software packages for scRNA-seq data analysis and adopted as a standard in many scRNA-seq studies. However, the validity of using a uniform threshold across different species, single-cell technologies, tissues and cell types has not been adequately assessed.
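As an illustration of the metric described above, the following is a minimal sketch of how mtDNA% could be computed per cell and compared against the 5% default. It is not the authors' code. It assumes that mitochondrial genes are identified by an "MT-" symbol prefix (a common naming convention, matched case-insensitively here), that each cell is a plain dictionary of gene counts, and that a cell exactly at the threshold is kept; the function names and toy data are hypothetical.

```python
from typing import Dict, List


def mt_percent(counts: Dict[str, int], mt_prefix: str = "MT-") -> float:
    """Percentage of a cell's total counts assigned to mitochondrial genes.

    Assumption: mitochondrial genes are those whose symbol starts with
    `mt_prefix`, compared case-insensitively.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0  # empty cell: report 0 rather than divide by zero
    mt = sum(c for gene, c in counts.items()
             if gene.upper().startswith(mt_prefix.upper()))
    return 100.0 * mt / total


def filter_cells(cells: Dict[str, Dict[str, int]],
                 threshold: float = 5.0) -> List[str]:
    """Return the IDs of cells whose mtDNA% is at or below `threshold`."""
    return [cell_id for cell_id, counts in cells.items()
            if mt_percent(counts) <= threshold]


# Toy data (hypothetical): cell_a has 3% mitochondrial counts, cell_b has 30%.
cells = {
    "cell_a": {"MT-CO1": 2, "MT-ND1": 1, "ACTB": 97},
    "cell_b": {"MT-CO1": 30, "ACTB": 70},
}
kept = filter_cells(cells)  # only cell_a passes the 5% default
```

In practice the counts would come from a sparse expression matrix rather than dictionaries; the dictionary form is used here only to keep the arithmetic visible.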

Results: We systematically analyzed 5 530 106 cells reported in 1349 annotated datasets available in the PanglaoDB database and found that the average mtDNA% in scRNA-seq data across human tissues is significantly higher than in mouse tissues. This difference is not confounded by the platform used to generate the data. Based on this finding, we propose new reference values of the mtDNA% for 121 mouse tissues and 44 human tissues. In general, for mouse tissues, the 5% threshold performs well in distinguishing between healthy and low-quality cells. However, for human tissues, the 5% threshold should be reconsidered, as it fails to accurately discriminate between healthy and low-quality cells in 29.5% (13 of 44) of the tissues analyzed. We conclude that omitting the mtDNA% QC filter or adopting a suboptimal mtDNA% threshold may lead to erroneous biological interpretations of scRNA-seq data.
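The practical consequence of a species-dependent cutoff can be sketched as follows. The example assumes the 5% value for mouse stated above and the 10% value for human cells given in the Fig 1 caption; the lookup table, the function name and the sample percentages are hypothetical, and the paper's tissue-specific reference values are not reproduced here.

```python
from typing import List

# Assumed per-species cutoffs: 5% (mouse) from the abstract,
# 10% (human) from the Fig 1 caption.
THRESHOLDS = {"mouse": 5.0, "human": 10.0}


def low_quality_mask(mt_percents: List[float], species: str) -> List[bool]:
    """Flag cells whose mtDNA% exceeds the cutoff for the given species."""
    cutoff = THRESHOLDS[species]
    return [p > cutoff for p in mt_percents]


# Hypothetical mtDNA% values for three cells.
values = [2.0, 7.5, 12.0]
human_flags = low_quality_mask(values, "human")  # [False, False, True]
mouse_flags = low_quality_mask(values, "mouse")  # [False, True, True]
```

The middle cell (7.5%) is retained under the human cutoff but would be discarded if the 5% default were applied uniformly, which is the kind of discrepancy the abstract describes for human tissues.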

Availability and implementation: The code used to download datasets, perform the analyses and produce the figures is available at https://github.com/dosorio/mtProportion.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig 1.
Boxplots showing the differences in mtDNA% across species, technologies and tissues. Each dot represents a cell; the red line is the early established 5% threshold, and the blue line is the 10% threshold for human cells proposed here. In parentheses (C and D), the number of cells in the stated tissue. (A) The difference in mtDNA% between human and mouse cells. (B) The differences in mtDNA% between human and mouse cells by the technology used to generate the data. (C) Boxplots of mtDNA% across 44 human tissues. (D) Boxplots of mtDNA% across 121 mouse tissues.
Fig 2.
Case examples showing the effect of omitting the mtDNA% QC filter in the analysis of scRNA-seq data. (A) t-SNE representation of all the cell populations included in the dataset, generated by excluding the mitochondrial genes from the list of highly variable genes before principal component analysis (PCA). Each dot represents a cell, colored by cell type. (B) t-SNE representation of the cell type used as an example, colored as a function of the mtDNA% in each cell. Clusters reported by PanglaoDB are labeled. (C) Boxplot showing the distribution of the mtDNA% across clusters. The red line is the early established 5% threshold. (D) GSEA of the Apoptosis pathway between clusters with a high proportion of low-quality cells and others containing high-quality cells.
