Brief Bioinform. 2021 Sep 2;22(5):bbab122.
doi: 10.1093/bib/bbab122.

iSMNN: batch effect correction for single-cell RNA-seq data via iterative supervised mutual nearest neighbor refinement

Yuchen Yang et al. Brief Bioinform.

Abstract

Batch effect correction is an essential step in the integrative analysis of multiple single-cell RNA-sequencing (scRNA-seq) datasets. One state-of-the-art strategy for batch effect correction is via unsupervised or supervised detection of mutual nearest neighbors (MNNs). However, both types of methods only detect MNNs across batches of uncorrected data, where large batch effects may distort the MNN search. To address this issue, we presented a batch effect correction approach via iterative supervised MNN (iSMNN) refinement across data after correction. Our benchmarking on both simulated and real datasets showed the advantages of iterative MNN refinement for correction performance. Compared to popular alternative methods, iSMNN is able to better mix cells of the same cell type across batches. In addition, iSMNN can facilitate the identification of differentially expressed genes (DEGs) that are relevant to the biological function of certain cell types. These results indicated that iSMNN will be a valuable method for integrating multiple scRNA-seq datasets and can facilitate biological and medical studies at the single-cell level.

Keywords: batch effect correction; iterative refinement; mutual nearest neighbor; single-cell RNA-seq.
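
The MNN search that the abstract refers to can be illustrated with a minimal sketch. This is not the iSMNN implementation: the function names, the Euclidean distance, the parameter k, and the optional cell-type labels are all assumptions made for illustration. Two cells from different batches form an MNN pair when each is among the other's k nearest neighbors in the opposite batch; the supervised variant shown here simply discards pairs whose cell-type labels disagree, which is a simplification of searching within matched cell types.

```python
import math


def k_nearest(query, reference, k):
    """Indices of the k cells in `reference` closest to `query` (Euclidean)."""
    order = sorted(range(len(reference)),
                   key=lambda j: math.dist(query, reference[j]))
    return set(order[:k])


def mutual_nearest_neighbors(batch_a, batch_b, k=1, labels_a=None, labels_b=None):
    """Return (i, j) pairs where batch_a[i] and batch_b[j] are mutual k-nearest neighbors.

    If both label lists are given, pairs with different labels are dropped.
    This label filter is an illustrative assumption, not the published method.
    """
    nn_a = [k_nearest(cell, batch_b, k) for cell in batch_a]  # a -> nearest b cells
    nn_b = [k_nearest(cell, batch_a, k) for cell in batch_b]  # b -> nearest a cells
    pairs = []
    for i, neighbors in enumerate(nn_a):
        for j in sorted(neighbors):
            if i not in nn_b[j]:
                continue  # not mutual
            if labels_a is not None and labels_b is not None \
                    and labels_a[i] != labels_b[j]:
                continue  # mismatched cell type
            pairs.append((i, j))
    return pairs


# Toy expression profiles (two genes per cell), hypothetical values.
batch_a = [(0.0, 0.0), (10.0, 10.0)]
batch_b = [(0.5, 0.0), (10.0, 10.5), (100.0, 100.0)]
unsupervised = mutual_nearest_neighbors(batch_a, batch_b, k=1)
supervised = mutual_nearest_neighbors(
    batch_a, batch_b, k=1,
    labels_a=["T", "B"], labels_b=["T", "T", "B"],
)
```

In an iterative scheme of the kind the abstract describes, a search like this would be rerun on the corrected data after each round of correction, so that later MNN pairs are less affected by the original batch effect.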

Figures

Figure 1
Motivating real examples. (A) Histogram of the proportion of mismatched MNNs (i.e. MNNs from a mismatched cell type) in the five sets of integration. (B) Histogram of the proportion of MNNs of a certain cell from a mismatching cell type in the hematopoietic datasets. (C) Ratio of the number of MNNs detected in each iteration of batch effect correction, compared to the first iteration, in the five real datasets. (D) The average percentage changes in MNNs detected between the next two iterations of correction. (E) Logarithms of F statistic for the corrected data after each iteration in the five real datasets. Detailed information of the five real datasets is provided in Supplementary Table S1 (see Supplementary Data available online at http://bib.oxfordjournals.org/). The red arrows above indicate where best performance is attained.
Figure 2
Schematics of iSMNN.
Figure 3
Performance comparison among iSMNN, Seurat and MNNcorrect in simulation data. (A–D) UMAP plots for the (A) uncorrected, (B) Seurat-, (C) MNNcorrect- and (D) iSMNN-corrected results, respectively. (E) Boxplot of the logarithms of F statistic for the merged data of the two batches before and after correction.
Figure 4
Rank of the seven batch effect correction methods based on their performance in benchmarking datasets measured by F statistic. The methods are ordered according to the average rank across all datasets.
Figure 5
Performance comparison between iSMNN and alternative methods in the hematopoietic data. (A–H) UMAP plots for the (A) uncorrected, (B) MNNcorrect-, (C) Seurat-, (D) Harmony-, (E) Scanorama-, (F) BBKNN-, (G) SMNN- and (H) iSMNN-corrected results. (I) Logarithms of F statistic for the merged data before and after correction.
Figure 6
Performance comparison between iSMNN and alternative correction methods, MNNcorrect and Seurat, in two batches of cardiac data (batches 1 and 3). (A–D) UMAP plots for the (A) uncorrected, (B) Seurat-, (C) Harmony- and (D) iSMNN-corrected results for the two batches. (E–H) UMAP plots for the (E) uncorrected, (F) Seurat-, (G) Harmony- and (H) iSMNN-corrected results for the cell types across batches. (I) Overlap of DEGs upregulated in the CM cluster over the EC cluster after iSMNN and Seurat correction. (J–L) Heatmaps showing the gene expression profile of the DEGs upregulated in the CM cluster over the EC cluster, identified by (J) iSMNN specifically, (K) both iSMNN and Seurat and (L) Seurat specifically in cardiac batch 1. (M, N) Feature-enriched GO terms for the overexpressed DEGs in the CM cluster over the EC cluster that were identified by (N) iSMNN specifically and (M) both iSMNN and Seurat. (O) Average Silhouette Index for the CM and EC clusters defined by iSMNN and Seurat, respectively. (P) IHC staining for the typical CM marker cTnT, the typical EC marker Pecam1 and one DEG, Cryab, specifically identified by iSMNN.
