Brief Bioinform. 2023 Nov 22;25(1):bbad507.
doi: 10.1093/bib/bbad507.

scFed: federated learning for cell type classification with scRNA-seq

Shuang Wang et al. Brief Bioinform.

Abstract

The advent of single-cell RNA sequencing (scRNA-seq) has revolutionized our understanding of cellular heterogeneity and complexity in biological tissues. However, the large, sparse nature of scRNA-seq datasets and privacy regulations present challenges for efficient cell identification. Federated learning provides a solution, allowing efficient and private data use. Here, we introduce scFed, a unified federated learning framework that allows four classification algorithms, including single-cell-specific and general-purpose classifiers, to be benchmarked without violating data privacy. We evaluated scFed using eight publicly available scRNA-seq datasets with diverse sizes, species and technologies, assessing its performance via intra-dataset and inter-dataset experimental setups. We find that scFed performs well on a variety of datasets, with accuracy competitive with that of centralized models. Although the Transformer-based model excels in centralized training, its performance slightly lags behind that of the single-cell-specific model within the scFed framework, and it raises a notable time complexity concern. Our study not only helps select suitable cell identification methods but also highlights federated learning's potential for privacy-preserving, collaborative biomedical research.

Keywords: cell type; classification; federated learning; scRNA-seq.

Figures

Figure 1
The workflow of scFed: clients use local gene expression data from scRNA-seq to train local models; the local models are used to update the global model. The aggregated global model is passed to the local models for further training.
Figure 2
Comparison of centralized, local and global models for intra-dataset cell type classification performance among eight datasets. The F1 scores are recorded after five independent repetitions. The P-value intervals shown at the top of the figure are calculated with a Wilcoxon signed-rank test for the comparison between the corresponding boxplots.
Figure 3
Comparison of centralized, local and global model for intra-dataset cell type classification performance among SVM, ACTINN and XGBoost algorithms. The F1 scores are recorded after five independent repetitions. The P-value intervals shown at the top of the figure are calculated with a Wilcoxon signed-rank test for the comparison between the corresponding boxplots.
Figure 4
Comparison of centralized, local and global models for intra-dataset cell type classification performance with varying numbers of clients. The F1 scores are recorded after five independent repetitions. The P-value intervals shown at the top of the figure are calculated with a Wilcoxon signed-rank test for the comparison between the corresponding boxplots.
Figure 5
Comparison of centralized, local and global model for inter-dataset cell type classification performance taking ‘BaronHuman’, ‘Muraro’, ‘Segerstolpe’ and ‘Xin’ as the test dataset, respectively. The F1 scores are recorded after five independent repetitions. The P-value intervals shown at the top of the figure are calculated with a Wilcoxon signed-rank test for the comparison between the corresponding boxplots.
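
The workflow in the Figure 1 caption (clients train local models, the local models update a global model, and the aggregated global model is sent back for further training) can be sketched as a simple training loop. This is a minimal illustration, not the scFed implementation: the record above does not state the aggregation rule, so sample-size-weighted averaging in the style of FedAvg is assumed, a toy logistic regression stands in for the classifiers, and all function names (`local_update`, `aggregate`, `federated_train`, `make_client`, `accuracy`) are hypothetical.

```python
import math
import random


def local_update(weights, data, lr=0.1, epochs=5):
    """One client's local training: logistic regression by SGD on its own data."""
    w = list(weights)  # start from the current global model
    for _ in range(epochs):
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for i, xi in enumerate(x):
                w[i] -= lr * (p - y) * xi
    return w


def aggregate(client_weights, client_sizes):
    """Sample-size-weighted average of client weights (assumed FedAvg-style rule)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def federated_train(clients, dim, rounds=10):
    """Alternate local training and aggregation; raw data never leaves a client."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        local_ws = [local_update(global_w, data) for data in clients]
        global_w = aggregate(local_ws, [len(d) for d in clients])
    return global_w


def make_client(n, rng):
    """Toy stand-in for expression data: label 1 when feature 0 exceeds feature 1."""
    data = []
    for _ in range(n):
        x = [rng.random(), rng.random()]
        data.append((x + [1.0], 1 if x[0] > x[1] else 0))  # last entry is a bias term
    return data


def accuracy(w, data):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(
        (sum(wi * xi for wi, xi in zip(w, x)) > 0) == (y == 1) for x, y in data
    )
    return hits / len(data)
```

A usage example would build a few clients with `make_client`, call `federated_train(clients, dim=3)`, and score the returned weights on held-out data with `accuracy`. Only model weights cross the client boundary in this sketch, which is the property the caption describes.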
