scFed: federated learning for cell type classification with scRNA-seq

Shuang Wang, Bochen Shen, Lanting Guo, Mengqi Shang, Jinze Liu, Qi Sun, Bairong Shen
DOI: https://doi.org/10.1093/bib/bbad507
IF: 9.5
2024-01-16
Briefings in Bioinformatics
Abstract: The advent of single-cell RNA sequencing (scRNA-seq) has revolutionized our understanding of cellular heterogeneity and complexity in biological tissues. However, the nature of large, sparse scRNA-seq datasets and privacy regulations present challenges for efficient cell identification. Federated learning provides a solution, allowing efficient and private data use. Here, we introduce scFed, a unified federated learning framework that allows for benchmarking of four classification algorithms without violating data privacy, including single-cell-specific and general-purpose classifiers. We evaluated scFed using eight publicly available scRNA-seq datasets with diverse sizes, species and technologies, assessing its performance via intra-dataset and inter-dataset experimental setups. We find that scFed performs well on a variety of datasets, with accuracy competitive with centralized models. Though the Transformer-based model excels in centralized training, its performance slightly lags behind the single-cell-specific model within the scFed framework, coupled with a notable time complexity concern. Our study not only helps select suitable cell identification methods but also highlights federated learning's potential for privacy-preserving, collaborative biomedical research.
biochemical research methods, mathematical & computational biology
What problem does this paper attempt to address?
This paper addresses two obstacles to cell-type classification from single-cell RNA sequencing (scRNA-seq) data: the datasets are large and sparse, and privacy regulations restrict how they can be shared for efficient cell identification. To address these challenges, the paper proposes scFed, a unified federated learning framework for benchmarking four classification algorithms, including single-cell-specific and general-purpose classifiers, without violating data privacy. Specifically, scFed lets multiple institutions collaboratively train a shared cell-type identification model while all training data stays stored locally, thereby protecting data privacy. Evaluated on eight publicly available scRNA-seq datasets, scFed performs comparably to the centralized model and, in most cases, better than models trained locally. The paper also examines how the classification algorithms (support vector machines, neural networks, XGBoost and Transformer-based models) differ in performance under the federated learning framework, and how the number of clients affects model performance. Overall, scFed not only helps in selecting appropriate cell identification methods but also shows the potential of federated learning for privacy-preserving, collaborative biomedical research.
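The collaborative training described above, in which institutions train on local data and share only model parameters, can be illustrated with a minimal sketch. This is not scFed's implementation: it assumes federated averaging (FedAvg) as the aggregation rule, which the summary does not name, uses a plain softmax regression as a stand-in for the four benchmarked classifiers, and generates synthetic "expression" matrices in place of real scRNA-seq data. All function names and parameters (`local_update`, `fed_avg`, `make_synthetic_clients`, `train_federated`, `accuracy`) are hypothetical.

```python
import numpy as np

def local_update(weights, X, y, n_classes, lr=0.1, epochs=5):
    """One client's local training: a few gradient steps of softmax regression.
    The raw data (X, y) never leaves this function; only weights are returned."""
    W = weights.copy()
    Y = np.eye(n_classes)[y]                          # one-hot labels
    for _ in range(epochs):
        logits = X @ W
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        W -= lr * X.T @ (P - Y) / len(X)              # cross-entropy gradient step
    return W

def fed_avg(client_weights, client_sizes):
    """Server step (assumed FedAvg): average client weights, weighted by
    each client's number of local samples."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)
    return np.tensordot(sizes / sizes.sum(), stacked, axes=1)

def make_synthetic_clients(n_clients=3, n_cells=200, n_genes=20, n_classes=3, seed=0):
    """Synthetic stand-in for per-institution datasets: Gaussian clusters,
    one cluster per 'cell type'. Not real scRNA-seq data."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(0, 3, size=(n_classes, n_genes))
    clients = []
    for _ in range(n_clients):
        y = rng.integers(0, n_classes, size=n_cells)
        X = centers[y] + rng.normal(0, 1, size=(n_cells, n_genes))
        clients.append((X, y))
    return clients

def train_federated(clients, n_genes, n_classes, rounds=20):
    """Alternate local training on each client with server-side averaging."""
    W = np.zeros((n_genes, n_classes))
    for _ in range(rounds):
        updates = [local_update(W, X, y, n_classes) for X, y in clients]
        W = fed_avg(updates, [len(X) for X, _ in clients])
    return W

def accuracy(W, X, y):
    """Fraction of cells whose highest-scoring class matches the label."""
    return float(np.mean((X @ W).argmax(axis=1) == y))

if __name__ == "__main__":
    clients = make_synthetic_clients()
    W = train_federated(clients, n_genes=20, n_classes=3)
    for i, (X, y) in enumerate(clients):
        print(f"client {i}: accuracy of shared model = {accuracy(W, X, y):.3f}")
```

In this sketch the server only ever sees weight matrices and sample counts, which is the property that lets the data remain local. Varying `n_clients` gives a rough feel for the client-count experiments mentioned above, though the synthetic clusters are far easier to separate than real cell types.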