Evaluating the Utilities of Foundation Models in Single-cell Data Analysis

Tianyu Liu, Kexing Li, Yuge Wang, Hongyu Li, Hongyu Zhao

DOI: https://doi.org/10.1101/2023.09.08.555192

2024-08-26

Abstract: Foundation Models (FMs) have made significant strides in both industrial and scientific domains. In this paper, we evaluate the performance of FMs for single-cell sequencing data analysis through comprehensive experiments across eight downstream tasks pertinent to single-cell data. Overall, the top FMs among the ten single-cell FMs evaluated are scGPT, Geneformer, and CellPLM, considering both model performance and user accessibility. However, by comparing these FMs with task-specific methods, we found that single-cell FMs may not consistently outperform task-specific methods in all tasks, which challenges the necessity of developing foundation models for single-cell analysis. In addition, we evaluated the effects of hyper-parameters, initial settings, and stability for training single-cell FMs based on a proposed scEval framework, and provide guidelines for pre-training and fine-tuning to enhance the performance of single-cell FMs. Our work summarizes the current state of single-cell FMs, points to their constraints and avenues for future development, and offers a freely available evaluation pipeline to benchmark new models and improve method development.

Bioinformatics

What problem does this paper attempt to address?

This paper evaluates the utility of foundation models (FMs) in single-cell data analysis. Specifically, it benchmarks ten single-cell foundation models on eight downstream tasks: batch-effect correction, multi-omics data integration, cell-type annotation, gene-function prediction, perturbation prediction, gene-network analysis, simulation, and imputation of missing values. The main objectives of the paper are:

1. **Evaluate the performance of single-cell foundation models**: By comparing them with task-specific methods, determine whether single-cell foundation models outperform, or at least match, task-specific methods in all tasks. The answer bears directly on whether developing single-cell foundation models is necessary.
2. **Explore the impact of hyper-parameters, initial settings, and training stability**: Using the proposed scEval framework, assess how these factors affect the performance of single-cell foundation models, and provide guidelines for pre-training and fine-tuning.
3. **Summarize the current state of single-cell foundation models**: Point out their limitations and future development directions, and provide a freely available evaluation pipeline for benchmarking new models and improving method development.

Through systematic evaluation and detailed experimental results, the paper aims to give researchers in single-cell data analysis guidance on selecting and optimizing foundation models.
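The comparison in the first objective can be pictured with a minimal sketch, assuming each method receives one score per task and that higher is better. This is not the scEval pipeline or its API: the function name `compare_fm_to_baselines`, the method names `fm_a`, `fm_b`, `baseline_x`, and `baseline_y`, and all numbers are hypothetical placeholders, not results from the paper.

```python
from typing import Dict, Set


def compare_fm_to_baselines(
    scores: Dict[str, Dict[str, float]],
    foundation_models: Set[str],
) -> Dict[str, dict]:
    """For each task, compare the best foundation model with the best
    task-specific method. Assumes higher scores are better."""
    summary = {}
    for task, method_scores in scores.items():
        # Split the methods into foundation models and task-specific baselines.
        fm = {m: s for m, s in method_scores.items() if m in foundation_models}
        base = {m: s for m, s in method_scores.items() if m not in foundation_models}
        if not fm or not base:
            continue  # nothing to compare for this task
        best_fm = max(fm, key=fm.get)
        best_base = max(base, key=base.get)
        summary[task] = {
            "best_fm": best_fm,
            "best_baseline": best_base,
            "fm_wins": fm[best_fm] > base[best_base],
        }
    return summary


# Placeholder scores for illustration only (hypothetical, not from the paper).
example_scores = {
    "cell_type_annotation": {"fm_a": 0.80, "fm_b": 0.70, "baseline_x": 0.75},
    "batch_effect_correction": {"fm_a": 0.60, "fm_b": 0.65, "baseline_y": 0.90},
}
result = compare_fm_to_baselines(example_scores, {"fm_a", "fm_b"})
for task, row in result.items():
    print(task, row)
```

A per-task comparison like this, rather than a single averaged score, makes it visible when a foundation model leads on one task but trails a task-specific method on another, which is the pattern the abstract reports.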

BioLLM: A Standardized Framework for Integrating and Benchmarking Single-Cell Foundation Models

scELMo: Embeddings from Language Models are Good Learners for Single-cell Data Analysis

Progress and opportunities of foundation models in bioinformatics

Harnessing the deep learning power of foundation models in single-cell omics

Large-scale foundation model on single-cell transcriptomics

CellFM: a large-scale foundation model pre-trained on transcriptomics of 100 million human cells

scGPT: Towards Building a Foundation Model for Single-Cell Multi-omics Using Generative AI

scLong: A Billion-Parameter Foundation Model for Capturing Long-Range Gene Context in Single-Cell Transcriptomics

PertEval-scFM: Benchmarking Single-Cell Foundation Models for Perturbation Effect Prediction

Evaluating the role of pre-training dataset size and diversity on single-cell foundation model performance

How Good Are We? Evaluating Cell AI Foundation Models in Kidney Pathology with Human-in-the-Loop Enrichment

Specialized Foundation Models Struggle to Beat Supervised Baselines

scGPT: toward building a foundation model for single-cell multi-omics using generative AI

The Development of AI Foundation Models for Single-Cell Transcriptomics

CellPatch: a Highly Efficient Foundation Model for Single-Cell Transcriptomics with Heuristic Patching

GFETM: Genome Foundation-based Embedded Topic Model for scATAC-seq Modeling

Can Foundation Models Wrangle Your Data?

GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models

CancerFoundation: A single-cell RNA sequencing foundation model to decipher drug resistance in cancer