CellAgent: An LLM-driven Multi-Agent Framework for Automated Single-cell Data Analysis

Yihang Xiao, Jinyi Liu, Yan Zheng, Xiaohan Xie, Jianye Hao, Mingzhi Li, Ruitao Wang, Fei Ni, Yuxiao Li, Jintian Luo, Shaoqing Jiao, Jiajie Peng
DOI: https://doi.org/10.1101/2024.05.13.593861
2024-05-17
Abstract: Single-cell RNA sequencing (scRNA-seq) data analysis is crucial for biological research, as it enables the precise characterization of cellular heterogeneity. However, manual manipulation of various tools to achieve desired outcomes can be labor-intensive for researchers. To address this, we introduce CellAgent, an LLM-driven multi-agent framework, specifically designed for the automatic processing and execution of scRNA-seq data analysis tasks, providing high-quality results with no human intervention. Firstly, to adapt general LLMs to the biological field, CellAgent constructs LLM-driven biological expert roles (planner, executor, and evaluator), each with specific responsibilities. Then, CellAgent introduces a hierarchical decision-making mechanism to coordinate these biological experts, effectively driving the planning and step-by-step execution of complex data analysis tasks. Furthermore, we propose a self-iterative optimization mechanism, enabling CellAgent to autonomously evaluate and optimize solutions, thereby guaranteeing output quality. We evaluate CellAgent on a comprehensive benchmark dataset encompassing dozens of tissues and hundreds of distinct cell types. Evaluation results consistently show that CellAgent effectively identifies the most suitable tools and hyperparameters for single-cell analysis tasks, achieving optimal performance. This automated framework dramatically reduces the workload for science data analyses, bringing us into the "Agent for Science" era.
Bioinformatics
What problem does this paper attempt to address?
The paper asks how large language models (LLMs) can be used to automate single-cell data analysis, reducing both the workload and the technical barrier that researchers face when chaining many tools together by hand. Its answer is CellAgent, an LLM-driven multi-agent framework for automated analysis of single-cell RNA sequencing (scRNA-seq) data. CellAgent defines three LLM-driven expert roles: a planner that decomposes the task into analysis steps, an executor that writes and runs the code for each step, and an evaluator that assesses the quality of the results. A hierarchical decision-making mechanism coordinates these roles, and a self-iterative optimization mechanism lets the system evaluate and revise its own solutions to improve output quality. On a benchmark dataset covering dozens of tissues and hundreds of distinct cell types, the authors report that CellAgent identifies suitable tools and hyperparameters and achieves the best performance among the approaches evaluated. According to the authors, this lowers the technical requirements for single-cell analysis and marks a step toward what they call the "Agent for Science" era.
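The planner, executor, and evaluator loop described above can be illustrated with a minimal Python sketch. It is not CellAgent's implementation: every name here (`run_task`, `StepResult`, the `toy_*` functions, the `resolution` parameter) is hypothetical. Plain functions stand in for the LLM-driven roles, and a fixed list of candidate settings stands in for the revisions an LLM would propose during self-iterative optimization.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StepResult:
    step: str
    params: Dict[str, float]
    score: float


def run_task(
    task: str,
    planner: Callable[[str], List[str]],
    executor: Callable[[str, Dict[str, float]], str],
    evaluator: Callable[[str, str], float],
    candidates: Dict[str, List[Dict[str, float]]],
    max_iters: int = 3,
) -> List[StepResult]:
    """Plan the task, then try up to max_iters settings per step and keep the best."""
    results = []
    for step in planner(task):  # upper level: the planner splits the task into steps
        best = None
        for params in candidates.get(step, [{}])[:max_iters]:
            output = executor(step, params)  # lower level: the executor runs one step
            score = evaluator(step, output)  # the evaluator rates that step's output
            if best is None or score > best.score:
                best = StepResult(step, params, score)
        results.append(best)
    return results


# Toy stand-ins for the three roles (assumption: a real system would call an LLM here).
def toy_planner(task: str) -> List[str]:
    return ["preprocess", "cluster"]


def toy_executor(step: str, params: Dict[str, float]) -> str:
    return f"{step}:{params.get('resolution', 0)}"


def toy_evaluator(step: str, output: str) -> float:
    # Made-up scoring rule: pretend a resolution of 1.0 is ideal.
    value = float(output.split(":")[1])
    return -abs(value - 1.0)


candidates = {
    "cluster": [{"resolution": 0.5}, {"resolution": 1.0}, {"resolution": 2.0}]
}
results = run_task(
    "annotate cell types", toy_planner, toy_executor, toy_evaluator, candidates
)
for r in results:
    print(r.step, r.params, r.score)
```

The two nested loops mirror the hierarchy in the summary: the outer loop follows the planner's step list, while the inner loop repeats execution and evaluation for a single step until the iteration budget is spent. Capping the inner loop with `max_iters` is a design choice of this sketch, not something the abstract specifies.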