Bioinformatics Copilot 1.0: A Large Language Model-powered Software for the Analysis of Transcriptomic Data
Yongheng Wang, Weidi Zhang, Siyu Lin, Matthew S. Farruggio, Aijun Wang
DOI: https://doi.org/10.1101/2024.04.11.588958
2024-04-15
Abstract: Single-cell transcriptomics has been producing extensive datasets, advancing our understanding of cellular functions in various tissues and empowering diagnosis, prognosis, and drug development. However, analyzing these data has been a monumental task, often taking weeks to months. One bottleneck is the sheer volume of data generated, ranging from hundreds of gigabytes to tens of terabytes, which demands extensive analysis time. Moreover, the analysis involves an intricate series of steps that rely on various software packages, creating a steep learning curve for biologists. Additionally, the iterative nature of data analysis in this domain requires deep biological insight to formulate relevant questions, conduct analyses, interpret results, and refine hypotheses. This iterative loop has required close collaboration between biologists and bioinformaticians, which is hampered by protracted communication cycles. To address these challenges, we present Bioinformatics Copilot 1.0, a large language model-powered software tool. It allows users to analyze data through an intuitive natural language interface, without requiring proficiency in programming languages such as Python or R. It is engineered for cross-platform use, with support for Mac, Windows, and Linux. Importantly, it facilitates local data analysis, ensuring adherence to the stringent data management regulations that govern the use of patient samples in medical and research institutions. We anticipate that this tool will expedite the data analysis workflow in numerous research endeavors, thereby accelerating advancements in the biomedical sciences.
Bioinformatics