Abstract: Objective: Most existing fine-tuned biomedical large language models (LLMs) focus on enhancing performance in monolingual biomedical question answering and conversation tasks. To investigate the effectiveness of fine-tuned LLMs on diverse biomedical NLP tasks in different languages, we present Taiyi, a bilingual fine-tuned LLM for diverse biomedical tasks. Materials and Methods: We first curated a comprehensive collection of 140 existing biomedical text mining datasets (102 English and 38 Chinese datasets) across over 10 task types. We then propose a two-stage supervised fine-tuning strategy to optimize model performance across varied tasks. Results: Experimental results on 13 test sets covering named entity recognition, relation extraction, text classification, and question answering tasks demonstrate that Taiyi achieves superior performance compared to general LLMs. A case study involving additional biomedical NLP tasks further shows Taiyi's considerable potential for bilingual biomedical multi-tasking. Conclusion: Leveraging rich, high-quality biomedical corpora and developing effective fine-tuning strategies can significantly improve the performance of LLMs within the biomedical domain. Taiyi demonstrates bilingual multi-tasking capability through supervised fine-tuning. However, tasks that are not generative in nature, such as information extraction, remain challenging for LLM-based generative approaches, which still underperform conventional discriminative approaches based on smaller language models.

What problem does this paper attempt to address?

The paper aims to address the following issues:

1. **Multilingual Biomedical Task Processing**: Most existing large language models (LLMs) for biomedical tasks focus on enhancing the performance of biomedical question answering and dialogue tasks in a single language (such as English or Chinese). This paper proposes a bilingual (English and Chinese) fine-tuned large language model, Taiyi, to explore its performance in various biomedical natural language processing (BioNLP) tasks across different languages.
2. **Dataset Integration and Standardization**: Researchers collected and organized 140 existing biomedical text mining datasets (including 102 English datasets and 38 Chinese datasets), spanning over 10 different task types. To ensure data consistency and format uniformity, a standardized data schema was designed.
3. **Two-Stage Supervised Fine-Tuning Strategy**: A two-stage supervised fine-tuning strategy is proposed to optimize the model's performance across various tasks. The first stage involves fine-tuning for non-generative tasks, while the second stage combines all tasks for incremental training.
4. **Evaluation and Comparison**: The performance of the Taiyi model was validated through 13 test sets (covering named entity recognition, relation extraction, text classification, and question answering tasks) and compared with general LLMs and other specialized biomedical LLMs.

In summary, the main goal of the paper is to develop a large language model, Taiyi, capable of effectively handling various biomedical natural language processing tasks in a bilingual environment and to experimentally validate its superior performance.
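The standardized schema and the two-stage data split described above can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the `Example` fields, the task labels in `NON_GENERATIVE_TASKS`, and the `build_stage_datasets` helper are hypothetical and are not taken from the paper, whose actual schema and task taxonomy may differ. The sketch only shows how training data might be partitioned, not how the model is trained.

```python
from dataclasses import dataclass

# Hypothetical task labels; the paper's actual task taxonomy may differ.
NON_GENERATIVE_TASKS = {"ner", "relation_extraction", "text_classification"}


@dataclass(frozen=True)
class Example:
    """One training instance in an assumed unified schema."""
    task_type: str    # e.g. "ner", "question_answering"
    language: str     # "en" or "zh"
    instruction: str  # natural-language task prompt
    input_text: str
    output_text: str


def build_stage_datasets(examples):
    """Split examples into the two fine-tuning stages.

    Stage 1 keeps only non-generative tasks; stage 2 combines all tasks.
    """
    stage1 = [ex for ex in examples if ex.task_type in NON_GENERATIVE_TASKS]
    stage2 = list(examples)
    return stage1, stage2


# Toy data, invented purely to exercise the helper.
examples = [
    Example("ner", "en", "Extract disease mentions.",
            "Aspirin may reduce fever.", "fever"),
    Example("text_classification", "zh", "Classify the sentence topic.",
            "患者出现头痛。", "symptom"),
    Example("question_answering", "en", "Answer the question.",
            "What does aspirin treat?", "Pain and fever."),
]

stage1, stage2 = build_stage_datasets(examples)
print(len(stage1), len(stage2))
```

In this sketch, stage 2 reuses the stage 1 examples alongside the generative ones, which matches the summary's description of combining all tasks for incremental training; whether the paper resamples or reweights the stage 1 data is not stated here.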

Taiyi: A Bilingual Fine-Tuned Large Language Model for Diverse Biomedical Tasks