Evaluating Large Language Models for Radiology Natural Language Processing

Zhengliang Liu, Tianyang Zhong, Yiwei Li, Yutong Zhang, Yi Pan, Zihao Zhao, Peixin Dong, Chao Cao, Yuxiao Liu, Peng Shu, Yaonai Wei, Zihao Wu, Chong Ma, Jiaqi Wang, Sheng Wang, Mengyue Zhou, Zuowei Jiang, Chunlin Li, Jason Holmes, Shaochen Xu, Lu Zhang, Haixing Dai, Kai Zhang, Lin Zhao, Yuanhao Chen, Xu Liu, Peilong Wang, Pingkun Yan, Jun Liu, Bao Ge, Lichao Sun, Dajiang Zhu, Xiang Li, Wei Liu, Xiaoyan Cai, Xintao Hu, Xi Jiang, Shu Zhang, Xin Zhang, Tuo Zhang, Shijie Zhao, Quanzheng Li, Hongtu Zhu, Dinggang Shen, Tianming Liu
2023-07-27
Abstract: The rise of large language models (LLMs) has marked a pivotal shift in the field of natural language processing (NLP). LLMs have revolutionized a multitude of domains, and they have made a significant impact in the medical field. Large language models are now more abundant than ever, and many of these models exhibit bilingual capabilities, being proficient in both English and Chinese. However, a comprehensive evaluation of these models remains to be conducted. This lack of assessment is especially apparent within the context of radiology NLP. This study seeks to bridge this gap by critically evaluating thirty-two LLMs in interpreting radiology reports, a crucial component of radiology NLP. Specifically, the ability to derive impressions from radiologic findings is assessed. The outcomes of this evaluation provide key insights into the performance, strengths, and weaknesses of these LLMs, informing their practical applications within the medical domain.
Computation and Language
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper aims to address the comprehensive evaluation of large language models (LLMs) in the field of radiology natural language processing (NLP). Specifically:

1. **Evaluation Objectives**: The paper evaluates the understanding capabilities of 32 large language models on radiology reports, particularly their ability to extract impressions from radiological findings.
2. **Filling the Gap**: There is currently a lack of comprehensive evaluation studies on these models, especially in the field of radiology NLP. Many existing LLMs have bilingual capabilities in English and Chinese, but their performance has not been fully assessed.
3. **Application Guidance**: The research findings provide key insights into the performance, strengths, and weaknesses of these models, thereby informing practical applications in the medical field.

In summary, the goal of this study is to systematically evaluate large language models in the critical field of radiology NLP, to drive future research and optimize the practical application of these models.
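
To make the evaluated task concrete, the following is a minimal sketch of how a generated impression might be scored against a radiologist-written reference. The text above does not state which prompt or metric the study uses, so everything here is an assumption for illustration: `build_prompt` is a hypothetical prompt template, and `unigram_f1` is a simple word-overlap score standing in for whatever metric the authors actually apply.

```python
from collections import Counter


def build_prompt(findings: str) -> str:
    """Hypothetical prompt template: show the findings, ask for the impression."""
    return "Findings: " + findings + "\nImpression:"


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def unigram_f1(candidate: str, reference: str) -> float:
    """Word-overlap F1 between a generated and a reference impression.

    Assumed stand-in metric, not the one confirmed by the source.
    """
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    # Multiset intersection counts each shared word at most min(count) times.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "No acute disease"
    generated = "No acute cardiopulmonary disease"  # would come from an LLM
    print(build_prompt("Lungs are clear. Heart size is normal."))
    print(round(unigram_f1(generated, reference), 3))
```

In a full evaluation along these lines, each model would receive the same prompts, and the per-report scores would be averaged to compare models.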