Enhancing Orthopedic Knowledge Assessments: The Performance of Specialized Generative Language Model Optimization

Hong Zhou,Hong-Lin Wang,Yu-Yu Duan,Zi-Neng Yan,Rui Luo,Xiang-Xin Lv,Yi Xie,Jia-Yao Zhang,Jia-Ming Yang,Ming-di Xue,Ying Fang,Lin Lu,Peng-Ran Liu,Zhe-Wei Ye
DOI: https://doi.org/10.1007/s11596-024-2929-4
Abstract:Objective: This study aimed to evaluate and compare the effectiveness of knowledge base-optimized and unoptimized large language models (LLMs) in the field of orthopedics to explore optimization strategies for the application of LLMs in specific fields. Methods: This research constructed a specialized knowledge base using clinical guidelines from the American Academy of Orthopaedic Surgeons (AAOS) and authoritative orthopedic publications. A total of 30 orthopedic-related questions covering aspects such as anatomical knowledge, disease diagnosis, fracture classification, treatment options, and surgical techniques were input into both the knowledge base-optimized and unoptimized versions of the GPT-4, ChatGLM, and Spark LLM, with their generated responses recorded. The overall quality, accuracy, and comprehensiveness of these responses were evaluated by 3 experienced orthopedic surgeons. Results: Compared with their unoptimized LLMs, the optimized version of GPT-4 showed improvements of 15.3% in overall quality, 12.5% in accuracy, and 12.8% in comprehensiveness; ChatGLM showed improvements of 24.8%, 16.1%, and 19.6%, respectively; and Spark LLM showed improvements of 6.5%, 14.5%, and 24.7%, respectively. Conclusion: The optimization of knowledge bases significantly enhances the quality, accuracy, and comprehensiveness of the responses provided by the 3 models in the orthopedic field. Therefore, knowledge base optimization is an effective method for improving the performance of LLMs in specific fields.
What problem does this paper attempt to address?