Taming Large Language Models to Implement Diagnosis and Evaluating the Generation of LLMs at the Semantic Similarity Level in Acupuncture and Moxibustion

Shusheng Li, Wenjun Tan, Changshuai Zhang, Jiale Li, Haiyan Ren, Yanliang Guo, Jing Jia, Yangyang Liu, Xingfang Pan, Jing Guo, Wei Meng, Zhaoshui He
DOI: https://doi.org/10.1016/j.eswa.2024.125920
IF: 8.5
2024-01-01
Expert Systems with Applications
Abstract: With the rapid advancement of artificial intelligence and deep learning technologies, large language models (LLMs) such as ChatGPT and GPT-4 have made significant progress in comprehending and responding to human instructions. Acupuncture and moxibustion, therapeutic modalities in Traditional Chinese Medicine (TCM), possess extensive knowledge beneficial for patient treatment. Currently, acupuncture diagnosis relies on the experience and skills of individual acupuncturists, emphasizing the need for research to improve diagnostic accuracy through objective methods. Therefore, the integration of LLMs into the field of acupuncture can facilitate the recommendation of personalized acupuncture treatment programs. However, the application of general LLMs to the field of acupuncture diagnosis often yields suboptimal results. In addition, most LLM evaluation metrics depend solely on literal overlap and fail to capture semantic similarity. To address these challenges, this paper introduces AcupunctureGPT, a specialized large language model for acupuncture diagnosis, aimed at exploring the potential application of LLMs in this field. Patient Diagnostic Acupuncture Data is constructed to enhance the diagnostic capabilities of AcupunctureGPT in acupuncture. The Generated Knowledge Filter Prompting approach is proposed to improve the accuracy of LLMs in identifying similar diseases through the development and filtering of knowledge statements. The Sentence Similarity Evaluation Module (SSEM) is employed to assess the generation quality of LLMs at the semantic level. The Sentence Adaptive Enhancement Fusion Module (SAEFM), proposed within SSEM, enhances the adaptive fusion of output features at various levels. Experimental results demonstrate that AcupunctureGPT outperforms other large language models in diagnosing diseases and devising reasonable treatment plans. Furthermore, the evaluation metrics proposed in this paper have been validated for effectiveness.
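The abstract's point that literal-overlap metrics miss semantic similarity can be illustrated with a small sketch. The code below is not the paper's SSEM, whose internals the abstract does not describe; it is a ROUGE-1-style unigram F1 written for illustration. The function name `unigram_f1` and the three toy sentences are assumptions invented for this example and are not clinical guidance.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """ROUGE-1-style score: F1 over lowercase whitespace-split unigrams."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Toy sentences (hypothetical, for illustration only).
reference = "needle the hegu point to relieve headache"
paraphrase = "stimulating LI4 eases pain in the head"       # similar meaning, different words
near_copy = "needle the hegu point to relieve insomnia"     # different condition, same words

paraphrase_score = unigram_f1(paraphrase, reference)
near_copy_score = unigram_f1(near_copy, reference)

print(f"paraphrase: {paraphrase_score:.3f}")
print(f"near copy:  {near_copy_score:.3f}")
```

Under this overlap metric the paraphrase shares only the token "the" with the reference and scores far below the near copy, even though the near copy names a different condition. That gap is the kind of failure a semantic-level evaluation is meant to address.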