EyeGPT: Ophthalmic Assistant with Large Language Models

Xiaolan Chen, Ziwei Zhao, Weiyi Zhang, Pusheng Xu, Le Gao, Mingpu Xu, Yue Wu, Yinwen Li, Danli Shi, Mingguang He
2024-02-29
Abstract: Artificial intelligence (AI) has gained significant attention in healthcare consultation due to its potential to improve clinical workflow and enhance medical communication. However, owing to the complex nature of medical information, large language models (LLMs) trained with general world knowledge might not possess the capability to tackle medical-related tasks at an expert level. Here, we introduce EyeGPT, a specialized LLM designed specifically for ophthalmology, using three optimization strategies: role-playing, finetuning, and retrieval-augmented generation. In particular, we propose a comprehensive evaluation framework that encompasses a diverse dataset, covering various subspecialties of ophthalmology, different users, and diverse inquiry intents. Moreover, we consider multiple evaluation metrics, including accuracy, understandability, trustworthiness, empathy, and the proportion of hallucinations. By assessing the performance of different EyeGPT variants, we identify the most effective one, which exhibits comparable levels of understandability, trustworthiness, and empathy to human ophthalmologists (all Ps > 0.05). Overall, our study provides valuable insights for future research, facilitating comprehensive comparisons and evaluations of different strategies for developing specialized LLMs in ophthalmology. The potential benefits include enhancing the patient experience in eye care and optimizing ophthalmologists' services.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper asks how to develop a large language model (LLM) specialized for ophthalmology, so that it answers with greater expertise and accuracy and thereby improves clinical workflow and medical communication. Specifically, the authors want optimization strategies that help the LLM handle complex, specialized medical information and generate less misinformation, and they verify the effectiveness of these strategies through a comprehensive evaluation framework.

### Main problem decomposition

1. **Limitations of existing LLMs in the field of ophthalmology**:
   - **Lack of professional knowledge**: Although existing large language models have extensive general knowledge, they perform poorly in specific medical fields such as ophthalmology. For example, the complete accuracy rate of ChatGPT in retinal diseases is only 15.4%.
   - **Generation of misinformation (hallucinations)**: LLMs sometimes generate misleading information, which is especially dangerous in the medical field.
   - **Lack of comprehensive evaluation**: Current evaluations of LLMs in ophthalmology mainly use multiple-choice questions and fail to fully test performance in practical applications.
2. **The need to develop a dedicated model**:
   - **Enhance professional ability**: A specially trained model is required to understand and generate text that meets the professional requirements of ophthalmology.
   - **Improve credibility and empathy**: The model needs not only to provide accurate information but also to understand and respond to the emotional needs of patients.
   - **Reduce misinformation**: Introducing external knowledge bases and other methods reduces the probability that the model generates misinformation.

### Solutions

To solve the above problems, the authors propose EyeGPT, a large language model designed specifically for ophthalmology. They adopt three optimization strategies:

1. **Role-playing**: Have the model take on the role of an ophthalmologist so that it generates more professional and empathetic answers.
2. **Finetuning**: Fine-tune the model on a dataset of ophthalmology-specific knowledge, making it better at handling ophthalmology-related terms and logical reasoning.
3. **Retrieval-Augmented Generation (RAG)**: Combine the model with external knowledge bases (such as medical books and manually constructed databases) to improve its accuracy and reliability.

In addition, the authors design a comprehensive evaluation framework that covers multiple metrics (accuracy, understandability, trustworthiness, empathy, and the proportion of hallucinations) and tests the model across different types of users and question types.

### Goals

Through these optimization strategies and the evaluation framework, the research aims to:

- Improve the professional level of ophthalmology AI assistants so that they can better serve ophthalmologists and patients.
- Provide a reference and guidance for the future development and evaluation of large language models in specific fields.

In summary, the core objective of this paper is to explore, through the development and evaluation of EyeGPT, how deep learning technology can improve the quality and efficiency of ophthalmic medical services.
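
To make the role-playing and retrieval-augmented generation strategies more concrete, here is a minimal Python sketch of how the two could be combined into a single prompt. It is an illustration under stated assumptions, not the authors' implementation: the wording of `ROLE_PROMPT`, the three-sentence `KNOWLEDGE_BASE`, and the helpers `tokenize`, `retrieve`, and `build_prompt` are all hypothetical, and simple word overlap stands in for whatever retriever EyeGPT actually uses. Finetuning is not shown, and no model is called.

```python
import re

# Hypothetical role-play instruction; the paper's actual prompt is not given here.
ROLE_PROMPT = "You are an ophthalmologist. Answer accurately and with empathy."

# Toy stand-in for an external knowledge base (illustrative snippets only).
KNOWLEDGE_BASE = [
    "A cataract is a clouding of the natural lens of the eye.",
    "Glaucoma is a group of diseases that damage the optic nerve.",
    "Diabetic retinopathy is damage to retinal blood vessels caused by diabetes.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, passages, top_k=1):
    """Rank passages by word overlap with the question (assumed toy retriever)."""
    question_words = tokenize(question)
    ranked = sorted(
        passages,
        key=lambda passage: len(question_words & tokenize(passage)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, passages=KNOWLEDGE_BASE, top_k=1):
    """Combine the role instruction, retrieved context, and the user question."""
    context = "\n".join(retrieve(question, passages, top_k))
    return (
        f"{ROLE_PROMPT}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    print(build_prompt("What is glaucoma and how does it damage the optic nerve?"))
```

The resulting string would be passed to a language model. In this sketch the role instruction targets tone and professionalism, while the retrieved context grounds the answer in external text, which is the mechanism the summary credits with reducing misinformation. A realistic retriever would at least filter stop words or use embeddings rather than raw word overlap.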