Abstract: Artificial intelligence (AI) has gained significant attention in healthcare consultation due to its potential to improve clinical workflow and enhance medical communication. However, owing to the complex nature of medical information, large language models (LLMs) trained on general world knowledge may lack the capability to handle medical tasks at an expert level. Here, we introduce EyeGPT, an LLM designed specifically for ophthalmology, built with three optimization strategies: role-playing, finetuning, and retrieval-augmented generation. In particular, we propose a comprehensive evaluation framework with a diverse dataset that covers various subspecialties of ophthalmology, different users, and diverse inquiry intents. Moreover, we consider multiple evaluation metrics, including accuracy, understandability, trustworthiness, empathy, and the proportion of hallucinations. By assessing the performance of different EyeGPT variants, we identify the most effective one, which shows levels of understandability, trustworthiness, and empathy comparable to those of human ophthalmologists (all Ps > 0.05). Overall, our study provides valuable insights for future research, facilitating comprehensive comparisons and evaluations of different strategies for developing specialized LLMs in ophthalmology. Potential benefits include a better patient experience in eye care and more efficient use of ophthalmologists' services.
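The abstract reports per-answer metrics (including the proportion of hallucinations) and compares the best EyeGPT variant with ophthalmologists through P-values, but does not name the statistical test. The sketch below shows one way such a comparison could be computed. It assumes ratings on a hypothetical 1-5 scale and uses a two-sided permutation test on the difference in means; the test choice, rating scale, function names, and example numbers are all illustrative assumptions, not the paper's protocol or data.

```python
import random
from statistics import fmean


def hallucination_proportion(flags):
    """Fraction of answers that reviewers flagged as containing a hallucination."""
    return sum(flags) / len(flags)


def permutation_p_value(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in mean rating between a and b.

    Assumption: this test is a stand-in; the paper's actual test may differ.
    """
    rng = random.Random(seed)
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign the pooled ratings to two groups
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance guards against float rounding
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_resamples + 1)


# Hypothetical 1-5 ratings for illustration only (NOT data from the paper).
model = {"understandability": [4, 5, 4, 4, 5], "empathy": [3, 4, 4, 3, 4]}
doctor = {"understandability": [5, 4, 4, 5, 4], "empathy": [4, 4, 3, 4, 4]}

for metric in model:
    p = permutation_p_value(model[metric], doctor[metric])
    print(f"{metric}: mean {fmean(model[metric]):.2f} vs {fmean(doctor[metric]):.2f}, p = {p:.3f}")

print(f"hallucination proportion: {hallucination_proportion([False, True, False, False]):.2f}")
```

A p-value above 0.05 in a test like this means only that no difference was detected; it is not by itself evidence of equivalence, which is how "comparable" results should be read.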

What problem does this paper attempt to address?

The problem this paper addresses is how to develop a large language model (LLM) dedicated to ophthalmology that is more professional and accurate in the field, thereby improving clinical workflow and medical communication. Specifically, the authors seek optimization strategies that help the LLM handle complex, specialized medical information and generate less misinformation, and a comprehensive evaluation framework to verify that these strategies work.

### Main problem decomposition

1. **Limitations of existing LLMs in ophthalmology**:
   - **Lack of professional knowledge**: Although existing LLMs have broad general knowledge, they perform poorly in specialized medical fields such as ophthalmology. For example, ChatGPT's rate of completely accurate answers on retinal diseases is only 15.4%.
   - **Generation of misinformation (hallucinations)**: LLMs sometimes generate misleading information, which is especially dangerous in medicine.
   - **Lack of comprehensive evaluation**: Existing evaluations of LLMs in ophthalmology rely mainly on multiple-choice questions and do not fully test performance in practical use.
2. **The need for a dedicated model**:
   - **Enhanced professional ability**: A specially trained model is needed to understand and generate text that meets the professional standards of ophthalmology.
   - **Improved trustworthiness and empathy**: The model must not only provide accurate information but also recognize and respond to patients' emotional needs.
   - **Less misinformation**: Methods such as drawing on external knowledge bases should reduce the probability that the model generates misinformation.
### Solutions

To solve these problems, the authors propose EyeGPT, a large language model designed specifically for ophthalmology, built with three optimization strategies:

1. **Role-playing**: The model simulates the role of an ophthalmologist to produce more professional and empathetic answers.
2. **Finetuning**: The model is fine-tuned on a dataset of ophthalmology-specific knowledge so that it handles ophthalmic terminology and reasoning better.
3. **Retrieval-augmented generation (RAG)**: The model draws on external knowledge bases (such as medical books and manually constructed databases) when answering, to improve accuracy and reliability.

In addition, the authors designed a comprehensive evaluation framework that covers multiple metrics (accuracy, understandability, trustworthiness, empathy, and the proportion of hallucinations) and tests the model across different types of users and inquiry intents.

### Goals

Through these optimization strategies and the evaluation framework, the research aims to:

- Raise the professional level of ophthalmology AI assistants so that they better serve ophthalmologists and patients.
- Provide references and guidance for the future development and evaluation of LLMs in specialized fields.

In summary, the paper's core objective is to explore, through the development and evaluation of EyeGPT, how large language models can improve the quality and efficiency of ophthalmic care.
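Role-playing and RAG both act on the prompt the model receives, so they can be illustrated together: a role instruction, followed by the knowledge-base passages most relevant to the question, followed by the question itself. The sketch below assumes a tiny in-memory knowledge base and uses bag-of-words cosine similarity as a stand-in for whatever retriever EyeGPT actually uses, which this summary does not specify. The prompt wording, passages, and function names (`retrieve`, `build_prompt`) are hypothetical. Finetuning changes model weights rather than the prompt and is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical role-playing instruction; not the prompt used in the paper.
ROLE_PROMPT = (
    "You are an experienced ophthalmologist. Answer the patient's question "
    "accurately, in plain language, and with empathy. If the reference "
    "material does not support an answer, say so instead of guessing."
)

# Hypothetical passages standing in for medical books or curated databases.
KNOWLEDGE_BASE = [
    "Glaucoma is a group of eye diseases that damage the optic nerve, often linked to raised intraocular pressure.",
    "Cataract is a clouding of the eye's natural lens and is commonly treated with surgery.",
    "Diabetic retinopathy damages blood vessels in the retina and is detected with dilated eye exams.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, passages, k=2):
    """Return the k passages most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(passages, key=lambda p: cosine(q, Counter(tokenize(p))), reverse=True)
    return ranked[:k]


def build_prompt(question, passages, k=2):
    """Role instruction + retrieved context + question, ready to send to an LLM."""
    hits = retrieve(question, passages, k)
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(hits, start=1))
    return f"{ROLE_PROMPT}\n\nReference material:\n{context}\n\nPatient question: {question}\nAnswer:"


print(build_prompt("How is cataract treated?", KNOWLEDGE_BASE, k=1))
```

Keeping retrieval separate from prompt assembly means a stronger retriever (for example, one based on embeddings) could replace `retrieve` without changing `build_prompt`.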

EyeGPT: Ophthalmic Assistant with Large Language Models
