LLMs for Doctors: Leveraging Medical LLMs to Assist Doctors, Not Replace Them

Wenya Xie, Qingying Xiao, Yu Zheng, Xidong Wang, Junying Chen, Ke Ji, Anningzhe Gao, Xiang Wan, Feng Jiang, Benyou Wang

2024-06-26

Abstract: The recent success of Large Language Models (LLMs) has had a significant impact on the healthcare field, providing patients with medical advice, diagnostic information, and more. However, because patients lack professional medical knowledge, they are easily misled by erroneous information generated by LLMs, which may result in serious medical problems. To address this issue, we focus on tuning LLMs to be medical assistants that collaborate with more experienced doctors. We first conduct a two-stage inspiration-feedback survey to gain a broad understanding of what doctors actually need from medical assistants. Based on this, we construct a Chinese medical dataset called DoctorFLAN to support the entire workflow of doctors, which includes 92K Q&A samples from 22 tasks and 27 specialties. Moreover, we evaluate LLMs in doctor-oriented scenarios by constructing DoctorFLAN-test, containing 550 single-turn Q&A samples, and DotaBench, containing 74 multi-turn conversations. The evaluation results indicate that being a medical assistant still poses challenges for existing open-source models, but DoctorFLAN can help them significantly. This demonstrates that the doctor-oriented dataset and benchmarks we construct can complement existing patient-oriented work and better promote medical LLM research.

Computation and Language

What problem does this paper attempt to address?

The paper addresses how large language models (LLMs) should be used in medical applications, arguing that they should act as assistants to doctors rather than as replacements. When LLMs give medical advice directly to patients, the patients lack the expertise to verify the accuracy of the model's output, which can lead to serious medical risks. The authors therefore propose that the LLM act as a doctor's assistant, collaborating with experienced doctors who can check the accuracy and safety of its output. Specifically, the authors conducted the following work:

1. **Needs Assessment**: Through a two-stage survey study, they established the actual needs of doctors for medical assistance and identified four tasks most suitable for LLM assistance.
2. **Dataset Construction**: They created a Chinese medical dataset named DoctorFLAN, containing approximately 92,000 samples, covering 22 tasks and 27 specialties throughout the doctor's workflow.
3. **Benchmark Design**: They developed two evaluation benchmarks, DoctorFLAN-test and DotaBench, to assess the performance of LLMs in single-turn Q&A and multi-turn dialogue scenarios.
4. **Experimental Analysis**: They conducted both automatic and manual evaluations of existing medical LLMs, showing that current models perform poorly on complex clinical tasks, while models trained on DoctorFLAN show significant improvements.

In summary, the goal of this paper is to reposition LLMs as assistants to doctors, thereby improving efficiency and safety in the medical field.
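To make the dataset and benchmark description more concrete, here is a minimal Python sketch of how single-turn samples (as in DoctorFLAN and DoctorFLAN-test) and multi-turn conversations (as in DotaBench) might be represented, together with a helper that counts distinct tasks and specialties. The class names, field names, task labels, and placeholder records are all assumptions made for illustration; the source does not describe the actual file format or schema.

```python
from dataclasses import dataclass, field


@dataclass
class SingleTurnSample:
    """Hypothetical record for one doctor-oriented Q&A sample."""
    task: str        # one of the 22 workflow tasks (label is illustrative)
    specialty: str   # one of the 27 specialties (label is illustrative)
    question: str
    answer: str


@dataclass
class MultiTurnSample:
    """Hypothetical record for one multi-turn conversation."""
    specialty: str
    # Each turn is a (role, text) pair, e.g. ("doctor", "...").
    turns: list[tuple[str, str]] = field(default_factory=list)


def coverage(samples: list[SingleTurnSample]) -> dict[str, int]:
    """Count samples, distinct tasks, and distinct specialties."""
    return {
        "n_samples": len(samples),
        "n_tasks": len({s.task for s in samples}),
        "n_specialties": len({s.specialty for s in samples}),
    }


# Placeholder records only; these are not taken from the real dataset.
toy = [
    SingleTurnSample("task_a", "specialty_x", "question 1", "answer 1"),
    SingleTurnSample("task_a", "specialty_y", "question 2", "answer 2"),
    SingleTurnSample("task_b", "specialty_x", "question 3", "answer 3"),
]
dialogue = MultiTurnSample(
    specialty="specialty_x",
    turns=[("doctor", "first message"), ("assistant", "first reply")],
)

stats = coverage(toy)
print(stats)
```

Run over the full dataset, a helper like `coverage` would be expected to report the figures quoted above (about 92K samples, 22 tasks, 27 specialties), assuming the data were loaded into records of this shape.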

