LLMs for Doctors: Leveraging Medical LLMs to Assist Doctors, Not Replace Them

Wenya Xie, Qingying Xiao, Yu Zheng, Xidong Wang, Junying Chen, Ke Ji, Anningzhe Gao, Xiang Wan, Feng Jiang, Benyou Wang
2024-06-26
Abstract: The recent success of Large Language Models (LLMs) has had a significant impact on the healthcare field, providing patients with medical advice, diagnostic information, and more. However, because they lack professional medical knowledge, patients are easily misled by erroneous information generated by LLMs, which may result in serious medical problems. To address this issue, we focus on tuning LLMs to act as medical assistants that collaborate with more experienced doctors. We first conduct a two-stage inspiration-feedback survey to gain a broad understanding of doctors' real needs for medical assistants. Based on this, we construct a Chinese medical dataset called DoctorFLAN to support the entire workflow of doctors, which includes 92K Q&A samples from 22 tasks and 27 specialties. Moreover, we evaluate LLMs in doctor-oriented scenarios by constructing DoctorFLAN-test, containing 550 single-turn Q&A samples, and DotaBench, containing 74 multi-turn conversations. The evaluation results indicate that acting as a medical assistant still poses challenges for existing open-source models, but DoctorFLAN can help them significantly. This demonstrates that the doctor-oriented dataset and benchmarks we construct can complement existing patient-oriented work and better promote medical LLM research.
Computation and Language
What problem does this paper attempt to address?
The paper addresses the challenges of using large language models (LLMs) in medical applications, focusing on how these models can serve as assistants to doctors rather than as their replacements. When current LLMs give medical advice directly to patients, the patients often lack the expertise to verify the accuracy of the model's output, which can lead to serious medical risks. To address this, the authors propose positioning the LLM as a doctor's assistant that collaborates with experienced doctors, who can ensure the accuracy and safety of the model's output. Specifically, the authors conducted the following work:

1. **Needs Assessment**: Through a two-stage survey, they identified doctors' actual needs for medical assistance and the four tasks most suitable for LLM assistance.
2. **Dataset Construction**: They created a Chinese medical dataset named DoctorFLAN, containing approximately 92,000 samples that cover 22 tasks and 27 specialties across the doctor's workflow.
3. **Benchmark Design**: They developed two evaluation benchmarks to assess LLM performance in single-turn Q&A and multi-turn dialogue scenarios: DoctorFLAN-test, with 550 single-turn Q&A samples, and DotaBench, with 74 multi-turn conversations.
4. **Experimental Analysis**: They conducted both automatic and manual evaluations of existing medical LLMs, showing that current models perform poorly on complex clinical tasks, while models trained on DoctorFLAN improve significantly.

In summary, the paper aims to reposition LLMs as assistants to doctors, thereby improving efficiency and safety in medical practice.
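The summary distinguishes single-turn evaluation (DoctorFLAN-test) from multi-turn evaluation (DotaBench) and mentions automatic scoring, but it does not describe the data format or the scoring protocol. The following is only a minimal sketch under stated assumptions: the `Turn` and `EvalItem` classes, the "doctor"/"assistant" role labels, the task names, and the numeric judge score are all hypothetical and are not the paper's released format. It illustrates how both kinds of benchmark items could share one representation and how per-task mean scores might be aggregated.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Turn:
    role: str  # hypothetical role labels: "doctor" or "assistant"
    text: str


@dataclass
class EvalItem:
    """One benchmark item (hypothetical format).

    A single-turn item (DoctorFLAN-test style) holds one doctor turn;
    a multi-turn item (DotaBench style) also carries the prior dialogue.
    """
    task: str
    turns: list[Turn] = field(default_factory=list)

    @property
    def is_multi_turn(self) -> bool:
        # More than one doctor turn means the item is a multi-turn conversation.
        return sum(t.role == "doctor" for t in self.turns) > 1


def per_task_mean(items: list[EvalItem], scores: list[float]) -> dict[str, float]:
    """Average an assumed numeric judge score for each task."""
    if len(items) != len(scores):
        raise ValueError("need exactly one score per item")
    buckets: dict[str, list[float]] = defaultdict(list)
    for item, score in zip(items, scores):
        buckets[item.task].append(score)
    return {task: mean(vals) for task, vals in buckets.items()}
```

For example, two items under a hypothetical `diagnosis` task scored 6.0 and 8.0 and one hypothetical `triage` item scored 5.0 aggregate to `{"diagnosis": 7.0, "triage": 5.0}`. Keeping single-turn and multi-turn items in one structure lets the same aggregation code serve both benchmark styles.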