Tool Calling: Enhancing Medication Consultation via Retrieval-Augmented Large Language Models

Zhongzhen Huang, Kui Xue, Yongqi Fan, Linjie Mu, Ruoyu Liu, Tong Ruan, Shaoting Zhang, Xiaofan Zhang

2024-04-27

Abstract: Large-scale language models (LLMs) have achieved remarkable success across various language tasks but suffer from hallucinations and temporal misalignment. To mitigate these shortcomings, retrieval-augmented generation (RAG) has been used to supply external knowledge that supports answer generation. However, applying such models to the medical domain faces several challenges due to the lack of domain-specific knowledge and the intricacy of real-world scenarios. In this study, we explore LLMs with the RAG framework for knowledge-intensive tasks in the medical field. To evaluate the capabilities of LLMs, we introduce MedicineQA, a multi-round dialogue benchmark that simulates the real-world medication consultation scenario and requires LLMs to answer with evidence retrieved from the medicine database. MedicineQA contains 300 multi-round question-answering pairs, each embedded within a detailed dialogue history, highlighting the challenge this knowledge-intensive task poses to current LLMs. We further propose a new Distill-Retrieve-Read framework in place of the previous Retrieve-then-Read. Specifically, the distillation and retrieval process uses a tool calling mechanism to formulate search queries that emulate the keyword-based inquiries used by search engines. Experimental results show that our framework brings notable performance improvements and surpasses its previous counterparts in evidence retrieval accuracy. This advancement sheds light on applying RAG to the medical domain.

Computation and Language

What problem does this paper attempt to address?

This paper addresses the challenges of applying large language models (LLMs) to medication consultation, in particular their tendency to produce inaccurate facts (hallucinations) and their temporal misalignment. Retrieval-augmented generation (RAG), an existing technique that supplies external knowledge to support answer generation, mitigates these issues, but applying it in the medical field is difficult because of the lack of domain-specific knowledge and the complexity of real-world scenarios. To evaluate LLMs in this setting, the paper introduces MedicineQA, a benchmark of 300 multi-round question-answering pairs, each embedded in a dialogue history that simulates a real-world medication consultation. Models must answer using evidence retrieved from a medicine database, which makes the task knowledge-intensive. The researchers also propose a framework called Distill-Retrieve-Read, which replaces the previous Retrieve-then-Read approach: a tool calling mechanism distills the dialogue into a search query that resembles the keyword-based queries used with search engines, and that query drives retrieval. Experimental results show that Distill-Retrieve-Read brings notable performance improvements and surpasses previous counterparts in evidence retrieval accuracy. The work offers insight into applying RAG in the medical domain.
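The three stages named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the LLM's tool call is replaced by a simple keyword heuristic, the database entries are invented, and every name here (MEDICINE_DB, search_medicine_db, distill, retrieve, read, hit_rate) is hypothetical. The hit_rate helper is one plausible way to score evidence retrieval accuracy, assumed for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    text: str


# Hypothetical toy database; entries are invented for illustration only.
MEDICINE_DB = [
    Doc("d1", "ibuprofen dosage for adults with food"),
    Doc("d2", "ibuprofen contraindications stomach ulcer"),
    Doc("d3", "amoxicillin dosage for children by weight"),
]

STOPWORDS = {"a", "an", "and", "for", "i", "is", "my", "of", "the", "to", "was", "what"}


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def distill(history: list[str], question: str) -> dict:
    """Stand-in for the LLM tool call: condense the dialogue into keywords."""
    keywords: list[str] = []
    for token in tokenize(" ".join(history + [question])):
        if token not in STOPWORDS and token not in keywords:
            keywords.append(token)
    # Shaped like a tool call; the tool name is an assumption.
    return {"tool": "search_medicine_db", "arguments": {"keywords": keywords}}


def retrieve(keywords: list[str], k: int = 2) -> list[Doc]:
    """Rank documents by how many query keywords they contain."""
    scored = [(len(set(keywords) & set(tokenize(d.text))), d) for d in MEDICINE_DB]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked if score > 0][:k]


def read(question: str, evidence: list[Doc]) -> str:
    """Build the prompt a reader LLM would answer from (no model is called here)."""
    lines = [f"[{d.doc_id}] {d.text}" for d in evidence]
    return "Evidence:\n" + "\n".join(lines) + f"\nQuestion: {question}"


def hit_rate(retrieved_ids: list[list[str]], gold_ids: list[str]) -> float:
    """Fraction of queries whose gold document appears among the retrieved ones."""
    return sum(g in r for r, g in zip(retrieved_ids, gold_ids)) / len(gold_ids)


history = ["My child was prescribed amoxicillin for an ear infection."]
question = "What is the dosage?"
call = distill(history, question)
evidence = retrieve(call["arguments"]["keywords"], k=1)
print(read(question, evidence))
```

In this toy setup the final question alone never names the drug, so a query built only from it cannot tell the two dosage entries apart. Distilling the whole dialogue adds the keyword amoxicillin, and the matching entry is ranked first. A real system would have an LLM produce the tool call and would search a full medicine database.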

