JMLR: Joint Medical LLM and Retrieval Training for Enhancing Reasoning and Professional Question Answering Capability

Junda Wang, Zhichao Yang, Zonghai Yao, Hong Yu

2024-06-28

Abstract: Large Language Models (LLMs) have demonstrated remarkable potential in medical knowledge acquisition and question-answering. However, LLMs can hallucinate and yield factually incorrect outcomes, even with domain-specific pretraining. Previously, retrieval augmented generation (RAG) has had limited success in addressing hallucinations. Unlike previous RAG methods, where the retrieval model was trained separately from the LLM, we introduce JMLR (for Jointly trains LLM and information Retrieval), which trains both during the fine-tuning phase. The synchronized training mechanism enhances JMLR's ability to retrieve clinical guidelines and leverage medical knowledge to reason and answer questions, and reduces the demand for computational resources. We evaluated JMLR on the important medical question-answering application. Our experimental results demonstrate that JMLR-13B (70.5%) outperforms a previous state-of-the-art open-source model using conventional pre-training and fine-tuning, Meditron-70B (68.9%), and Llama2-13B with RAG (67.7%) on a medical question-answering dataset. Comprehensive evaluations reveal that JMLR-13B enhances reasoning quality and reduces hallucinations better than Claude3-Opus. Additionally, JMLR-13B (148 GPU hours) trains much faster than Meditron-70B (42630 GPU hours). Through this work, we provide a new and efficient knowledge enhancement method for healthcare, demonstrating the potential of integrating retrieval and LLM training for medical question-answering systems.

Computation and Language, Information Retrieval

What problem does this paper attempt to address?

This paper aims to improve answer accuracy and reduce hallucinations of large language models (LLMs) in the medical domain. Existing retrieval-augmented generation (RAG) methods train a retriever to fetch relevant documents that assist the LLM. Because the retriever and the LLM are trained separately, however, the two can be inconsistent: the documents the retriever ranks highest are not necessarily the ones that help the LLM most.

The authors propose Joint Medical LLM and Retrieval Training (JMLR), which trains the LLM and the retriever together during the fine-tuning stage to strengthen the model's reasoning and its use of domain knowledge. Retrieved documents are added to the input question so that the LLM can draw on them when generating an answer. The retriever is trained with a mechanism called LLM-Rank loss, which pushes it to prioritize documents that substantially help the LLM generate the correct answer.

Reported results show that JMLR-13B (70.5%) outperforms the previous open-source models Meditron-70B (68.9%) and Llama2-13B with RAG (67.7%) on a medical question-answering dataset. JMLR-13B is also reported to reduce hallucinations and improve reasoning quality more than Claude3-Opus, while needing far less training compute (148 GPU hours versus 42630 GPU hours for Meditron-70B). In short, the paper tackles errors in medical QA systems through the training methodology: jointly training the retriever and the LLM improves both accuracy and efficiency.
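The idea of training a retriever from LLM feedback can be illustrated with a small sketch. The summary does not give the formula for LLM-Rank loss, so everything below is an assumption made for illustration: the function names `build_prompt` and `rank_loss`, the prompt layout, the `temperature` parameter, and the choice of a KL divergence between a target distribution derived from the LLM's per-document answer loss and the retriever's score distribution. It is not the authors' implementation.

```python
import math


def log_softmax(values):
    """Numerically stable log-softmax over a list of floats."""
    m = max(values)
    log_total = m + math.log(sum(math.exp(v - m) for v in values))
    return [v - log_total for v in values]


def build_prompt(document, question):
    """Hypothetical prompt layout: the retrieved document is placed before the question."""
    return f"Guideline:\n{document}\n\nQuestion:\n{question}\n\nAnswer:"


def rank_loss(retriever_scores, llm_losses, temperature=1.0):
    """Illustrative rank-style loss (an assumption, not the paper's formula).

    retriever_scores[i]: retriever relevance score for candidate document i.
    llm_losses[i]: LLM loss on the correct answer when document i is in the prompt.
    Documents that give the LLM a lower loss receive a higher target probability;
    the result is KL(target || retriever distribution).
    """
    if len(retriever_scores) != len(llm_losses):
        raise ValueError("one LLM loss is needed per scored document")
    target_log = log_softmax([-loss / temperature for loss in llm_losses])
    predicted_log = log_softmax(retriever_scores)
    return sum(math.exp(t) * (t - p) for t, p in zip(target_log, predicted_log))


# Usage: the LLM answers best with document 0 (lowest loss).
llm_losses = [0.5, 2.0, 3.0]
agreeing = rank_loss([2.0, 0.0, -1.0], llm_losses)      # retriever ranks document 0 first
disagreeing = rank_loss([-1.0, 0.0, 2.0], llm_losses)   # retriever ranks document 0 last
print(agreeing < disagreeing)
```

In this sketch the loss is smaller when the retriever's ranking agrees with the documents that actually lower the LLM's answer loss, which is the behavior the summary attributes to LLM-Rank loss. In a real system the scores and losses would be differentiable tensors and the loss would be backpropagated into the retriever.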

Tool Calling: Enhancing Medication Consultation via Retrieval-Augmented Large Language Models

MKRAG: Medical Knowledge Retrieval Augmented Generation for Medical Question Answering

Rationale-Guided Retrieval Augmented Generation for Medical Question Answering

Improving Clinical Expertise in Large Language Models Using Electronic Medical Records

Enhancing LLM Factual Accuracy with RAG to Counter Hallucinations: A Case Study on Domain-Specific Queries in Private Knowledge-Bases

Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions

IM-RAG: Multi-Round Retrieval-Augmented Generation Through Learning Inner Monologues

Enhancing Large Language Models with Domain-specific Retrieval Augment Generation: A Case Study on Long-form Consumer Health Question Answering in Ophthalmology

Improving accuracy of GPT-3/4 results on biomedical data using a retrieval-augmented language model

AssistRAG: Boosting the Potential of Large Language Models with an Intelligent Information Assistant

Fine-Tuning LLMs for Reliable Medical Question-Answering Services

Benchmarking Retrieval-Augmented Generation for Medicine

Retrieval Augmented Generation for 10 Large Language Models and its Generalizability in Assessing Medical Fitness

RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning

Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models

HALO: Hallucination Analysis and Learning Optimization to Empower LLMs with Retrieval-Augmented Context for Guided Clinical Decision Making

From Beginner to Expert: Modeling Medical Knowledge into General LLMs

BiomedRAG: A Retrieval Augmented Large Language Model for Biomedicine