JMLR: Joint Medical LLM and Retrieval Training for Enhancing Reasoning and Professional Question Answering Capability

Junda Wang, Zhichao Yang, Zonghai Yao, Hong Yu
2024-06-28
Abstract: Large Language Models (LLMs) have demonstrated remarkable potential in medical knowledge acquisition and question answering. However, LLMs can hallucinate and yield factually incorrect outcomes, even with domain-specific pretraining. Retrieval-augmented generation (RAG) has so far had limited success in addressing hallucinations. In previous RAG methods, the retrieval model was trained separately from the LLM. We instead introduce JMLR, which jointly trains the LLM and the information retriever during the fine-tuning phase. This synchronized training mechanism enhances JMLR's ability to retrieve clinical guidelines and to leverage medical knowledge when reasoning about and answering questions, while reducing the demand for computational resources. We evaluated JMLR on the important application of medical question answering. Our experimental results demonstrate that on a medical question-answering dataset, JMLR-13B (70.5%) outperforms Meditron-70B (68.9%), a previous state-of-the-art open-source model trained with conventional pretraining and fine-tuning, as well as Llama2-13B with RAG (67.7%). Comprehensive evaluations reveal that JMLR-13B improves reasoning quality and reduces hallucinations more effectively than Claude3-Opus. Additionally, JMLR-13B trains much faster than Meditron-70B (148 vs. 42,630 GPU hours). Through this work, we provide a new and efficient knowledge-enhancement method for healthcare, demonstrating the potential of integrating retrieval and LLM training in medical question-answering systems.
Computation and Language, Information Retrieval
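The figures reported in the abstract can be put in relative terms with simple arithmetic. The ratio and differences below are derived here from those numbers; they are not quoted from the paper.

```latex
\begin{aligned}
\text{training-compute ratio} &= \frac{42630\ \text{GPU hours (Meditron-70B)}}{148\ \text{GPU hours (JMLR-13B)}} \approx 288,\\
\text{accuracy gain over Meditron-70B} &= 70.5\% - 68.9\% = 1.6\ \text{percentage points},\\
\text{accuracy gain over Llama2-13B with RAG} &= 70.5\% - 67.7\% = 2.8\ \text{percentage points}.
\end{aligned}
```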
What problem does this paper attempt to address?
This paper focuses on improving the accuracy of large language models (LLMs) in the healthcare domain and reducing their hallucinations. Existing methods such as retrieval-augmented generation (RAG) train a retriever to fetch relevant documents that assist the LLM. These methods remain limited, however: when the retriever and the LLM are trained separately, the retriever may not select the documents the LLM actually finds useful. The authors propose Joint Medical LLM and Retrieval Training (JMLR), which trains the LLM and the retriever simultaneously during the fine-tuning stage to strengthen the model's reasoning and its use of domain knowledge. The retrieved documents are incorporated into the input question, so the LLM can draw on them when generating an answer. To train the retriever, JMLR introduces a mechanism called the LLM-Rank loss, which encourages the retriever to prioritize documents that most help the LLM generate correct answers.

Experimental results show that JMLR-13B outperforms previous models, including Meditron-70B and Llama2-13B with RAG, on multiple medical QA datasets. JMLR-13B also surpasses Claude3-Opus in reducing hallucinations and improving reasoning quality, while requiring far less training compute (148 GPU hours, compared with 42,630 for Meditron-70B).

In conclusion, the paper addresses the problem of improving LLM performance and reducing erroneous generations in medical QA systems through a better training methodology. By jointly training the retriever and the LLM, JMLR improves both the accuracy and the efficiency of the model.
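The summary does not give the exact form of the LLM-Rank loss, so the following is only a minimal sketch under stated assumptions. It swaps in a listwise KL-divergence ranking objective as a stand-in: each retrieved document's usefulness is measured by the LLM's loss on the correct answer when that document is in the context, and the retriever's score distribution is pulled toward a target distribution that puts more mass on low-loss documents. The function names, the prompt format in `build_augmented_input`, and the `temperature` parameter are all hypothetical. Plain Python is used to show the arithmetic; in one plausible joint-training setup, this loss would be computed in an autodiff framework with the LLM losses treated as fixed targets, so its gradients update only the retriever.

```python
import math
from typing import List, Sequence


def build_augmented_input(question: str, documents: Sequence[str]) -> str:
    # Hypothetical prompt format: retrieved documents are prepended to the question.
    context = "\n".join(f"[Doc {i + 1}] {doc}" for i, doc in enumerate(documents))
    return f"{context}\n\nQuestion: {question}\nAnswer:"


def log_softmax(values: Sequence[float], temperature: float = 1.0) -> List[float]:
    # Numerically stable log-softmax (log-sum-exp with the max subtracted).
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    log_total = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_total for s in scaled]


def llm_rank_loss(
    retriever_scores: Sequence[float],
    llm_answer_losses: Sequence[float],
    temperature: float = 1.0,
) -> float:
    """Hypothetical stand-in for the LLM-Rank loss (not the paper's definition).

    retriever_scores[i]: retriever similarity between the question and document i.
    llm_answer_losses[i]: LLM loss on the correct answer given document i
        (lower means the document helped the LLM more).
    Returns KL(target || retriever), where the target distribution favors
    documents with lower LLM answer loss.
    """
    if len(retriever_scores) != len(llm_answer_losses):
        raise ValueError("need exactly one LLM loss per retrieved document")
    log_target = log_softmax([-loss for loss in llm_answer_losses], temperature)
    log_pred = log_softmax(retriever_scores, temperature)
    return sum(math.exp(lt) * (lt - lp) for lt, lp in zip(log_target, log_pred))


if __name__ == "__main__":
    print(build_augmented_input("What is the first-line treatment?", ["Guideline A", "Guideline B"]))
    # Document 0 lowers the LLM's answer loss far more than document 1.
    answer_losses = [0.5, 3.0]
    aligned = llm_rank_loss([2.0, 0.0], answer_losses)     # retriever already prefers doc 0
    misaligned = llm_rank_loss([0.0, 2.0], answer_losses)  # retriever prefers the less useful doc
    print(f"aligned={aligned:.3f} misaligned={misaligned:.3f}")
```

The design choice here is that the retriever is penalized only for disagreeing with the LLM's own judgment of which documents helped. A retriever that already ranks the most helpful document highest gets a small loss, and one that ranks it lowest gets a large loss. This mirrors the stated goal of the LLM-Rank loss without claiming to reproduce its exact formula.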