Development and Testing of Retrieval Augmented Generation in Large Language Models -- A Case Study Report

YuHe Ke, Liyuan Jin, Kabilan Elangovan, Hairil Rizal Abdullah, Nan Liu, Alex Tiong Heng Sia, Chai Rick Soh, Joshua Yi Min Tung, Jasmine Chiat Ling Ong, Daniel Shu Wei Ting
2024-01-29
Abstract: Purpose: Large Language Models (LLMs) hold significant promise for medical applications. Retrieval Augmented Generation (RAG) emerges as a promising approach for customizing domain knowledge in LLMs. This case study presents the development and evaluation of an LLM-RAG pipeline tailored for healthcare, focusing specifically on preoperative medicine.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the difficulty of getting large language models (LLMs) to incorporate current, guideline-based clinical knowledge when they are deployed in medical settings, particularly for specialty-specific clinical tasks. Conventional ways of improving accuracy, such as fine-tuning, present significant computational challenges. As an alternative, the paper develops a Retrieval Augmented Generation (RAG) pipeline for preoperative medicine and evaluates it. The primary endpoint of the study is the accuracy and safety of the answers generated by the LLM-RAG system. By comparing these answers with those written by human doctors, the paper reports that the RAG-enhanced LLM achieves performance non-inferior to humans on complex preoperative guidance and reduces the rate of hallucinations. The paper takes this to indicate that the LLM-RAG model offers significant advantages for deployment in the medical field, including grounding in fact-based knowledge, scalability, and extensibility.
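To make the general RAG idea concrete, the sketch below shows the two steps the term refers to: retrieving the most relevant guideline text for a question, then placing it in the prompt sent to an LLM. It is a toy illustration and not the pipeline built in the paper. The guideline chunks are placeholders rather than real clinical guidance, the function names are hypothetical, and a simple bag-of-words cosine similarity stands in for whatever embedding model and vector store a real system would use. No LLM is called; the sketch stops at the assembled prompt.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder chunks; NOT real clinical guidance.
GUIDELINE_CHUNKS = [
    "Placeholder chunk A: fasting instructions before elective surgery.",
    "Placeholder chunk B: managing anticoagulant medication before surgery.",
    "Placeholder chunk C: preoperative assessment of patients with diabetes.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query (assumed toy retriever)."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine(query_vec, Counter(tokenize(chunk))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, chunks, k=1):
    """Assemble a prompt that asks the model to answer from retrieved context."""
    context = "\n".join(retrieve(query, chunks, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    question = "How should anticoagulant medication be managed?"
    print(build_prompt(question, GUIDELINE_CHUNKS))
```

Because the model is asked to answer from retrieved text rather than from memory alone, updating the guideline chunks changes the answers without retraining, which is the contrast with fine-tuning drawn above.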