Abstract: Recent proprietary large language models (LLMs), such as GPT-4, have reached a milestone in tackling diverse challenges in the biomedical domain, ranging from multiple-choice questions to long-form generation. To address challenges that still cannot be handled with the knowledge encoded in LLMs, various retrieval-augmented generation (RAG) methods have been developed; they search a knowledge corpus for documents and append them, unconditionally or selectively, to the LLM input for generation. However, when existing methods are applied to different domain-specific problems, poor generalization becomes apparent, leading to the retrieval of incorrect documents or to inaccurate judgments. In this paper, we introduce Self-BioRAG, a reliable framework for biomedical text that specializes in generating explanations, retrieving domain-specific documents, and self-reflecting on generated responses. We use 84k filtered biomedical instruction sets to train Self-BioRAG so that it can assess its generated explanations with customized reflective tokens. Our work shows that domain-specific components, such as a retriever, a domain-related document corpus, and instruction sets, are necessary for adhering to domain-related instructions. On three major medical question-answering benchmark datasets, Self-BioRAG achieves a 7.2% absolute improvement on average over the state-of-the-art open-foundation model with a parameter size of 7B or less. Similarly, Self-BioRAG outperforms RAG by 8% in ROUGE-1 score on average on two long-form question-answering benchmarks. Overall, our analysis indicates that Self-BioRAG finds the clues in the question, retrieves relevant documents if needed, and understands how to answer with information from retrieved documents and encoded knowledge, as a medical expert does.
We release our data and code for training our framework components and model weights (7B and 13B) to enhance capabilities in biomedical and clinical domains.
Availability and implementation: Self-BioRAG is available at https://github.com/dmis-lab/self-biorag.
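The control flow summarized in the abstract (decide whether to retrieve, fetch candidate documents, generate an answer per document, then keep the answer that scores best under self-critique) can be illustrated with a small sketch. This is not the Self-BioRAG implementation: every name below (`Passage`, `overlap_score`, `retrieve`, `answer`, and the `should_retrieve`, `generate`, and `critique` callables) is hypothetical, and the word-overlap retriever is a toy stand-in for a trained domain-specific retriever and for learned reflective tokens.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Passage:
    doc_id: str
    text: str


def tokenize(text: str) -> set:
    """Lowercase the text and strip basic punctuation from each word."""
    words = (w.strip(".,?!()").lower() for w in text.split())
    return {w for w in words if w}


def overlap_score(question: str, passage: Passage) -> float:
    """Fraction of question tokens that also occur in the passage (toy relevance)."""
    q_tokens = tokenize(question)
    if not q_tokens:
        return 0.0
    return len(q_tokens & tokenize(passage.text)) / len(q_tokens)


def retrieve(question: str, corpus: List[Passage], top_k: int = 2) -> List[Passage]:
    """Return the top_k passages ranked by the toy overlap score."""
    ranked = sorted(corpus, key=lambda p: overlap_score(question, p), reverse=True)
    return ranked[:top_k]


def answer(
    question: str,
    corpus: List[Passage],
    should_retrieve: Callable[[str], bool],
    generate: Callable[[str, Optional[Passage]], str],
    critique: Callable[[str, Passage, str], float],
    top_k: int = 2,
) -> Tuple[str, Optional[Passage]]:
    """Selective retrieval: skip retrieval when not needed, otherwise draft one
    answer per retrieved passage and keep the draft with the highest critique score."""
    if not should_retrieve(question):
        return generate(question, None), None

    candidates = []
    for passage in retrieve(question, corpus, top_k):
        draft = generate(question, passage)
        candidates.append((critique(question, passage, draft), draft, passage))

    if not candidates:
        # Empty corpus: fall back to answering from encoded knowledge alone.
        return generate(question, None), None

    _, best_draft, best_passage = max(candidates, key=lambda c: c[0])
    return best_draft, best_passage
```

In a real system the three callables would be backed by a language model; here they are left as parameters so the selection logic can be read and tested in isolation.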

RadioRAG: Factual large language models for enhanced diagnostics in radiology using online retrieval augmented generation

Language Models and Retrieval Augmented Generation for Automated Structured Data Extraction from Diagnostic Reports

Improving accuracy of GPT-3/4 results on biomedical data using a retrieval-augmented language model

Rationale-Guided Retrieval Augmented Generation for Medical Question Answering

Retrieval Augmented Generation for 10 Large Language Models and its Generalizability in Assessing Medical Fitness

Enhancing Large Language Models with Domain-specific Retrieval Augment Generation: A Case Study on Long-form Consumer Health Question Answering in Ophthalmology

Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions

Large language models (LLMs) in radiology exams for medical students: Performance and consequences

Retrieval-augmented large language models for clinical trial screening

Radiology-GPT: A Large Language Model for Radiology

LLM-RG4: Flexible and Factual Radiology Report Generation across Diverse Input Contexts

Improving medical reasoning through retrieval and self-reflection with retrieval-augmented large language models

KARGEN: Knowledge-enhanced Automated Radiology Report Generation Using Large Language Models

LaB-RAG: Label Boosted Retrieval Augmented Generation for Radiology Report Generation

Performance of Open-Source LLMs in Challenging Radiological Cases — A Benchmark Study on 1,933 Eurorad Case Reports

Application of NotebookLM, a Large Language Model with Retrieval-Augmented Generation, for Lung Cancer Staging

Radiology-Llama2: Best-in-Class Large Language Model for Radiology

Generative Large Language Models for Detection of Speech Recognition Errors in Radiology Reports

MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models

Advancing Question-Answering in Ophthalmology with Retrieval Augmented Generations (RAG): Benchmarking Open-source and Proprietary Large Language Models