Abstract: Large Language Models (LLMs) have demonstrated strong potential to assist both clinicians and the general public with their extensive medical knowledge. However, their application in healthcare is constrained by concerns about the privacy of data used in training: security and ethical issues prevent the integration of private and personal information. Moreover, although their capabilities can be enhanced with information retrieval to access up-to-date knowledge, the current integration of LLMs with information retrieval lacks robustness to imperfect retrieval, which can hinder their effectiveness and even reduce overall performance. In this work, we address this challenge by introducing the Retrieval-Augmented Thought Process (RATP). Given access to external knowledge, RATP formulates the thought generation of LLMs as a multiple-step decision process. To optimise such a thought process, RATP leverages Monte-Carlo Tree Search and learns a proxy reward function that permits cost-efficient inference. On a private dataset of electronic medical records, deliberately excluded from any LLM training set, RATP achieves 35% additional accuracy compared to in-context retrieval-augmented generation for the question-answering task.

What problem does this paper attempt to address?

The paper aims to address two major issues in the application of Large Language Models (LLMs) in the healthcare domain:

1. **Privacy Protection Issue**: Including sensitive information in training data may lead to privacy breaches, which limits the application of LLMs in healthcare scenarios. Medical institutions cannot integrate patients' Personally Identifiable Information (PII) into model training because, once the model has learned this information, its privacy cannot be guaranteed.
2. **Knowledge Update and Retrieval Issue**: Even though LLMs have a strong foundation of medical knowledge, they still need access to up-to-date external knowledge. Existing methods of combining LLMs with Information Retrieval (IR) lack robustness to imperfect retrieval, which can hinder their effectiveness and even reduce overall performance.

To address these issues, the paper proposes a method called "Retrieval-Augmented Thought Process" (RATP). RATP treats the LLM's thought generation as a multi-step decision process and uses Monte-Carlo Tree Search (MCTS) to optimize it. Additionally, RATP learns a proxy reward function that makes inference more cost-efficient.

The main contributions of the paper include:

- Formalizing the open-ended question-answering task as a sequential decision problem and emphasizing the importance of the retrieval-augmented thought process in healthcare applications through collaboration with clinicians from different professional backgrounds.
- Proposing the RATP method, which leverages MCTS to combine the reasoning capabilities of LLMs with external knowledge sources.
- Conducting empirical evaluations on a private electronic medical record dataset, showing that RATP improves question-answering accuracy by 35% compared to in-context retrieval-augmented generation.

In summary, the goal of the paper is to develop a technical solution that can handle private data effectively and provide more accurate and reliable services in the healthcare domain. By combining LLMs with external knowledge bases while keeping private data out of model training, RATP aims to overcome the challenges faced by existing technologies.
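
The idea of treating thought generation as a multi-step decision process searched with MCTS can be illustrated with a small sketch. This is not the paper's implementation: the summary above does not specify the action space, the LLM prompts, or how the proxy reward is learned, so everything here is an assumption made for illustration. `CORPUS`, `QUERY`, `generate_thought` (a stand-in for an LLM call), and `proxy_reward` (a keyword-overlap stand-in for a learned scorer) are all hypothetical names.

```python
import math
import random
from dataclasses import dataclass, field

# Hypothetical toy data: a tiny "private" corpus and the key terms of a query.
CORPUS = [
    "patient was prescribed metformin for type 2 diabetes",
    "the clinic is closed on public holidays",
    "metformin dose was increased to 1000 mg after review",
    "parking is available behind the main building",
]
QUERY = "metformin dose patient"


def generate_thought(parent: str, doc: str) -> str:
    """Stand-in for an LLM call that merges a prior thought with a document."""
    return (parent + " | " + doc) if parent else doc


def proxy_reward(thought: str, query: str = QUERY) -> float:
    """Stand-in scorer: fraction of query terms that appear in the thought."""
    terms = set(query.split())
    return len(terms & set(thought.split())) / len(terms)


@dataclass
class Node:
    thought: str
    used: frozenset                      # indices of documents already in this thought
    parent: "Node | None" = None
    children: dict = field(default_factory=dict)  # document index -> child Node
    visits: int = 0
    value: float = 0.0                   # sum of rewards backed up through this node

    def untried(self):
        return [i for i in range(len(CORPUS))
                if i not in self.used and i not in self.children]


def ucb(child: Node, parent_visits: int, c: float = 1.4) -> float:
    """UCB1 score: average reward plus an exploration bonus."""
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)


def mcts(iterations: int = 200, max_depth: int = 2, seed: int = 0):
    rng = random.Random(seed)
    root = Node(thought="", used=frozenset())
    best = (0.0, "")
    for _ in range(iterations):
        node, depth = root, 0
        # Selection: descend through fully expanded nodes using UCB1.
        while depth < max_depth and not node.untried() and node.children:
            node = max(node.children.values(), key=lambda ch: ucb(ch, node.visits))
            depth += 1
        # Expansion: fold one not-yet-tried document into the current thought.
        if depth < max_depth and node.untried():
            i = rng.choice(node.untried())
            child = Node(generate_thought(node.thought, CORPUS[i]), node.used | {i}, parent=node)
            node.children[i] = child
            node = child
        # Evaluation: score the thought with the proxy reward instead of a rollout.
        reward = proxy_reward(node.thought)
        if reward > best[0]:
            best = (reward, node.thought)
        # Backpropagation: update statistics on the path back to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return best


if __name__ == "__main__":
    print(mcts())
```

In this toy setting the irrelevant documents score zero, so the search concentrates its visits on branches that combine the two metformin records. That loosely mirrors the robustness to imperfect retrieval described above, although a real system would replace both stand-ins with an LLM and a learned scoring model.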

Retrieval Augmented Thought Process for Private Data Handling in Healthcare

RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation

Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval

RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models

SeRTS: Self-Rewarding Tree Search for Biomedical Retrieval-Augmented Generation

Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models

Improving Retrieval Augmented Language Model with Self-Reasoning

JMLR: Joint Medical LLM and Retrieval Training for Enhancing Reasoning and Professional Question Answering Capability

Rethinking with Retrieval: Faithful Large Language Model Inference

Onco-Retriever: Generative Classifier for Retrieval of EHR Records in Oncology

Medical, moral and legal aspects of renal replacement therapy.

Benchmarking Retrieval-Augmented Large Language Models in Biomedical NLP: Application, Robustness, and Self-Awareness

From Generalist to Specialist: Improving Large Language Models for Medical Physics Using ARCoT

RRAML: Reinforced Retrieval Augmented Machine Learning

Augmented non-hallucinating large language models as medical information curators

Development of a privacy preserving large language model for automated data extraction from thyroid cancer pathology reports

Robust Privacy Amidst Innovation with Large Language Models Through a Critical Assessment of the Risks

Boosting Healthcare LLMs Through Retrieved Context

Epilogue. Establishment of the Center for Chronic Disease Prevention and Health Promotion.