Retrieval Augmented Thought Process for Private Data Handling in Healthcare

Thomas Pouplin, Hao Sun, Samuel Holt, Mihaela van der Schaar
2024-08-08
Abstract: Large Language Models (LLMs) have demonstrated strong potential to assist both clinicians and the general public with their extensive medical knowledge. However, their application in healthcare is constrained by concerns about the privacy of the data used in training, which prevents the integration of private and personal information because of security and ethical issues. Moreover, while their capabilities can be enhanced with information retrieval to access up-to-date knowledge, the current integration of LLMs with information retrieval lacks robustness to imperfect retrieval, which can hinder their effectiveness and even reduce overall performance. In this work, we address this challenge by introducing the Retrieval-Augmented Thought Process (RATP). Given access to external knowledge, RATP formulates the thought generation of LLMs as a multiple-step decision process. To optimise such a thought process, RATP leverages Monte-Carlo Tree Search and learns a proxy reward function that permits cost-efficient inference. On a private dataset of electronic medical records, deliberately excluded from any LLM training set, RATP achieves 35% additional accuracy compared to in-context retrieval-augmented generation for the question-answering task.
Computation and Language, Artificial Intelligence, Information Retrieval, Machine Learning
What problem does this paper attempt to address?
The paper addresses two major obstacles to applying Large Language Models (LLMs) in healthcare:

1. **Privacy protection**: Including sensitive information in training data risks privacy breaches, which limits the use of LLMs in healthcare settings. Medical institutions cannot incorporate patients' Personally Identifiable Information (PII) into model training, because once the model has learned this information, its privacy can no longer be guaranteed.
2. **Knowledge updates and retrieval**: Although LLMs encode a strong foundation of medical knowledge, that knowledge must still be supplemented with external, up-to-date sources. Existing methods that combine LLMs with Information Retrieval (IR) are not robust to imperfect retrieval, which can hinder their effectiveness and even reduce overall performance.

To address these issues, the paper proposes the Retrieval-Augmented Thought Process (RATP). RATP treats the LLM's generation of thoughts as a multi-step decision process and optimizes that process with Monte-Carlo Tree Search (MCTS). RATP also learns a proxy reward function so that the search remains cost-efficient at inference time.

The main contributions of the paper are:

- Formalizing open-ended question answering as a sequential decision problem and, through collaboration with clinicians from different professional backgrounds, highlighting the importance of a retrieval-augmented thought process in healthcare applications.
- Proposing RATP, which uses MCTS to combine the reasoning capabilities of LLMs with external knowledge sources.
- Evaluating RATP empirically on a private dataset of electronic medical records, deliberately excluded from any LLM training set, where it achieves 35% additional accuracy on question answering compared to in-context retrieval-augmented generation.
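This summary does not spell out how RATP generates thoughts, trains its proxy reward, or selects nodes, so the following is only a generic UCT-style Monte-Carlo Tree Search sketch of the idea, not the paper's algorithm. It assumes a thought is a sequence of discrete steps: `expand_fn` is a hypothetical stand-in for LLM-driven thought generation (for example, retrieving another document), and `reward_fn` is a hypothetical stand-in for the learned proxy reward. Scoring each new node directly with the reward function, instead of running a full random rollout, is one way a cheap proxy reward could keep search costs down. The toy documents, relevance set and reward function are invented for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

State = Tuple[int, ...]  # a "thought" here is just the sequence of steps taken


@dataclass
class Node:
    state: State
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0

    def ucb(self, c: float) -> float:
        """UCB1 score used to pick which child to descend into."""
        if self.visits == 0:
            return float("inf")  # try every child at least once
        exploit = self.value_sum / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore


def mcts(
    root_state: State,
    expand_fn: Callable[[State], List[State]],
    reward_fn: Callable[[State], float],
    n_iter: int = 1000,
    c: float = 1.4,
    max_depth: int = 2,
) -> State:
    root = Node(root_state)
    for _ in range(n_iter):
        node = root
        # 1. Selection: follow the highest-UCB child down to a leaf.
        while node.children:
            node = max(node.children, key=lambda ch: ch.ucb(c))
        # 2. Expansion: add all successor thoughts, then evaluate the first one.
        if len(node.state) < max_depth:
            node.children = [Node(s, parent=node) for s in expand_fn(node.state)]
            if node.children:
                node = node.children[0]
        # 3. Evaluation: score with the (proxy) reward instead of a random rollout.
        reward = reward_fn(node.state)
        # 4. Backpropagation: update statistics on the path back to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += reward
            node = node.parent
    # Read out the most-visited path as the chosen thought sequence.
    node = root
    while node.children:
        node = max(node.children, key=lambda ch: ch.visits)
    return node.state


# Hypothetical toy problem: each step "retrieves" one of four documents;
# the stand-in proxy reward favours thoughts containing documents 1 and 3
# and penalises every irrelevant document.
DOCS = [0, 1, 2, 3]
RELEVANT = {1, 3}


def toy_expand(state: State) -> List[State]:
    return [state + (d,) for d in DOCS if d not in state]


def toy_proxy_reward(state: State) -> float:
    hits = len(RELEVANT.intersection(state))
    return hits / len(RELEVANT) - 0.1 * (len(state) - hits)


best = mcts((), toy_expand, toy_proxy_reward)
print(best)
```

Running the script prints a two-step thought containing documents 1 and 3, in one order or the other. In a real system each call to `expand_fn` and `reward_fn` would involve an LLM or a learned model, so the iteration budget `n_iter` directly controls inference cost.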
In summary, the paper seeks a practical way to use private data in healthcare while providing more accurate and reliable answers. By giving LLMs access to private records and other external knowledge through retrieval rather than through training, RATP aims to preserve data privacy and to overcome the lack of robustness to imperfect retrieval in existing approaches.