Improved precision oncology question-answering using agentic LLM

Rangan Das, K Maheswari, Shaheen Siddiqui, Nikita Arora, Ankush Paul, Jeet Nanshi, Varun Udbalkar, Apoorva Sarvade, Harsha Chaturvedi, Tammy Shvartsman, Shet Masih, R Thippeswamy, Shekar Patil, S S Nirni, Brian Garsson, Sanghamitra Bandyopadhyay, Ujjwal Maulik, Mohammed Farooq, Debarka Sengupta
DOI: https://doi.org/10.1101/2024.09.20.24314076
2024-10-05
Abstract: The clinical adoption of Large Language Models (LLMs) in biomedical research has been limited by concerns regarding the quality, accuracy, and reliability of their outputs, particularly in precision oncology, where clinical decision-making demands high precision. Current models, often based on fine-tuned foundational LLMs, are prone to issues such as hallucinations, incoherent reasoning, and loss of context. In this work, we present GeneSilico Copilot, an advanced agent-based architecture that transforms LLMs from simple response synthesizers to clinical reasoning systems. Our approach is centred around a bespoke ReAct agent that orchestrates a suite of specialized tools for asynchronous information retrieval and synthesis. These tools access curated document vector stores containing clinical treatment guidelines, genomic insights, drug information, clinical trials, and breast cancer-specific literature. To leverage large context windows of current LLMs, we implement a hybrid search strategy that prioritizes key information and dynamically integrates summarized content, reducing context fragmentation. Incorporating additional metadata further allows for precise, transparent and evidence-backed reasoning at each step of the thought process. The system ensures that at every stage, the agent can synthesize meaningful, context-aware observations that contribute to a coherent and comprehensive final response that aligns with clinical standards. Evaluations on real-world breast cancer cases show that GeneSilico Copilot significantly improves response accuracy and personalization. This system represents a critical advancement toward making LLMs clinically deployable in precision oncology and has potential applications in broader medical domains requiring complex, data-driven decision-making.
Oncology
What problem does this paper attempt to address?
This paper addresses the limited quality and reliability of large language models (LLMs) in clinical applications of precision oncology, where clinical decision-making demands high precision. Specifically, current models based on fine-tuned foundational LLMs suffer from the following problems:

1. **Hallucinations**: The model generates information that is not factual.
2. **Incoherent Reasoning**: The reasoning the model provides lacks logical consistency.
3. **Loss of Context**: The model tends to lose important information when handling long texts.

These problems limit the practical use of LLMs in precision oncology. To overcome them, the paper proposes GeneSilico Copilot (GSCP), an agent-based architecture that aims to transform LLMs from simple response synthesizers into clinical reasoning systems.

### Main Contributions

1. **Bespoke ReAct Agent**: A custom ReAct agent lets GSCP orchestrate a suite of specialized tools for asynchronous information retrieval and synthesis.
2. **Hybrid Search Strategy**: To make use of the large context windows of current LLMs, a hybrid search strategy prioritizes key information and dynamically integrates summarized content, reducing context fragmentation.
3. **Metadata Integration**: Additional metadata enables precise, transparent, and evidence-backed reasoning at each step of the thought process.
4. **Multi-tool Architecture**: The multi-tool architecture lets the system perform complex reasoning, synthesize insights from the hybrid retrieval mechanism, and dynamically filter out irrelevant information.

### Experimental Results

The paper evaluates GSCP in several ways:

- **Objective Question Answering**: Measured by accuracy, precision, recall, and F1-score, GSCP significantly outperforms the baseline RAG system and standalone LLMs on all metrics.
- **Subjective Question Answering**: The DeepEval framework is used to evaluate generation and retrieval performance. GSCP performs well on context precision, context relevance, faithfulness, and answer relevance.
- **Custom Datasets**: On custom datasets covering precision oncology and breast cancer genetics, GSCP significantly outperforms the baseline system in context precision and relevance.

### Conclusion

GSCP represents important progress in applying LLMs to precision oncology. It improves the accuracy and personalization of responses and has potential applications in broader medical domains that require complex, data-driven decision-making.
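To make the idea of an agent orchestrating tools for asynchronous retrieval more concrete, here is a minimal Python sketch. It is not the paper's implementation. The tool names, the `STORES` contents, the `Observation` type, and the keyword-overlap matching are all hypothetical stand-ins chosen for illustration; a real system would query document vector stores and let an LLM decide which tools to call.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Observation:
    tool: str    # which tool produced the passage
    source: str  # metadata kept so each statement can be traced to its evidence
    text: str


# Hypothetical toy corpora standing in for curated vector stores (assumption).
STORES = {
    "guidelines": [("guideline-doc", "HER2-positive disease: consider anti-HER2 therapy.")],
    "drugs": [("drug-doc", "Trastuzumab targets HER2.")],
    "trials": [("trial-doc", "Trial enrolling BRCA1 carriers.")],
}


async def query_tool(tool: str, query: str) -> list[Observation]:
    """Toy retrieval: keyword overlap instead of a real vector search."""
    await asyncio.sleep(0)  # yield control, standing in for network or disk I/O
    terms = {t.lower() for t in query.split()}
    hits = []
    for source, text in STORES[tool]:
        words = {w.strip(".,:").lower() for w in text.split()}
        if terms & words:
            hits.append(Observation(tool, source, text))
    return hits


async def gather_observations(query: str, tools: list[str]) -> list[Observation]:
    """Query every selected tool concurrently and flatten the results."""
    results = await asyncio.gather(*(query_tool(t, query) for t in tools))
    return [obs for hits in results for obs in hits]


def run(query: str) -> list[Observation]:
    """Synchronous entry point that fans the query out to all toy tools."""
    return asyncio.run(gather_observations(query, list(STORES)))
```

Calling `run("HER2 therapy")` returns observations from the toy `guidelines` and `drugs` stores, each carrying its `source` label. Keeping that metadata on every observation is one simple way to support the evidence-backed reasoning the summary describes, since a later synthesis step can cite where each passage came from.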