Hybrid Student-Teacher Large Language Model Refinement for Cancer Toxicity Symptom Extraction

Reza Khanmohammadi, Ahmed I. Ghanem, Kyle Verdecchia, Ryan Hall, Mohamed Elshaikh, Benjamin Movsas, Hassan Bagher-Ebadian, Bing Luo, Indrin J. Chetty, Tuka Alhanai, Kundan Thind, Mohammad M. Ghassemi
2024-08-09
Abstract: Large Language Models (LLMs) offer significant potential for clinical symptom extraction, but their deployment in healthcare settings is constrained by privacy concerns, computational limitations, and operational costs. This study investigates the optimization of compact LLMs for cancer toxicity symptom extraction using a novel iterative refinement approach. We employ a student-teacher architecture, utilizing Zephyr-7b-beta and Phi3-mini-128 as student models and GPT-4o as the teacher, to dynamically select between prompt refinement, Retrieval-Augmented Generation (RAG), and fine-tuning strategies. Our experiments on 294 clinical notes covering 12 post-radiotherapy toxicity symptoms demonstrate the effectiveness of this approach. The RAG method proved most efficient, improving average accuracy scores from 0.32 to 0.73 for Zephyr-7b-beta and from 0.40 to 0.87 for Phi3-mini-128 during refinement. In the test set, both models showed an approximate 0.20 increase in accuracy across symptoms. Notably, this improvement was achieved at a cost 45 times lower than GPT-4o for Zephyr and 79 times lower for Phi-3. These results highlight the potential of iterative refinement techniques in enhancing the capabilities of compact LLMs for clinical applications, offering a balance between performance, cost-effectiveness, and privacy preservation in healthcare settings.
Computation and Language, Information Retrieval
What problem does this paper attempt to address?
The paper aims to address the following issues:

1. **Optimizing the application of small LLMs (Large Language Models) in clinical symptom extraction**: The paper explores how iterative optimization techniques can enhance the ability of small LLMs to extract cancer toxicity symptoms from clinical notes. The study focuses on balancing performance, cost-effectiveness, and data privacy in healthcare environments that are resource-limited and privacy-sensitive.
2. **Exploring the effectiveness of a hybrid student-teacher architecture**: The paper proposes a student-teacher architecture that combines prompt engineering, Retrieval-Augmented Generation (RAG), and fine-tuning, and dynamically selects among them. The teacher model adjusts the optimization strategy based on the performance and needs of the student model.
3. **Evaluating the cost-effectiveness of different optimization techniques**: The paper compares how much prompt engineering, RAG, fine-tuning, and hybrid methods improve model performance, and analyzes the economic cost of each method to find the most suitable solution for clinical applications.

Through these studies, the paper demonstrates the potential of iterative optimization techniques to enhance the capability of small LLMs to process clinical data, particularly in the task of extracting toxicity symptoms after radiotherapy.
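
The dynamic strategy selection described in point 2 can be pictured as a loop in which a teacher inspects the student's current score and picks one of the three refinement strategies for the next round. The Python sketch below is only an illustration under stated assumptions, not the authors' implementation: `StudentState`, `refine`, `evaluate`, `choose_strategy`, the stopping threshold, and the placeholder updates applied for each strategy are all hypothetical names and behaviors introduced here. In the paper the teacher role is played by GPT-4o and the students are Zephyr-7b-beta and Phi3-mini-128; here both are replaced by plain callables so the control flow can be read in isolation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# The three strategies named in the abstract. The string labels are
# illustrative choices, not identifiers taken from the paper.
STRATEGIES = ("prompt_refinement", "rag", "fine_tuning")


@dataclass
class StudentState:
    """Hypothetical container for everything a refinement round may change."""
    prompt: str
    retrieved_examples: List[str] = field(default_factory=list)
    fine_tune_rounds: int = 0


def refine(
    state: StudentState,
    evaluate: Callable[[StudentState], float],
    choose_strategy: Callable[[StudentState, float], str],
    target: float = 0.9,
    max_rounds: int = 5,
) -> List[Tuple[str, float]]:
    """Run the loop and return (action, score_before_action) per round.

    `evaluate` stands in for scoring the student on refinement notes, and
    `choose_strategy` stands in for the teacher model's decision.
    """
    history: List[Tuple[str, float]] = []
    for _ in range(max_rounds):
        score = evaluate(state)
        if score >= target:
            history.append(("stop", score))
            break
        strategy = choose_strategy(state, score)
        if strategy == "prompt_refinement":
            # Placeholder update: a real teacher would rewrite the prompt.
            state.prompt += " Answer only 'present' or 'absent'."
        elif strategy == "rag":
            # Placeholder update: a real system would add retrieved examples.
            state.retrieved_examples.append(
                f"example-{len(state.retrieved_examples)}"
            )
        elif strategy == "fine_tuning":
            # Placeholder update: a real system would launch a training job.
            state.fine_tune_rounds += 1
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        history.append((strategy, score))
    return history


if __name__ == "__main__":
    # Toy stand-ins: the score grows with the number of retrieved examples,
    # and the "teacher" always picks RAG. Neither reflects the paper's data.
    def toy_evaluate(s: StudentState) -> float:
        return min(1.0, 0.3 + 0.2 * len(s.retrieved_examples))

    def toy_teacher(s: StudentState, score: float) -> str:
        return "rag"

    student = StudentState(prompt="Does the note mention fatigue?")
    for action, score in refine(student, toy_evaluate, toy_teacher, target=0.65):
        print(f"{action}: score before action = {score:.2f}")
```

Passing the evaluator and the teacher as callables keeps the loop independent of any particular model API, so a cost tracker or a different selection policy could be swapped in without touching `refine`.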