NIMG-63. Leveraging LLMs for Accurate Differentiation of Radiation Necrosis and Tumor Progression in Brain MRI Reports: A Study on Automated Scoring and Clinical Implications

Eulanca Liu, Shir Goldfinger, William Delery, Erika Jank, James Lamb, Stephen Tenn, John Hegde, Tania Kaprealian, Michael Steinberg, Ricky Savjani
DOI: https://doi.org/10.1093/neuonc/noae165.0827
2024-11-29
Neuro-Oncology
Abstract:
BACKGROUND: Accurate differentiation between radiation necrosis and tumor progression or recurrence in patients treated with stereotactic radiosurgery for brain metastases is critical for guiding clinical management. This study leverages the advanced natural language processing capabilities of Meta Llama3, an artificial intelligence (AI) large language model (LLM), combined with prompt engineering, to rapidly categorize brain magnetic resonance imaging (MRI) radiology reports. Our objective was to develop an automated scoring system to classify concern for radiation necrosis, tumor progression, equivocal findings, or stable exams.
METHODS: Using a comprehensive dataset of reports annotated by expert radiologists, we ran inference on a 70-billion-parameter Llama3 model (temperature 0.2, top_p 0.9) with specific prompts designed to capture the nuanced language and diagnostic criteria related to radiation necrosis or tumor progression.
RESULTS: The first pass was performed on a training dataset of 107 reports and did not predefine the clinical conditions. This demonstrated 43.4% accuracy in scoring when compared to a human user's classification of each report. Agreement between the human reader and Llama3 was assessed using the Gwet agreement coefficient, AC1 = 0.411 (99% CI 0.396-0.426). Multiple iterations of targeted prompt engineering were then employed to narrow the definitions of radiation necrosis and tumor progression, using specific examples and nuanced language, which raised accuracy to 72.0%, AC1 = 0.719 (99% CI 0.717-0.722).
DISCUSSION/FUTURE DIRECTIONS: This surpasses a recent demonstration of lower human-LLM agreement in radiographic score assignment, with further room for calibration on a dataset of several thousand reports. Next, we will correlate automated interpretations with actual clinical management decisions for radiation necrosis (e.g., initiation of steroids, bevacizumab, Laser Interstitial Thermal Therapy, and/or repeat imaging).
This automated scoring system holds significant potential for LLMs in clinical applications. Future work will focus on integrating this model into clinical workflows and expanding its capabilities to include longitudinal monitoring of patient outcomes.
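The two figures reported in the results, raw accuracy and Gwet's AC1, can be illustrated with a minimal Python sketch. It uses the standard two-rater form of AC1, where chance agreement is p_e = (1/(K-1)) * sum_k pi_k * (1 - pi_k) and pi_k is the mean of the two raters' marginal proportions for category k. The abstract does not state how the authors computed the coefficient or its confidence interval, so the function name `gwet_ac1`, the label strings, and the toy data below are all assumptions for illustration only, and the confidence interval is not computed.

```python
from collections import Counter

# Hypothetical label names for the four classes named in the abstract.
LABELS = ["radiation_necrosis", "tumor_progression", "equivocal", "stable"]


def gwet_ac1(rater_a, rater_b, categories):
    """Return (observed agreement, Gwet's AC1) for two raters.

    rater_a, rater_b: equal-length sequences of category labels.
    categories: the full list of K possible labels (K >= 2).
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("rater lists must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")

    n = len(rater_a)
    # Observed agreement: the fraction of reports given the same label.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement from the averaged marginal proportions pi_k.
    p_e = 0.0
    for label in categories:
        pi_k = (counts_a[label] / n + counts_b[label] / n) / 2
        p_e += pi_k * (1 - pi_k)
    p_e /= k - 1

    return p_a, (p_a - p_e) / (1 - p_e)


# Toy data, invented for illustration (not from the study).
human = ["stable", "stable", "stable", "tumor_progression"]
model = ["stable", "stable", "tumor_progression", "tumor_progression"]
accuracy, ac1 = gwet_ac1(human, model, LABELS)
print(f"accuracy={accuracy:.3f}, AC1={ac1:.3f}")
```

Here accuracy is simply the observed agreement with the human reader, while AC1 corrects that figure for agreement expected by chance, which is why the abstract reports both.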
oncology, clinical neurology