From Revisions to Insights: Converting Radiology Report Revisions into Actionable Educational Feedback Using Generative AI Models

Shawn Lyo, Suyash Mohan, Alvand Hassankhani, Abass Noor, Farouk Dako, Tessa Cook
DOI: https://doi.org/10.1007/s10278-024-01233-4
2024-08-19
Abstract: Expert feedback on trainees' preliminary reports is crucial for radiologic training, but real-time feedback can be challenging due to non-contemporaneous, remote reading and increasing imaging volumes. Trainee report revisions contain valuable educational feedback, but synthesizing data from raw revisions is challenging. Generative AI models can potentially analyze these revisions and provide structured, actionable feedback. This study used the OpenAI GPT-4 Turbo API to analyze paired synthesized and open-source analogs of preliminary and finalized reports, identify discrepancies, categorize their severity and type, and suggest review topics. Expert radiologists reviewed the output by grading the discrepancies, evaluating the accuracy of the assigned severity and category, and judging the relevance of the suggested review topics. The reproducibility of discrepancy detection and of maximal discrepancy severity was also examined. The model exhibited high sensitivity, detecting significantly more discrepancies than radiologists (W = 19.0, p < 0.001), with a strong positive correlation (r = 0.778, p < 0.001). Interrater reliability for severity and type was fair (Fleiss' kappa = 0.346 and 0.340, respectively; weighted kappa = 0.622 for severity). The LLM achieved a weighted F1 score of 0.66 for severity and 0.64 for type. Generated teaching points were considered relevant in ~85% of cases, and relevance correlated with the maximal discrepancy severity (Spearman ρ = 0.76, p < 0.001). Reproducibility was moderate to good for the number of discrepancies (ICC(2,1) = 0.690) and substantial for maximal discrepancy severity (Fleiss' kappa = 0.718; weighted kappa = 0.94). Generative AI models can effectively identify discrepancies in report revisions and generate relevant educational feedback, offering promise for enhancing radiology training.
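The abstract reports agreement using Fleiss' kappa, both for interrater reliability on severity and type and for the reproducibility of maximal discrepancy severity. It does not say how the statistic was computed. The following is a minimal sketch, assuming the standard textbook formulation in which every item is rated by the same number of raters. The function name `fleiss_kappa` and the severity labels and ratings in the example are hypothetical and are not data from the study.

```python
from collections import Counter
from typing import Hashable, Sequence


def fleiss_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Fleiss' kappa for N items, each labeled by the same number n of raters.

    ratings[i] holds the n category labels that the raters assigned to item i.
    (Hypothetical helper; standard formulation, not the study's own code.)
    """
    if not ratings:
        raise ValueError("need at least one rated item")
    n = len(ratings[0])
    if n < 2 or any(len(item) != n for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    N = len(ratings)
    categories = {label for item in ratings for label in item}
    counts = [Counter(item) for item in ratings]  # counts[i][j] = n_ij

    # Observed agreement on item i: share of rater pairs that agree.
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n) / (n * (n - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / N

    # Expected (chance) agreement from the pooled category proportions.
    p_cat = [sum(c[cat] for c in counts) / (N * n) for cat in categories]
    p_e = sum(p ** 2 for p in p_cat)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating is the same category")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical severity grades from three raters on four discrepancies.
    severity_ratings = [
        ["minor", "minor", "minor"],
        ["major", "major", "minor"],
        ["critical", "major", "major"],
        ["minor", "minor", "major"],
    ]
    print(round(fleiss_kappa(severity_ratings), 3))  # 0.122
```

Fleiss' kappa treats categories as nominal, so a "minor" versus "critical" disagreement counts the same as "minor" versus "major". The weighted kappa values that the abstract also reports for severity give partial credit for near-miss ordinal grades, which is why they are higher. That statistic is not computed by this sketch.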