Role of Natural Language Processing in Automatic Detection of Unexpected Findings in Radiology Reports: A Comparative Study of RoBERTa, CNN, and ChatGPT

Pilar López-Úbeda, Teodoro Martín-Noguerol, Jorge Escartín, Antonio Luna
DOI: https://doi.org/10.1016/j.acra.2024.07.057
2024-08-08
Abstract:
Rationale and objectives: Large Language Models can capture the context of radiological reports, offering high accuracy in detecting unexpected findings. We aimed to fine-tune a Robustly Optimized BERT Pretraining Approach (RoBERTa) model for the automatic detection of unexpected findings in radiology reports, to assist radiologists in this task. Secondarily, we compared the performance of RoBERTa with that of a classical convolutional neural network (CNN) and of GPT-4 on the same task.
Materials and methods: The dataset consisted of 44,631 radiological reports for training and 5,293 for the initial test set. A smaller subset of 100 reports was used as the comparative test set. The complete dataset was obtained from our institution's Radiology Information System and included reports spanning a range of dates, examination types, and patient genders and ages. We evaluated two Large Language Models, fine-tuning RoBERTa and developing a prompt for ChatGPT. In addition, extending previous studies, we included a CNN in the comparison.
Results: RoBERTa achieved an accuracy of 86.15% on the initial test set. On the comparative test set, RoBERTa achieved an accuracy of 79%, ChatGPT 64%, and the CNN 49%. RoBERTa therefore outperformed ChatGPT and the CNN by 15 and 30 percentage points, respectively.
Conclusion: A fine-tuned RoBERTa model can accurately detect unexpected findings in radiology reports, outperforming both the CNN and ChatGPT on this task.
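The gaps between systems on the comparative test set are differences in accuracy, measured in percentage points. The following is a minimal sketch of that arithmetic, using only the three accuracies stated in the abstract; the helper name `gaps_vs` is illustrative and not taken from the paper.

```python
# Accuracies (%) on the 100-report comparative test set, as stated in the abstract.
accuracy = {"RoBERTa": 79, "ChatGPT": 64, "CNN": 49}


def gaps_vs(baseline, scores):
    """Return the percentage-point gap between `baseline` and each other system.

    Hypothetical helper for illustration only.
    """
    return {
        name: scores[baseline] - value
        for name, value in scores.items()
        if name != baseline
    }


gaps = gaps_vs("RoBERTa", accuracy)
for name, gap in gaps.items():
    print(f"RoBERTa vs {name}: +{gap} percentage points")
```

Because the comparative test set contains exactly 100 reports, each percentage point corresponds to one report.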