Leveraging Prompt-Learning for Structured Information Extraction from Crohn's Disease Radiology Reports in a Low-Resource Language

Liam Hazan, Gili Focht, Naama Gavrielov, Roi Reichart, Talar Hagopian, Mary-Louise C. Greer, Ruth Cytter Kuint, Dan Turner, Moti Freiman
2024-05-22
Abstract: Automatic conversion of free-text radiology reports into structured data using Natural Language Processing (NLP) techniques is crucial for analyzing diseases on a large scale. While effective for tasks in widely spoken languages like English, generative large language models (LLMs) typically underperform with less common languages and can pose potential risks to patient privacy. Fine-tuning local NLP models is hindered by the skewed nature of real-world medical datasets, where rare findings represent a significant data imbalance. We introduce SMP-BERT, a novel prompt learning method that leverages the structured nature of reports to overcome these challenges. In our studies involving a substantial collection of Crohn's disease radiology reports in Hebrew (over 8,000 patients and 10,000 reports), SMP-BERT greatly surpassed traditional fine-tuning methods in performance, notably in detecting infrequent conditions (AUC: 0.99 vs 0.94, F1: 0.84 vs 0.34). SMP-BERT empowers more accurate AI diagnostics available for low-resource languages.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the extraction of structured information from Crohn's disease radiology reports written in a low-resource language, here Hebrew. Generative large language models (LLMs) typically underperform on less widely used languages and can pose risks to patient privacy, while fine-tuning local models is hindered by the imbalance of real-world medical datasets, in which rare findings are heavily underrepresented. The paper proposes SMP-BERT (Section Matching Prediction BERT), a prompt-learning method that exploits the structured layout of radiology reports. Its pre-training task, SMP, teaches the model the matching relationship between the "Findings" and "Impression" sections of a report, which lets the model make inferences in a zero-shot setting and be fine-tuned with a relatively small amount of labeled data. On a collection of Hebrew reports (over 8,000 patients and 10,000 reports), SMP-BERT outperformed traditional fine-tuning, most notably in detecting infrequent conditions (AUC: 0.99 vs 0.94, F1: 0.84 vs 0.34). The results suggest that the approach is well suited to settings with limited labeled data, imbalanced classes, and low-resource languages, and that it could serve as a basis for information extraction in other such languages.
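To make the section-matching idea concrete, the following is a minimal sketch of how training pairs for such a task could be assembled from reports that already have separate "Findings" and "Impression" sections. It is an illustration under stated assumptions, not the authors' implementation: the names `SectionPair` and `build_smp_pairs` are hypothetical, and the negative-sampling rule (one randomly chosen impression from a different report per positive pair) is an assumption, since the summary does not describe how mismatched pairs are drawn.

```python
import random
from typing import List, NamedTuple, Tuple


class SectionPair(NamedTuple):
    """One training example for a section-matching classifier (hypothetical type)."""
    findings: str
    impression: str
    label: int  # 1 = both sections come from the same report, 0 = mismatched


def build_smp_pairs(reports: List[Tuple[str, str]], seed: int = 0) -> List[SectionPair]:
    """Build one matched and one mismatched pair per report.

    `reports` is a list of (findings, impression) tuples. The 1:1 ratio of
    positives to negatives is an assumption made for this sketch.
    """
    if len(reports) < 2:
        raise ValueError("need at least two reports to form a mismatched pair")

    rng = random.Random(seed)
    pairs: List[SectionPair] = []
    for i, (findings, impression) in enumerate(reports):
        # Positive example: sections taken from the same report.
        pairs.append(SectionPair(findings, impression, 1))

        # Negative example: impression drawn from a different report.
        # Sampling from n-1 slots and shifting skips index i without a retry loop.
        # Note: two different reports may still share identical impression text;
        # this sketch does not filter such cases.
        j = rng.randrange(len(reports) - 1)
        if j >= i:
            j += 1
        pairs.append(SectionPair(findings, reports[j][1], 0))
    return pairs


if __name__ == "__main__":
    # Toy English placeholders; the reports in the paper are in Hebrew.
    toy_reports = [
        ("Wall thickening of the terminal ileum.", "Active ileal disease."),
        ("No bowel wall thickening.", "No evidence of active disease."),
        ("Fluid collection adjacent to the cecum.", "Findings suggest an abscess."),
    ]
    for pair in build_smp_pairs(toy_reports):
        print(pair.label, "|", pair.findings, "|", pair.impression)
```

Pairs of this shape could then be fed to a BERT-style sequence-pair classifier, with the findings as the first segment and the impression as the second. How the paper phrases its prompts for zero-shot inference is not stated in this summary, so that step is left out of the sketch.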