Abstract:
BACKGROUND: Silent brain infarction (SBI) is defined as the presence of 1 or more brain lesions, presumed to be caused by vascular occlusion, found by neuroimaging (magnetic resonance imaging or computed tomography) in patients without clinical manifestations of stroke. It is more common than stroke and can be detected in 20% of healthy elderly people. Early detection of SBI may mitigate the risk of stroke by enabling preventative treatment. Natural language processing (NLP) techniques offer an opportunity to systematically identify SBI cases from electronic health records (EHRs) by extracting, normalizing, and classifying SBI-related incidental findings interpreted by radiologists from neuroimaging reports.
OBJECTIVE: This study aimed to develop NLP systems to identify individuals with incidentally discovered SBIs from neuroimaging reports at 2 sites: Mayo Clinic and Tufts Medical Center.
METHODS: Both rule-based and machine learning approaches were adopted in developing the NLP system. The rule-based system was implemented using the open source NLP pipeline MedTagger, developed by Mayo Clinic. Features for the rule-based system, including significant words and patterns related to SBI, were generated using pointwise mutual information. The machine learning models included a convolutional neural network (CNN), random forest, support vector machine, and logistic regression. The performance of the NLP algorithm was compared with a manually created gold standard. The gold standard dataset comprised 1000 radiology reports randomly retrieved from the 2 study sites (Mayo and Tufts), corresponding to patients with no prior or current diagnosis of stroke or dementia. Of the 1000 reports, 400 were randomly sampled and double read to determine interannotator agreement. The gold standard dataset was split equally into 3 subsets for training, development, and testing.
RESULTS: Among the 400 reports selected to determine interannotator agreement, 5 reports were removed due to invalid scan types. The interannotator agreements across Mayo and Tufts neuroimaging reports were 0.87 and 0.91, respectively. The rule-based system performed best in predicting SBI, with an accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of 0.991, 0.925, 1.000, 1.000, and 0.990, respectively. The CNN achieved the best score in predicting white matter disease (WMD), with an accuracy, sensitivity, specificity, PPV, and NPV of 0.994, 0.994, 0.994, 0.994, and 0.994, respectively.
CONCLUSIONS: We adopted a standardized data abstraction and modeling process to develop NLP techniques (rule-based and machine learning) to detect incidental SBIs and WMDs from annotated neuroimaging reports. Validation statistics suggest that detecting SBIs and WMDs from EHRs using NLP is highly feasible.
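
The five validation statistics reported above all derive from a single 2x2 confusion matrix of NLP output against the gold standard. A minimal Python sketch of that arithmetic follows. The names `ConfusionCounts` and `validation_metrics` are hypothetical, and the counts are illustrative only; they are not the study's data, and this is not the authors' evaluation code.

```python
from dataclasses import dataclass
from math import nan


@dataclass(frozen=True)
class ConfusionCounts:
    """Report-level counts of NLP predictions versus gold standard labels."""
    tp: int  # predicted positive, gold positive
    fp: int  # predicted positive, gold negative
    tn: int  # predicted negative, gold negative
    fn: int  # predicted negative, gold positive


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a metric is undefined.
    return numerator / denominator if denominator else nan


def validation_metrics(c: ConfusionCounts) -> dict:
    """Compute accuracy, sensitivity, specificity, PPV, and NPV."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # recall on positives
        "specificity": _ratio(c.tn, c.tn + c.fp),  # recall on negatives
        "ppv": _ratio(c.tp, c.tp + c.fp),          # precision
        "npv": _ratio(c.tn, c.tn + c.fn),
    }


# Illustrative counts only (assumed for the example, not taken from the study).
counts = ConfusionCounts(tp=9, fp=0, tn=90, fn=1)
metrics = validation_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With no false positives, specificity and PPV both equal 1.000 regardless of how many positives are missed, which is the pattern seen in the rule-based SBI results; sensitivity and NPV are the statistics that reflect missed cases.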

The implementation of natural language processing to extract index lesions from breast magnetic resonance imaging reports

Automatic extraction of imaging observation and assessment categories from breast magnetic resonance imaging reports with natural language processing.

Application of natural language processing to post-structuring of rectal cancer MRI reports

Extracting BI-RADS Features From Mammography Reports in Chinese Based on Machine Learning

Evaluating the accuracy of lung-RADS score extraction from radiology reports: Manual entry versus natural language processing

Development and Validation of a Natural Language Processing Algorithm for Extracting Clinical and Pathological Features of Breast Cancer From Pathology Reports

Automating Stroke Data Extraction From Free-Text Radiology Reports Using Natural Language Processing: Instrument Validation Study

Natural Language Processing for the Identification of Silent Brain Infarcts From Neuroimaging Reports

Application of MRI Breast Imaging Reporting and Data System in Breast Lesions

Quantitative Analysis of Lesion Morphology and Texture Features for Diagnostic Prediction in Breast MRI

Automated medical chart review for breast cancer outcomes research: a novel natural language processing extraction system

Natural language processing for automated breast cancer recurrence detection and classification in computed tomography reports.

Value of breast MRI omics features and clinical characteristics in Breast Imaging Reporting and Data System (BI-RADS) category 4 breast lesions: an analysis of radiomics-based diagnosis

Natural Language Processing of Radiology Reports to Detect Complications of Ischemic Stroke

NIMG-63. Leveraging LLMs for Accurate Differentiation of Radiation Necrosis and Tumor Progression in Brain MRI Reports: A Study on Automated Scoring and Clinical Implications

Harnessing Large Language Models for Structured Reporting in Breast Ultrasound: A Comparative Study of Open AI (GPT-4.0) and Microsoft Bing (GPT-4)

A Preliminary Study of Extracting Pulmonary Nodules and Nodule Characteristics from Radiology Reports Using Natural Language Processing

An artificial intelligence system using maximum intensity projection MR images facilitates classification of non-mass enhancement breast lesions

Extracting Pulmonary Nodules and Nodule Characteristics from Radiology Reports of Lung Cancer Screening Patients Using Transformer Models

Artificial Intelligence: Natural Language Processing for Peer-Review in Radiology

Using GPT‐4 for LI‐RADS feature extraction and categorization with multilingual free‐text reports