Abstract:

Importance: Artificial intelligence (AI) can interpret abnormal signs in chest radiography (CXR) and generate captions, but a prospective study is needed to examine its practical value.

Objective: To prospectively compare natural language processing (NLP)-generated CXR captions and the diagnostic findings of radiologists.

Design, Setting, and Participants: A multicenter diagnostic study was conducted. The training data set included CXR images and reports retrospectively collected from February 1, 2014, to February 28, 2018. The retrospective test data set included consecutive images and reports from April 1 to July 31, 2019. The prospective test data set included consecutive images and reports from May 1 to September 30, 2021.

Exposures: A bidirectional encoder representations from transformers (BERT) model was used to extract language entities and relationships from unstructured CXR reports to establish 23 labels of abnormal signs to train convolutional neural networks. The participants in the prospective test group were randomly assigned to 1 of 3 different caption generation models: a normal template, NLP-generated captions, and rule-based captions based on convolutional neural networks. For each case, a resident drafted the report based on the randomly assigned captions and an experienced radiologist finalized the report blinded to the original captions. A total of 21 residents and 19 radiologists were involved.

Main Outcomes and Measures: Time to write reports based on different caption generation models.

Results: The training data set consisted of 74 082 cases (39 254 [53.0%] women; mean [SD] age, 50.0 [17.1] years). In the retrospective (n = 8126; 4345 [53.5%] women; mean [SD] age, 47.9 [15.9] years) and prospective (n = 5091; 2416 [47.5%] women; mean [SD] age, 45.1 [15.6] years) test data sets, the mean (SD) area under the curve for abnormal signs was 0.87 (0.11) and 0.84 (0.09), respectively. The residents' mean (SD) reporting time using the NLP-generated model was 283 (37) seconds, significantly shorter than with the normal template (347 [58] seconds; P < .001) and the rule-based model (296 [46] seconds; P < .001). The NLP-generated captions showed the highest similarity to the final reports, with a mean (SD) bilingual evaluation understudy (BLEU) score of 0.69 (0.24), significantly higher than the normal template (0.37 [0.09]; P < .001) and the rule-based model (0.57 [0.19]; P < .001).

Conclusions and Relevance: In this diagnostic study of NLP-generated CXR captions, prior information provided by NLP was associated with greater efficiency in the reporting process, while maintaining good consistency with the findings of radiologists.
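The similarity measure reported above, the bilingual evaluation understudy (BLEU) score, can be illustrated with a minimal sketch. The abstract does not state the n-gram order, tokenization, or smoothing used in the study, so everything here is an assumption for illustration: whitespace tokenization, n-grams up to length 4, a single reference, no smoothing, and a hypothetical helper named `bleu`. The example sentences are invented and are not study data.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Unsmoothed single-reference BLEU (illustrative; not the study's code).

    Assumes whitespace tokenization and uniform weights over 1..max_n grams.
    Returns 0.0 if any n-gram order has no overlap with the reference.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    draft = "no active lung lesion"          # hypothetical generated caption
    final = "no active lung lesion seen"     # hypothetical finalized report
    print(round(bleu(draft, final), 4))
```

In this sketch a caption identical to the final report scores 1.0, and a caption sharing no words with it scores 0.0. Scores in between reflect how much of the caption's wording survives into the radiologist's final text, which is how the abstract uses the metric to compare the three caption models.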

Development and Multicenter Validation of Chest X-ray Radiography Interpretations Based on Natural Language Processing

Development of a multipotent diagnostic tool for chest X-rays by multi-object detection method

Development and External Validation of an Artificial Intelligence-Based Method for Scalable Chest Radiograph Diagnosis: A Multi-Country Cross-Sectional Study

Supervised and unsupervised language modelling in Chest X-Ray radiological reports

Automated Radiological Impression Generation For Plain Chest X-Rays With End To End Deep Learning

Automated Abnormality Classification of Chest Radiographs Using Deep Convolutional Neural Networks

Automated Radiological Report Generation For Chest X-Rays With Weakly-Supervised End-to-End Deep Learning

Using artificial intelligence for chest radiograph interpretation: a retrospective multi-reader-multi-case (MRMC) study of the automatic detection of multiple abnormalities and generation of diagnostic report system

Can Artificial Intelligence Reliably Report Chest X-Rays?: Radiologist Validation of an Algorithm trained on 2.3 Million X-Rays

Learning to Read Chest X-Ray Images from 16000+ Examples Using CNN

Utilization of Deep Convolutional Neural Networks for Accurate Chest X-Ray Diagnosis and Disease Detection

Multi-label annotation of text reports from computed tomography of the chest, abdomen, and pelvis using deep learning

Can AI generate diagnostic reports for radiologist approval on CXR images? A multi-reader and multi-case observer performance study

Comparison of Chest Radiograph Captions Based on Natural Language Processing Vs Completed by Radiologists

Validation of a Deep Learning Chest X-ray Interpretation Model: Integrating Large-Scale AI and Large Language Models for Comparative Analysis with ChatGPT

Automated pneumothorax triaging in chest X-rays in the New Zealand population using deep-learning algorithms

Designing a computer-assisted diagnosis system for cardiomegaly detection and radiology report generation

Large Scale Automated Reading of Frontal and Lateral Chest X-Rays using Dual Convolutional Neural Networks

Automated Chest Radiographs Triage Reading by a Deep Learning Referee Network

Deep Learning for Chest X-ray Diagnosis: Competition Between Radiologists with or Without Artificial Intelligence Assistance

CXR-Net: An Encoder-Decoder-Encoder Multitask Deep Neural Network for Explainable and Accurate Diagnosis of COVID-19 pneumonia with Chest X-ray Images