Abstract:

Background: The integration of artificial intelligence (AI) into medicine is growing, with some experts predicting that it will soon be used standalone. However, skepticism remains because independent validations have yielded few positive results. This study evaluates how effectively AI software identifies lung nodules, a possible indicator of lung cancer, on chest X-rays (CXR).

Methods: This retrospective study analyzed 7,670,212 record pairs from radiological exams conducted between 2020 and 2022 during the Moscow Computer Vision Experiment, focusing on CXR and computed tomography (CT) scans. All images were acquired in routine clinical practice. The final dataset comprised 100 CXR images (50 with lung nodules, 50 without), selected consecutively according to inclusion and exclusion criteria. It was used to evaluate the performance of all five AI-based solutions that participated in the Moscow Computer Vision Experiment and analyzed CXR. The evaluation was performed in three stages. In the first stage, the probability of a lung nodule returned by each AI service was compared with the Ground Truth (1 = nodule present, 0 = no nodule). In the second stage, three radiologists evaluated the nodule segmentation performed by the AI services (1 = nodule correctly segmented, 0 = nodule incorrectly segmented or not segmented at all). In the third stage, the same radiologists additionally evaluated the classification of the nodules (1 = nodule correctly segmented and classified, 0 = all other cases). The results obtained in stages 2 and 3 were compared with the Ground Truth, which was common to all three stages. For each stage, diagnostic accuracy metrics were calculated for each AI service.

Results: Three software solutions (Celsus, Lunit INSIGHT CXR, and qXR) demonstrated diagnostic metrics that matched or surpassed the vendor specifications and achieved the highest area under the receiver operating characteristic curve (AUC) of 0.956 [95% confidence interval (CI): 0.918 to 0.994]. However, when three radiologists evaluated the accuracy of nodule segmentation and classification, all solutions performed below the vendor-declared metrics, with the highest AUC reaching 0.812 (95% CI: 0.744 to 0.879). Meanwhile, all AI services demonstrated 100% specificity at stages 2 and 3 of the study.

Conclusions: To ensure the reliability and applicability of AI-based software, it is crucial to validate performance metrics on high-quality datasets and to engage radiologists in the evaluation process. Developers are advised to improve the accuracy of the underlying models before the software is allowed to be used standalone for lung nodule detection. The dataset created during the study may be accessed at https://mosmed.ai/datasets/mosmeddatargogksnalichiemiotsutstviemlegochnihuzlovtipvii/.
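The staged comparison against a common Ground Truth can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the six toy cases, the 0.5 decision threshold, and the function names do not come from the study, and the abstract does not say how its metrics or confidence intervals were computed. The sketch computes AUC as the Mann-Whitney statistic (the fraction of nodule/no-nodule pairs ranked correctly, with ties counted as one half) and sensitivity and specificity from binary labels. In the toy stage 2 data every nodule-free case is scored 0, which is one plausible reading of the scoring rule (a score of 1 requires a correctly segmented nodule), and the sketch shows that specificity is then 100% regardless of the model.

```python
from typing import Sequence, Tuple


def auc_mann_whitney(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data, NOT taken from the study: 3 nodule cases, 3 nodule-free cases.
ground_truth = [1, 1, 1, 0, 0, 0]

# Stage 1: nodule probability reported by a (hypothetical) AI service.
stage1_prob = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1]
THRESHOLD = 0.5  # assumed cut-off; the abstract does not state one
stage1_pred = [1 if p >= THRESHOLD else 0 for p in stage1_prob]

# Stage 2: radiologist scores (1 = nodule correctly segmented, 0 = otherwise).
# Nodule-free cases are all 0 here because there is no nodule to segment correctly.
stage2_score = [1, 0, 1, 0, 0, 0]

stage1_auc = auc_mann_whitney(ground_truth, stage1_prob)
stage1_sens, stage1_spec = sensitivity_specificity(ground_truth, stage1_pred)
stage2_auc = auc_mann_whitney(ground_truth, stage2_score)
stage2_sens, stage2_spec = sensitivity_specificity(ground_truth, stage2_score)

print(f"stage 1: AUC={stage1_auc:.3f} sens={stage1_sens:.3f} spec={stage1_spec:.3f}")
print(f"stage 2: AUC={stage2_auc:.3f} sens={stage2_sens:.3f} spec={stage2_spec:.3f}")
```

With binary stage 2 scores, the AUC reduces to the mean of sensitivity and specificity, so a drop in AUC between stages reflects lost sensitivity when specificity is fixed at 100%.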

Effect of emphysema on AI software and human reader performance in lung nodule detection from low-dose chest CT

Evaluating the performance of artificial intelligence software for lung nodule detection on chest radiographs in a retrospective real-world UK population

Effect of Artificial Intelligence as a Second Reader on the Lung Nodule Detection and Localization Accuracy of Radiologists and Non-radiology Physicians in Chest Radiographs: A Multicenter Reader Study

AI-based software for lung nodule detection in chest X-rays -- Time for a second reader approach?

An AI deep learning algorithm for detecting pulmonary nodules on ultra-low-dose CT in an emergency setting: a reader study

Comparison of two AI-based software tools for detection of Incidental Pulmonary Nodules (IPN) in a University Hospital and a Radiology Practice

Performance of artificial intelligence-based software for the automatic detection of lung lesions on chest radiographs of patients with suspected lung cancer

Impact of artificial intelligence assistance on pulmonary nodule detection and localization in chest CT: a comparative study among radiologists of varying experience levels

Clinical outcomes and actual consequence of lung nodules incidentally detected on chest radiographs by artificial intelligence

Software using artificial intelligence for nodule and cancer detection in CT lung cancer screening: systematic review of test accuracy studies

Impact of AI-assisted CXR analysis in detecting incidental lung nodules and lung cancers in non-respiratory outpatient clinics

Navigating the Spectrum: Assessing the Concordance of ML-Based AI Findings with Radiology in Chest X-Rays in Clinical Settings

Artificial intelligence based on deep learning for differential diagnosis between benign and malignant pulmonary nodules: A real-world, multicenter, diagnostic study

Independent evaluation of the accuracy of 5 artificial intelligence software for detecting lung nodules on chest X-rays

Deep Learning-based Artificial Intelligence Improves Accuracy of Error-prone Lung Nodules

Performance of AI for preoperative CT assessment of lung metastases: Retrospective analysis of 167 patients

Artificial Intelligence-Aided Diagnosis Software to Identify Highly Suspicious Pulmonary Nodules

Potential added value of an AI software with prediction of malignancy for the management of incidental lung nodules

Diagnostic Performance of Artificial Intelligence in Chest Radiographs Referred from the Emergency Department

Automated detection of lung nodules and coronary artery calcium using artificial intelligence on low-dose CT scans for lung cancer screening: accuracy and prognostic value

Evaluating artificial intelligence’s role in lung nodule diagnostics: A survey of radiologists in two pilot tertiary hospitals in China