Abstract:Background: Community-acquired Pneumonia (CAP) is a common childhood infectious disease. Deep learning models show promise in X-ray interpretation and diagnosis, but their validation should be extended due to limitations in the current validation workflow. To extend the standard validation workflow we propose doing a pilot test with the next characteristics. First, the assumption of perfect ground truth (100% sensitive and specific) is unrealistic, as high intra and inter-observer variability have been reported. To address this, we propose using Bayesian latent class models (BLCA) to estimate accuracy during the pilot. Additionally, assessing only the performance of a model without considering its applicability and acceptance by physicians is insufficient if we hope to integrate AI systems into day-to-day clinical practice. Therefore, we propose employing explainable artificial intelligence (XAI) methods during the pilot test to involve physicians and evaluate how well a Deep Learning model is accepted and how helpful it is for routine decisions as well as analyze its limitations by assessing the etiology. This study aims to apply the proposed pilot to test a deep Convolutional Neural Network (CNN)-based model for identifying consolidation in pediatric chest-X-ray (CXR) images already validated using the standard workflow. Methods: For the standard validation workflow, a total of 5856 public CXRs and 950 private CXRs were used to train and validate the performance of the CNN model. The performance of the model was estimated assuming a perfect ground truth. For the pilot test proposed in this article, a total of 190 pediatric chest-X-ray (CXRs) images were used to test the CNN model support decision tool (SDT). The performance of the model on the pilot test was estimated using extensions of the two-test Bayesian Latent-Class model (BLCA). The sensitivity, specificity, and accuracy of the model were also assessed. The clinical characteristics of the patients were compared according to the model performance. The adequacy and applicability of the SDT was tested using XAI techniques. The adequacy of the SDT was assessed by asking two senior physicians the agreement rate with the SDT. The applicability was tested by asking three medical residents before and after using the SDT and the agreement between experts was calculated using the kappa index. Results: The CRXs of the pilot test were labeled by the panel of experts into consolidation (124/176, 70.4%) and no-consolidation/other infiltrates (52/176, 29.5%). A total of 31/176 (17.6%) discrepancies were found between the model and the panel of experts with a kappa index of 0.6. The sensitivity and specificity reached a median of 90.9 (95% Credible Interval (CrI), 81.2-99.9) and 77.7 (95% CrI, 63.3-98.1), respectively. The senior physicians reported a high agreement rate (70%) with the system in identifying logical consolidation patterns. The three medical residents reached a higher agreement using SDT than alone with experts (0.66±0.1 vs. 0.75±0.2). Conclusions: Through the pilot test, we have successfully verified that the deep learning model was underestimated when a perfect ground truth was considered. Furthermore, by conducting adequacy and applicability tests, we can ensure that the model is able to identify logical patterns within the CXRs and that augmenting clinicians with automated preliminary read assistants could accelerate their workflows and enhance accuracy in identifying consolidation in pediatric CXR images.

Detection of Pneumonia in Children through Chest Radiographs using Artificial Intelligence in a Low-Resource Setting: A Pilot Study

Diagnosis of common pulmonary diseases in children by X-ray images and deep learning

The Use of Chest Radiographs and Machine Learning Model for the Rapid Detection of Pneumonitis in Pediatric

A machine learning approach on chest X-rays for pediatric pneumonia detection

Study protocol and design for the assessment of paediatric pneumonia from X-ray images using deep learning

Development and testing of a deep learning algorithm to detect lung consolidation among children with pneumonia using hand-held ultrasound

Validating the accuracy of deep learning for the diagnosis of pneumonia on chest x-ray against a robust multimodal reference diagnosis: a post hoc analysis of two prospective studies

Using Deep Learning to Improve Early Diagnosis of Pneumonia in Underdeveloped Countries

Deep Learning Assistance for Tuberculosis Diagnosis with Chest Radiography in Low-Resource Settings.

Deep Neural Network Augments Performance of Junior Residents in Diagnosing COVID-19 Pneumonia on Chest Radiographs

Development and External Validation of an Artificial Intelligence-Based Method for Scalable Chest Radiograph Diagnosis: A Multi-Country Cross-Sectional Study

Enhancing Early Lung Cancer Detection on Chest Radiographs with AI-assistance: A Multi-Reader Study

CovidAID: COVID-19 Detection Using Chest X-Ray

A Prospective Observational Study to Investigate Performance of a Chest X-ray Artificial Intelligence Diagnostic Support Tool Across 12 U.S. Hospitals

AI-enabled case detection model for infectious disease outbreaks in resource-limited settings

Artificial Intelligence in Paediatric Tuberculosis

An automated COVID-19 triage pipeline using artificial intelligence based on chest radiographs and clinical data

Testing the performance, adequacy, and applicability of an artificial intelligence model for pediatric pneumonia diagnosis

Development of a new prognostic model to predict pneumonia outcome using artificial intelligence-based chest radiograph results

Artificial Intelligence Assisting the Early Detection of Active Pulmonary Tuberculosis From Chest X-Rays: A Population-Based Study

Nonradiology Health Care Professionals Significantly Benefit From AI Assistance in Emergency-Related Chest Radiography Interpretation