Real-world testing of an artificial intelligence algorithm for the analysis of chest X-rays in primary care settings

Queralt Miró Catalina, Josep Vidal-Alaball, Aïna Fuster-Casanovas, Anna Escalé-Besa, Anna Ruiz Comellas, Jordi Solé-Casals
DOI: https://doi.org/10.1038/s41598-024-55792-1
IF: 4.6
2024-03-04
Scientific Reports
Abstract: Interpreting chest X-rays is a complex task, and artificial intelligence algorithms for this purpose are currently being developed. It is important to perform external validations of these algorithms in order to implement them. This study therefore aims to externally validate an AI algorithm's diagnoses in real clinical practice, comparing them to a radiologist's diagnoses. The aim is also to identify diagnoses the algorithm may not have been trained for. A prospective observational study for the external validation of the AI algorithm in a region of Catalonia, comparing the AI algorithm's diagnosis with that of the reference radiologist, considered the gold standard. The external validation was performed with a sample of 278 images and reports, 51.8% of which showed no radiological abnormalities according to the radiologist's report. Analysing the validity of the AI algorithm, the average accuracy was 0.95 (95% CI 0.92; 0.98), the sensitivity was 0.48 (95% CI 0.30; 0.66) and the specificity was 0.98 (95% CI 0.97; 0.99). The conditions where the algorithm was most sensitive were external, upper abdominal and cardiac and/or valvular implants. On the other hand, the conditions where the algorithm was less sensitive were in the mediastinum, vessels and bone. The algorithm has been validated in the primary care setting and has proven to be useful when identifying images with or without conditions. However, in order to be a valuable tool to help and support experts, it requires additional real-world training to enhance its diagnostic capabilities for some of the conditions analysed. Our study emphasizes the need for continuous improvement to ensure the algorithm's effectiveness in primary care.
multidisciplinary sciences
What problem does this paper attempt to address?
### Problems the paper attempts to solve

The paper aims to externally validate the diagnostic ability of an artificial intelligence algorithm for analyzing chest X-rays in a primary care setting, comparing its output with the diagnoses of a reference radiologist. Specifically, the main objectives of the study include:

1. **External validation**: Validate the diagnostic accuracy of the algorithm on data from real clinical practice, to check that it remains effective and reliable outside the setting it was developed in.
2. **Identify untrained diagnoses**: Determine which diagnoses the algorithm may not have been trained to recognize, so that it can be improved.
3. **Evaluate algorithm performance**: Calculate the overall accuracy, sensitivity, and specificity of the algorithm and analyze its diagnostic performance across anatomical regions.

### Research background

Chest X-rays are among the most commonly used examinations in medical imaging for detecting lung and cardiovascular diseases. However, due to the shortage of radiologists and the increase in workload, the misdiagnosis rate has risen, which makes artificial intelligence tools that assist in diagnosis particularly relevant. Computer-aided diagnosis (CAD) systems have already been applied to a certain extent, and the development of deep-learning and machine-learning models has since provided higher accuracy and multi-condition detection capabilities.

### Research methods

- **Research design**: This is a prospective observational study conducted in primary care in a region of Catalonia.
- **Sample selection**: The final sample included 278 chest X-rays and their reports; 51.8% of the images showed no radiological abnormalities according to the radiologist's report.
- **Data collection**: The research team input the images into the AI algorithm and compared its output with the reference radiologist's diagnoses, which were treated as the gold standard.
- **Statistical analysis**: Statistical analysis was performed using R software version 4.2.1, and the accuracy, sensitivity, and specificity of the algorithm were calculated.

### Main findings

- **Overall performance**:
  - The average accuracy was 0.95 (95% CI 0.92; 0.98)
  - The average sensitivity was 0.48 (95% CI 0.30; 0.66)
  - The average specificity was 0.98 (95% CI 0.97; 0.99)
- **Performance for specific conditions**:
  - The algorithm was most sensitive for external implants, upper abdominal conditions, and cardiac and/or valvular implants.
  - The algorithm was less sensitive for conditions of the mediastinum, vessels, and bone.
- **Untrained diagnoses**:
  - The radiologist identified some conditions that the algorithm had not been trained to recognize, such as bronchial wall thickening, fibrotic lesions, and chronic lung abnormalities.

### Conclusion

This study emphasizes the importance of external validation in the actual clinical environment to ensure the effectiveness and safety of artificial intelligence algorithms. Although the algorithm performs well for certain conditions, it needs further training and improvement for others. The study also identifies diagnoses that the algorithm was not trained to recognize, which gives a direction for further optimizing it.
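
The accuracy, sensitivity, and specificity reported under Main findings are all derived from a confusion matrix of algorithm output against the radiologist's report. The following is a minimal Python sketch of that calculation, not the authors' code (the study used R 4.2.1). The counts are hypothetical and chosen only to show how a high accuracy can coexist with a low sensitivity when a condition is rare; the function names and the normal-approximation (Wald) interval are also assumptions, since the summary does not state which interval method was used.

```python
import math


def diagnostic_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,    # all correct calls / all images
        "sensitivity": tp / (tp + fn),    # detected positives / true positives
        "specificity": tn / (tn + fp),    # detected negatives / true negatives
    }


def wald_ci(p, n, z=1.96):
    """Normal-approximation 95% CI for a proportion p estimated from n cases.

    Assumption: the interval method used in the study is not stated here.
    """
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts for a single condition (NOT data from the study):
# 20 images have the condition according to the radiologist, 180 do not.
tp, fn = 8, 12    # algorithm finds 8 of the 20 positives
fp, tn = 3, 177   # algorithm wrongly flags 3 of the 180 negatives

metrics = diagnostic_metrics(tp, fp, tn, fn)
sens_low, sens_high = wald_ci(metrics["sensitivity"], tp + fn)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"sensitivity 95% CI: ({sens_low:.3f}, {sens_high:.3f})")
```

Because the negatives far outnumber the positives in this example, accuracy is dominated by specificity, and the sensitivity interval is wide because it rests on only 20 positive cases.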