CheXphotogenic: Generalization of Deep Learning Models for Chest X-ray Interpretation to Photos of Chest X-rays

Pranav Rajpurkar, Anirudh Joshi, Anuj Pareek, Jeremy Irvin, Andrew Y. Ng, Matthew Lungren
DOI: https://doi.org/10.48550/arXiv.2011.06129
2020-11-12
Abstract: The use of smartphones to take photographs of chest x-rays represents an appealing solution for scaled deployment of deep learning models for chest x-ray interpretation. However, the performance of chest x-ray algorithms on photos of chest x-rays has not been thoroughly investigated. In this study, we measured the diagnostic performance for 8 different chest x-ray models when applied to photos of chest x-rays. All models were developed by different groups and submitted to the CheXpert challenge, and re-applied to smartphone photos of x-rays in the CheXphoto dataset without further tuning. We found that several models had a drop in performance when applied to photos of chest x-rays, but even with this drop, some models still performed comparably to radiologists. Further investigation could be directed towards understanding how different model training procedures may affect model generalization to photos of chest x-rays.
Image and Video Processing, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper evaluates the diagnostic performance of deep learning models on smartphone photos of chest X-rays and asks how that performance compares with performance on the original digital chest X-rays. Specifically, the researchers address three questions:

1. **Performance degradation**: Does performance decline significantly when deep learning models are applied to smartphone photos of chest X-rays? The study tests 8 chest X-ray models, each developed by a different research team and submitted to the CheXpert challenge.
2. **Comparison with radiologists**: Even after a decline, is the performance of these models still comparable to that of radiologists? The study evaluates this by comparing model performance on the photos with radiologists' diagnoses.
3. **Differences across pathologies**: How does the performance drop on photos vary across pathologies (such as atelectasis, cardiomegaly, consolidation, edema, and pleural effusion)?

By answering these questions, the authors aim to understand the feasibility and limitations of applying deep learning models in real clinical settings, particularly the possibility of using smartphones for medical image transmission and automated interpretation in resource-limited areas.
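
The comparison described above can be illustrated with a small sketch that computes, for each pathology, how much a model's score-based performance changes between the original X-rays and photos of the same studies. This is not the authors' evaluation code. The choice of AUC as the metric, the function names `auc` and `performance_drop`, and the toy labels and scores are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed by comparing every
    positive/negative pair of scores (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def performance_drop(
    labels: Dict[str, Sequence[int]],
    scores_standard: Dict[str, Sequence[float]],
    scores_photo: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Per-pathology AUC on the original images minus AUC on the photos.
    A positive value means the model did worse on photos."""
    return {
        pathology: auc(y, scores_standard[pathology]) - auc(y, scores_photo[pathology])
        for pathology, y in labels.items()
    }


# Hypothetical toy data: the same four studies scored twice by one model,
# once from the original image and once from a smartphone photo.
labels = {"Edema": [1, 0, 1, 0], "Cardiomegaly": [0, 0, 1, 1]}
standard = {"Edema": [0.9, 0.2, 0.8, 0.1], "Cardiomegaly": [0.1, 0.3, 0.7, 0.9]}
photo = {"Edema": [0.9, 0.85, 0.8, 0.1], "Cardiomegaly": [0.1, 0.3, 0.7, 0.9]}

drops = performance_drop(labels, standard, photo)
for pathology, drop in drops.items():
    print(f"{pathology}: AUC drop on photos = {drop:.2f}")
```

In this toy data the photo causes one negative Edema study to be scored above a positive one, so the Edema AUC falls while Cardiomegaly is unchanged. A real evaluation would use the full test sets and would also need radiologist performance on the same studies to support the comparison in question 2.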