Refining Tuberculosis Detection in CXR Imaging: Addressing Bias in Deep Neural Networks via Interpretability

Özgür Acar Güler, Manuel Günther, André Anjos
2024-10-08
Abstract: Automatic classification of active tuberculosis from chest X-ray images has the potential to save lives, especially in low- and mid-income countries where skilled human experts can be scarce. Given the lack of available labeled data to train such systems and the unbalanced nature of publicly available datasets, we argue that the reliability of deep learning models is limited, even if they can be shown to obtain perfect classification accuracy on the test data. One way of evaluating the reliability of such systems is to ensure that models use the same regions of input images for predictions as medical experts would. In this paper, we show that pre-training a deep neural network on a large-scale proxy task, as well as using mixed objective optimization network (MOON), a technique to balance different classes during pre-training and fine-tuning, can improve the alignment of decision foundations between models and experts, as compared to a model directly trained on the target dataset. At the same time, these approaches keep perfect classification accuracy according to the area under the receiver operating characteristic curve (AUROC) on the test set, and improve generalization on an independent, unseen dataset. For the purpose of reproducibility, our source code is made available online.
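The abstract's idea of checking whether a model "uses the same regions" as an expert can be made concrete in several ways. The sketch below shows one simple option: the fraction of a saliency map's positive mass that falls inside an expert-annotated bounding box. The function name, the box convention `(x_min, y_min, x_max, y_max)` with exclusive maxima, and the toy data are assumptions made for illustration; this is not necessarily the measure used by the authors.

```python
import numpy as np


def saliency_fraction_in_box(saliency, box):
    """Return the share of positive saliency that lies inside `box`.

    `saliency` is a 2D array (rows = y, columns = x).
    `box` is (x_min, y_min, x_max, y_max) in pixel indices, maxima exclusive.
    This convention is an assumption of this sketch.
    """
    # Negative saliency values are ignored so the result stays in [0, 1].
    sal = np.clip(np.asarray(saliency, dtype=float), 0.0, None)
    total = sal.sum()
    if total == 0.0:
        # No saliency at all: report zero alignment instead of dividing by zero.
        return 0.0
    x0, y0, x1, y1 = box
    return float(sal[y0:y1, x0:x1].sum() / total)


# Toy example: most of the saliency sits inside the annotated region.
demo_saliency = np.zeros((4, 4))
demo_saliency[1, 1] = 3.0  # inside the box
demo_saliency[3, 3] = 1.0  # outside the box
demo_box = (0, 0, 2, 2)
demo_score = saliency_fraction_in_box(demo_saliency, demo_box)
```

A value close to 1 means the model's attention is concentrated where the expert marked a finding, while a value close to 0 suggests the prediction relies on other image regions.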
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The problem this paper addresses is how to improve the reliability and interpretability of deep neural network (DNN) models that automatically classify active tuberculosis (TB) in chest X-ray (CXR) images, especially when labeled data is scarce and imbalanced.

### Specific background and challenges of the problem include:

1. **Scarce and Imbalanced Data**:
   - Publicly available labeled datasets are limited and usually imbalanced (i.e., some categories contain far more samples than others). As a result, a trained model may rely on biases in the data rather than on clinically meaningful factors.
2. **Model Reliability Assessment**:
   - Even if a model achieves perfect classification accuracy on the test set, it still needs to be verified that the basis of its decisions is consistent with that of medical experts. The model may exploit irrelevant features or biases in the image rather than basing its predictions on actual pathological features.
3. **Lack of Interpretability**:
   - Deep learning models are usually regarded as "black boxes" whose decision-making processes are difficult to understand. To strengthen doctors' trust in a model, it must be interpretable, so that it can be checked whether it highlights the same image areas as medical experts.

### Main contributions of the paper:

1. **Pre-training Strategy**:
   - Pre-train the DNN on a large-scale proxy task (such as the NIH-CXR14 dataset) to reduce interpretability biases and improve the generalization ability of the model.
2. **Mixed Objective Optimization Network (MOON)**:
   - Apply the MOON technique during pre-training and fine-tuning to mitigate the impact of data imbalance by balancing the weights of the different categories, thereby further aligning the model's decisions with the judgments of human experts.
3. **Experimental Verification**:
   - Experiments on the target dataset (TBX11K) and an external dataset (Shenzhen) show that these methods not only maintain high classification accuracy but also improve the interpretability and generalization ability of the model.

### Summary:

The paper aims to address the reliability and interpretability problems of deep neural networks in tuberculosis detection by improving the pre-training strategy and introducing class-balancing techniques, especially when labeled data is scarce and imbalanced.
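
The class-balancing idea behind MOON can be illustrated with a simplified sketch. The code below uses plain inverse-frequency weighting for a single binary label, so that positive and negative samples contribute equal total weight to a binary cross-entropy loss. The function names and toy labels are assumptions made for illustration, and the paper's exact MOON formulation (including how it handles multiple labels during pre-training) may differ.

```python
import numpy as np


def balanced_class_weights(labels):
    """Per-sample weights that give both classes the same total weight.

    Simplified inverse-frequency weighting for one binary label;
    an illustrative stand-in, not the paper's exact MOON weighting.
    """
    labels = np.asarray(labels).astype(int)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        # Only one class present: nothing to balance.
        return np.ones(labels.size)
    w_pos = labels.size / (2.0 * n_pos)
    w_neg = labels.size / (2.0 * n_neg)
    return np.where(labels == 1, w_pos, w_neg)


def weighted_bce(probs, labels, weights, eps=1e-7):
    """Weighted mean of the binary cross-entropy over all samples."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    labels = np.asarray(labels, dtype=float)
    losses = -(labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs))
    return float(np.sum(weights * losses) / np.sum(weights))


# Toy example: one TB-positive sample among three negatives.
demo_labels = [1, 0, 0, 0]
demo_weights = balanced_class_weights(demo_labels)
demo_loss = weighted_bce([0.5, 0.5, 0.5, 0.5], demo_labels, demo_weights)
```

With these weights, the single positive sample carries as much total weight as the three negatives combined, so a model cannot lower its loss simply by favoring the majority class.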