Conv-attention ViT for classification of multi-label class imbalanced data of lung thoracic diseases
Oommen, Lintu; Nikhila Nagajyothi, Chiluka
DOI: https://doi.org/10.1007/s11042-024-20363-z
IF: 2.577
2024-11-09
Multimedia Tools and Applications
Abstract: Chest X-ray image classification is an active research topic, and vision transformers have been examined as a way to improve performance and accuracy on this task. However, imbalanced datasets pose a significant challenge during model training and lead to biased results. This work focuses on the classification of multi-label lung thoracic diseases from imbalanced data. To tackle class imbalance, synthetic data was generated using GANs and combined with images from diverse data sources, which improved generalization. To classify multi-label chest X-rays, the work proposes a Conv-Attention Vision Transformer (CA-ViT) model that uses local and global attention mechanisms to improve the average ROC-AUC score and F1-score. CA-ViT was evaluated on an imbalanced dataset and compared with ResNet-50, VGG-19, and ViT 32/384 models. On this dataset, CA-ViT outperformed the other models, with an average ROC-AUC of 0.76 versus 0.72, 0.75, and 0.68 for ResNet-50, VGG-19, and ViT 32/384, respectively. All models displayed improved assessment metrics when the dataset was balanced via synthetic data creation and the combination of several data distributions. On the balanced dataset, CA-ViT again performed best, with an average ROC-AUC of 0.81, compared with 0.7, 0.73, and 0.79 for ResNet-50, VGG-19, and ViT 32/384, respectively. CA-ViT achieved a micro-average F1-score of 0.70.
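The two metrics reported in the abstract, average ROC-AUC and micro-average F1-score, can be illustrated for a multi-label setting with a minimal pure-Python sketch. This is not the authors' evaluation code. The function names, the toy labels and scores, the unweighted mean over labels, and the 0.5 decision threshold are all assumptions made for illustration; the abstract does not state which averaging scheme or threshold was used.

```python
from typing import List, Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC-AUC for a single label, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC is undefined when only one class is present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_roc_auc(Y_true: List[List[int]], Y_score: List[List[float]]) -> float:
    """Unweighted mean of per-label ROC-AUC (assumed averaging scheme)."""
    n_labels = len(Y_true[0])
    aucs = [
        roc_auc([row[j] for row in Y_true], [row[j] for row in Y_score])
        for j in range(n_labels)
    ]
    return sum(aucs) / n_labels


def micro_f1(Y_true: List[List[int]], Y_score: List[List[float]],
             threshold: float = 0.5) -> float:
    """Micro-average F1: pool TP/FP/FN over all labels and samples."""
    tp = fp = fn = 0
    for t_row, s_row in zip(Y_true, Y_score):
        for t, s in zip(t_row, s_row):
            pred = 1 if s >= threshold else 0  # assumed 0.5 cut-off
            if pred == 1 and t == 1:
                tp += 1
            elif pred == 1 and t == 0:
                fp += 1
            elif pred == 0 and t == 1:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: 4 images, 3 disease labels (not from the paper).
Y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
Y_score = [[0.9, 0.2, 0.4], [0.3, 0.8, 0.1], [0.6, 0.4, 0.7], [0.2, 0.1, 0.8]]

if __name__ == "__main__":
    print("mean ROC-AUC:", mean_roc_auc(Y_true, Y_score))
    print("micro F1:", micro_f1(Y_true, Y_score))
```

Because micro-averaging pools counts across all labels, frequent disease classes dominate the micro F1-score, whereas the per-label mean ROC-AUC weights each class equally; on imbalanced data the two can therefore tell different stories.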
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering