Ensemble of ConvNeXt V2 and MaxViT for Long-Tailed CXR Classification with View-Based Aggregation

Yosuke Yamagishi, Shouhei Hanaoka
2024-10-15
Abstract: In this work, we present our solution for the MICCAI 2024 CXR-LT challenge, achieving 4th place in Subtask 2 and 5th in Subtask 1. We leveraged an ensemble of ConvNeXt V2 and MaxViT models, pretrained on an external chest X-ray dataset, to address the long-tailed distribution of chest findings. The proposed method combines state-of-the-art image classification techniques, asymmetric loss for handling class imbalance, and view-based prediction aggregation to enhance classification performance. Through experiments, we demonstrate the advantages of our approach in improving both detection accuracy and the handling of the long-tailed distribution in CXR findings. The code is available at https://github.com/yamagishi0824/cxrlt24-multiview-pp.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses two problems in chest X-ray (CXR) classification: the long-tailed distribution of findings and the integration of information from multiple views. Specifically:

1. **Long-tailed distribution**: In chest X-ray data, some diseases or findings occur far less often than common ones, forming a so-called "long-tailed distribution". This imbalance makes model training difficult, especially for accurately detecting rare findings.
2. **Multi-view information integration**: A chest X-ray examination often includes multiple views (such as frontal and lateral views), and each view provides distinct information. Integrating the information from these views effectively to improve diagnostic accuracy is an important research direction.

To address these problems, the authors propose an ensemble of ConvNeXt V2 and MaxViT models, pretrained on an external chest X-ray dataset, combined with the following techniques:

- **Asymmetric loss**: Used to handle class imbalance. In multi-label CXR classification, negative labels (absent findings) vastly outnumber positive ones; asymmetric loss treats the two asymmetrically, down-weighting easy negatives more strongly than positives, which reduces the model's bias toward predicting that a finding is absent.
- **View-based prediction aggregation**: The predictions for the frontal and lateral images of a study are combined by a weighted average to obtain a more reliable study-level prediction.

Through these methods, the authors aim to improve detection accuracy in CXR classification, particularly on long-tailed data.

### Formula summary

The formulas in the paper mainly describe the view-based prediction aggregation process:

1. **Average the predictions within each view**:
   \[ P_f=\frac{1}{N_f}\sum_{i=1}^{N_f}P_{f,i},\quad P_l=\frac{1}{N_l}\sum_{i=1}^{N_l}P_{l,i} \]
   where \(P_f\) and \(P_l\) are the mean predictions for the frontal and lateral views, \(N_f\) and \(N_l\) are the numbers of frontal and lateral images, and \(P_{f,i}\) and \(P_{l,i}\) are the predictions for individual images.
2. **Combine the view-level predictions with a weighted average**:
   \[ P_{\text{final}}=\frac{w_fP_f + w_lP_l}{w_f + w_l} \]
   where \(P_{\text{final}}\) is the final prediction, and \(w_f\) and \(w_l\) are the weights of the frontal and lateral views.
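The two aggregation formulas can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name `aggregate_study_predictions` is hypothetical, the default weights \(w_f = w_l = 1\) are placeholders rather than values from the paper, and falling back to the single available view when a study lacks frontal or lateral images is an assumption (the formula for \(P_l\) is undefined when \(N_l = 0\)).

```python
import numpy as np


def aggregate_study_predictions(frontal_preds, lateral_preds, w_f=1.0, w_l=1.0):
    """Combine per-image predictions into one study-level prediction (sketch).

    frontal_preds: array-like of shape (N_f, C), per-image class probabilities.
    lateral_preds: array-like of shape (N_l, C), per-image class probabilities.
    w_f, w_l: view weights; the defaults are placeholders, not the paper's values.
    """
    frontal_preds = np.asarray(frontal_preds, dtype=float)
    lateral_preds = np.asarray(lateral_preds, dtype=float)

    # Step 1: average within each view, giving P_f and P_l.
    view_means, view_weights = [], []
    if len(frontal_preds) > 0:
        view_means.append(frontal_preds.mean(axis=0))
        view_weights.append(w_f)
    if len(lateral_preds) > 0:
        view_means.append(lateral_preds.mean(axis=0))
        view_weights.append(w_l)
    if not view_means:
        raise ValueError("study has no images")

    # Step 2: weighted average across views, normalised by the weight sum,
    # i.e. (w_f * P_f + w_l * P_l) / (w_f + w_l).
    # Assumption: if one view is missing, only the available view is used.
    return np.average(np.stack(view_means), axis=0, weights=view_weights)


# Two frontal images and one lateral image, two finding classes.
p_final = aggregate_study_predictions(
    [[0.8, 0.1], [0.6, 0.3]], [[0.4, 0.5]], w_f=2.0, w_l=1.0
)
print(p_final)  # ≈ [0.6, 0.3]
```

Averaging within each view first means a study with several frontal images does not automatically outweigh its lateral view; the balance between views is controlled only by \(w_f\) and \(w_l\).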
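The asymmetric loss mentioned above can be illustrated with the standard multi-label formulation of Ridnik et al. (2021): a focusing term \((1-p)^{\gamma_+}\) for positives, a stronger focusing term \(p_m^{\gamma_-}\) for negatives, and a probability shift \(p_m = \max(p - m, 0)\) that zeroes the loss for very easy negatives. The sketch below computes only the loss value in NumPy (training would use an autodiff framework). The function name and the hyperparameters \(\gamma_+ = 0\), \(\gamma_- = 4\), \(m = 0.05\) are illustrative assumptions, not necessarily the settings used in this work.

```python
import numpy as np


def asymmetric_loss(logits, targets, gamma_pos=0.0, gamma_neg=4.0, clip=0.05, eps=1e-8):
    """Multi-label asymmetric loss summed over a batch (illustrative sketch).

    logits, targets: arrays of shape (batch, num_classes); targets are 0/1.
    gamma_pos, gamma_neg, clip: illustrative values, not the paper's settings.
    """
    logits = np.asarray(logits, dtype=float)
    targets = np.asarray(targets, dtype=float)
    p = 1.0 / (1.0 + np.exp(-logits))  # sigmoid probabilities

    # Probability shifting for negatives: p_m = max(p - clip, 0), so very
    # easy negatives (p < clip) contribute zero loss.
    p_neg = np.clip(p - clip, 0.0, None)

    # Positive term: (1 - p)^gamma_pos * log(p).
    loss_pos = targets * (1.0 - p) ** gamma_pos * np.log(np.clip(p, eps, None))
    # Negative term: p_m^gamma_neg * log(1 - p_m); a larger gamma_neg
    # down-weights easy negatives more strongly than positives.
    loss_neg = (1.0 - targets) * p_neg ** gamma_neg * np.log(np.clip(1.0 - p_neg, eps, None))

    return -(loss_pos + loss_neg).sum()


# One positive and one confidently predicted negative finding.
print(asymmetric_loss([[0.0, -5.0]], [[1, 0]]))
```

With `gamma_neg=0` and `clip=0` the negative term reduces to ordinary binary cross-entropy, which makes the effect of the asymmetric terms easy to compare.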