Complex Mixer for MedMNIST Classification Decathlon

Zhuoran Zheng, Xiuyi Jia
2023-04-20
Abstract: With the development of the medical imaging field, researchers have sought to build a class of datasets that removes the need for medical knowledge, such as MedMNIST (v2). MedMNIST (v2) includes a large number of small (28 × 28 or 28 × 28 × 28) medical samples with corresponding expert annotations (class labels). The existing baseline models (Google AutoML Vision, ResNet-50+3D) reach an average accuracy of over 70% on the MedMNIST (v2) datasets, which is comparable to the performance of expert decision-making. Nevertheless, we note two insurmountable obstacles to modeling on MedMNIST (v2): 1) cropping the raw images to low scales may cause useful recognition information to be lost, making it difficult for the classifier to trace accurate decision boundaries; 2) the labelers' subjective judgments may introduce considerable uncertainty into the label space. To address these issues, we develop a Complex Mixer (C-Mixer) with a pre-training framework that alleviates the problems of insufficient information and label-space uncertainty by introducing an incentive imaginary matrix and a self-supervised scheme with random masking. Our method (incentive learning and self-supervised learning with masking) shows surprising potential on the standard MedMNIST (v2) datasets, on customized weakly supervised datasets, and on other image enhancement tasks.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper aims to solve two main problems in the MedMNIST (v2) dataset:

1. **Information loss caused by image cropping**:
   - The images in MedMNIST (v2) are cropped and downscaled to low resolution (28 × 28 or 28 × 28 × 28), which may discard useful recognition information and make it difficult for the classifier to find accurate decision boundaries. This downscaling (linear interpolation) loses many local details, especially lesion information in medical images.
2. **Uncertainty in the label space**:
   - Because expert annotations involve subjective judgment, mislabeling inevitably exists across the 18 datasets in MedMNIST (v2), which introduces considerable uncertainty into the label space. The existing baseline algorithms do not model this uncertainty, which the authors identify as a key reason for the classifiers' limited performance.

To address these problems, the authors propose a new method named **Complex Mixer (C-Mixer)**, which combines incentive learning with a self-supervised learning framework. Specifically:

- **Incentive learning**: Conditional noise is introduced to supplement the input information, reduce the learning cost, and provide signals that help recognition.
- **Self-supervised learning**: Random masking encourages similar samples to be represented consistently during training, which mitigates the uncertainty in the label space.

### Method overview

1. **Incentive learning**:
   - Introduce conditional noise \(x^*\) to supplement the input information \(x\) and generate incentive signals \(x^*(i)\).
     This process can be written as:
     \[
     x^*(i) \sim D_G\big(\text{Tanh}(\text{MLP}(\text{Flatten}(I)))\big)
     \]
   - Here, \(D_G\) is a Gaussian distribution, \(\text{Tanh}\) is used as a normalization function, and \(\text{MLP}\) is a multi-layer perceptron that produces the parameters \(\mu\) and \(\sigma\) of the Gaussian distribution.
2. **C-Mixer**:
   - C-Mixer is an improved model based on MLP-Mixer that extracts features in the complex domain. An input tensor \(h = a + bi\) is processed by an affine transformation with complex weights \(W = A + Bi\), followed by an activation function:
     \[
     Wh = (Aa - Bb) + i(Ba + Ab)
     \]
   - The activation function is CReLU:
     \[
     \text{CReLU}(h) = \text{ReLU}(a) + i\,\text{ReLU}(b)
     \]
3. **Self-supervised learning framework**:
   - The anchor view and the target view are preprocessed by random masking and then fed into C-Mixer. The loss function is the cross-entropy between the two predicted distributions:
     \[
     \min H(p, q) = H\big(P(\text{C-Mixer}(\text{Mask}(\text{anchor view}))),\; P(\text{C-Mixer}(\text{Mask}(\text{target view})))\big)
     \]

### Experimental results

- **Fully supervised tasks**: On the MedMNIST (v2) datasets, C-Mixer outperforms existing baseline methods such as ResNet-18/50, AutoKeras, and Google AutoML Vision.
- **Semi-supervised tasks**: When only 10% of the labeled training data is used, C-Mixer still...
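The incentive-learning step in the method overview, \(x^*(i) \sim D_G(\text{Tanh}(\text{MLP}(\text{Flatten}(I))))\), can be sketched in plain Python as follows. This is a minimal illustration under several assumptions not stated in the summary: the MLP is a single hypothetical dense layer with two outputs, one scalar \((\mu, \sigma)\) pair is produced per image, \(\sigma\) is forced positive by rescaling the tanh output, and the sampled signal is paired with the image as \(h = x + i\,x^*\), which is only our reading of the abstract's "incentive imaginary matrix".

```python
import math
import random
from typing import List, Sequence, Tuple

Image = List[List[float]]


def flatten(image: Sequence[Sequence[float]]) -> List[float]:
    """Flatten(I): turn a 2D image (list of rows) into a 1D vector."""
    return [px for row in image for px in row]


def dense(x: Sequence[float], weights: Sequence[Sequence[float]],
          bias: Sequence[float]) -> List[float]:
    """Hypothetical one-layer stand-in for the MLP: y_j = sum_k W[j][k] * x[k] + b[j]."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def gaussian_params(image, weights, bias) -> Tuple[float, float]:
    """Tanh(MLP(Flatten(I))) -> (mu, sigma).

    ASSUMPTION: exactly two MLP outputs; tanh bounds mu to (-1, 1), and the
    second output is rescaled from (-1, 1) to a strictly positive sigma.
    """
    t_mu, t_sigma = (math.tanh(v) for v in dense(flatten(image), weights, bias))
    sigma = 0.5 * (t_sigma + 1.0) + 1e-3  # keep sigma > 0
    return t_mu, sigma


def incentive_signal(image, weights, bias, rng: random.Random) -> Image:
    """Sample x*(i) ~ N(mu, sigma^2) elementwise, with the same shape as the image."""
    mu, sigma = gaussian_params(image, weights, bias)
    return [[rng.gauss(mu, sigma) for _ in row] for row in image]


def complex_input(image, x_star) -> List[List[complex]]:
    """ASSUMPTION: pair x with x* as h = x + i x* (our reading of the imaginary matrix)."""
    return [[complex(x, s) for x, s in zip(r, rs)] for r, rs in zip(image, x_star)]


rng = random.Random(0)
image = [[0.1 * (r + c) for c in range(4)] for r in range(4)]  # toy 4x4 "image"
weights = [[rng.uniform(-0.1, 0.1) for _ in range(16)] for _ in range(2)]
bias = [0.0, 0.0]
h = complex_input(image, incentive_signal(image, weights, bias, rng))
print(len(h), len(h[0]))  # 4 4
```

In a real model the MLP weights would be learned jointly with the classifier; fixed random weights are used here only so the sketch runs on its own.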
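The complex affine transformation and CReLU from the C-Mixer description can be checked with real arithmetic alone: writing \(W = A + iB\) and \(h = a + ib\) gives \(Wh = (Aa - Bb) + i(Ba + Ab)\). The sketch below is a single complex dense layer (bias omitted) followed by CReLU; the token-mixing and channel-mixing structure of MLP-Mixer is not reproduced, and the names `complex_linear` and `crelu` are our own.

```python
from typing import List, Sequence, Tuple

Vec = List[float]


def matvec(M: Sequence[Sequence[float]], v: Sequence[float]) -> Vec:
    """Real matrix-vector product M @ v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def complex_linear(A, B, a: Sequence[float], b: Sequence[float]) -> Tuple[Vec, Vec]:
    """Apply W = A + iB to h = a + ib using only real arithmetic.

    Returns (real, imag) = (Aa - Bb, Ba + Ab), i.e. the two parts of Wh.
    """
    Aa, Bb, Ba, Ab = matvec(A, a), matvec(B, b), matvec(B, a), matvec(A, b)
    return [x - y for x, y in zip(Aa, Bb)], [x + y for x, y in zip(Ba, Ab)]


def crelu(real: Sequence[float], imag: Sequence[float]) -> Tuple[Vec, Vec]:
    """CReLU(h) = ReLU(a) + i ReLU(b): ReLU on the real and imaginary parts separately."""
    return [max(0.0, x) for x in real], [max(0.0, y) for y in imag]


A = [[1.0, 2.0], [0.5, -1.0]]
B = [[0.0, 1.0], [-2.0, 0.5]]
a, b = [1.0, -1.0], [0.5, 2.0]
real, imag = complex_linear(A, B, a, b)
print(crelu(real, imag))  # ([0.0, 1.5], [3.5, 0.0])
```

Keeping the real and imaginary parts as separate real tensors is one common way to implement complex layers in frameworks without native complex support; the result agrees with Python's built-in complex multiplication.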
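The self-supervised objective \(H(p, q) = -\sum_k p_k \log q_k\), with \(p\) and \(q\) predicted from independently masked anchor and target views, can be sketched as below. Assumptions: masking zeroes a fixed fraction of input entries, the two views are a sample and a slightly perturbed copy of it, and a hypothetical linear `toy_model` stands in for C-Mixer. Which side (if any) is treated as a fixed target during optimization is not specified in the summary, so no stop-gradient is modeled.

```python
import math
import random
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], List[float]]


def random_mask(x: Sequence[float], ratio: float, rng: random.Random) -> List[float]:
    """Mask(.): zero out round(ratio * len(x)) randomly chosen entries."""
    masked = set(rng.sample(range(len(x)), round(ratio * len(x))))
    return [0.0 if i in masked else v for i, v in enumerate(x)]


def softmax(logits: Sequence[float]) -> List[float]:
    """P(.): turn logits into a probability distribution (max-shifted for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """H(p, q) = -sum_k p_k log q_k (eps guards against log(0))."""
    return -sum(pk * math.log(qk + eps) for pk, qk in zip(p, q))


def masked_view_loss(model: Model, anchor, target, ratio: float, rng: random.Random) -> float:
    """H(P(model(Mask(anchor))), P(model(Mask(target)))), each view masked independently."""
    p = softmax(model(random_mask(anchor, ratio, rng)))
    q = softmax(model(random_mask(target, ratio, rng)))
    return cross_entropy(p, q)


# Toy usage: a hypothetical linear "model" stands in for C-Mixer.
rng = random.Random(0)
W = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(3)]


def toy_model(x: Sequence[float]) -> List[float]:
    return [sum(w * v for w, v in zip(row, x)) for row in W]


anchor = [rng.random() for _ in range(8)]
target = [v + 0.01 * rng.gauss(0.0, 1.0) for v in anchor]  # slightly perturbed copy
print(masked_view_loss(toy_model, anchor, target, ratio=0.25, rng=rng))
```

Minimizing this loss pushes the two masked views toward the same predicted distribution, which is the consistency effect the summary attributes to the self-supervised scheme.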