Abstract: With the development of the medical imaging field, researchers seek to develop a class of datasets that remove the need for medical knowledge, such as MedMNIST (v2). MedMNIST (v2) includes a large number of small-sized (28 $\times$ 28 or 28 $\times$ 28 $\times$ 28) medical samples and the corresponding expert annotations (class labels). The existing baseline models (Google AutoML Vision, ResNet-50+3D) can reach an average accuracy of over 70% on MedMNIST (v2) datasets, which is comparable to the performance of expert decision-making. Nevertheless, we note that there are two insurmountable obstacles to modeling on MedMNIST (v2): 1) cropping the raw images to low scales may discard information needed for recognition and make it difficult for the classifier to trace accurate decision boundaries; 2) the labelers' subjective insight may cause many uncertainties in the label space. To address these issues, we develop a Complex Mixer (C-Mixer) with a pre-training framework to alleviate the problems of insufficient information and uncertainty in the label space by introducing an incentive imaginary matrix and a self-supervised scheme with random masking. Our method (incentive learning and self-supervised learning with masking) shows surprising potential on the standard MedMNIST (v2) dataset, the customized weakly supervised datasets, and other image enhancement tasks.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper aims to solve two main problems in the MedMNIST (v2) dataset:

1. **Information loss caused by image cropping**:
   - The images in the MedMNIST (v2) dataset are cropped to low resolution (28 × 28 or 28 × 28 × 28), which may lead to the loss of effective identification information and make it difficult for the classifier to find accurate decision boundaries. This cropping method (linear interpolation) loses a large number of local details, especially the lesion information in medical images.
2. **Uncertainty in the label space**:
   - Because expert annotations reflect subjective opinions, mislabeling inevitably exists in the 18 datasets of MedMNIST (v2), which leads to a large amount of uncertainty in the label space. The existing baseline algorithms do not model this uncertainty, which is a key reason for the limited performance of the classifier.

To address these problems, the authors propose a new method named **Complex Mixer (C-Mixer)**, which combines incentive learning and self-supervised learning frameworks. Specifically:

- **Incentive learning**: Conditional noise is introduced to supplement the input information, reduce learning costs, and provide signals that help identification.
- **Self-supervised learning**: Random masking is used so that similar samples are represented consistently during training, thereby overcoming the uncertainty in the label space.

### Method overview

1. **Incentive learning**:
   - Introduce conditional noise $x^*$ to supplement the input information $x$ and generate incentive signals $x^*(i)$. This process can be represented by the following formula:
     \[ x^*(i)\sim D_G(\text{Tanh}(\text{MLP}(\text{Flatten}(I)))) \]
   - Here, $D_G$ is a Gaussian distribution, $\text{Tanh}$ is a normalization function, and $\text{MLP}$ is a multi-layer perceptron that generates the parameters $\mu$ and $\sigma$ of the Gaussian distribution.
2. **C-Mixer**:
   - C-Mixer is an improved model based on MLP-Mixer and can extract features in the complex domain. The input tensor $h = a+bi$ is processed by an affine transformation with complex weight $W = A + iB$, followed by an activation function:
     \[ Wh=(Aa - Bb)+i(Ba + Ab) \]
   - The activation function is CReLU, applied separately to the real and imaginary parts:
     \[ \text{CReLU}(h)=\text{ReLU}(a)+i\,\text{ReLU}(b) \]
3. **Self-supervised learning framework**:
   - The anchor view and the target view are pre-processed by random masking and then fed into C-Mixer. The loss function is the cross-entropy between the two resulting predictions:
     \[ \min H(p, q)=H(P(\text{C-Mixer}(\text{Mask}(\text{anchor view}))), P(\text{C-Mixer}(\text{Mask}(\text{target view})))) \]

### Experimental results

- **Fully supervised tasks**: On the MedMNIST (v2) dataset, C-Mixer outperforms existing baseline methods such as ResNet-18/50, AutoKeras, and Google AutoML Vision.
- **Semi-supervised tasks**: When only using 10% of the labeled training data, C-Mixer still...
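The complex affine transformation and CReLU from the method overview can be illustrated with a minimal sketch in plain Python. It assumes the complex weight is $W = A + iB$ acting on $h = a + bi$, which is what the formula $Wh=(Aa - Bb)+i(Ba + Ab)$ implies. The helper names (`matvec`, `complex_linear`, `crelu`) and the toy values are chosen for illustration only and are not taken from the paper's implementation.

```python
def matvec(M, v):
    """Real matrix-vector product using plain lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def complex_linear(A, B, a, b):
    """Apply W = A + iB to h = a + ib with real arithmetic only.

    Returns (real part, imaginary part) = (Aa - Bb, Ba + Ab).
    """
    Aa, Bb = matvec(A, a), matvec(B, b)
    Ba, Ab = matvec(B, a), matvec(A, b)
    real = [p - q for p, q in zip(Aa, Bb)]
    imag = [p + q for p, q in zip(Ba, Ab)]
    return real, imag


def crelu(a, b):
    """CReLU: ReLU applied separately to the real and imaginary parts."""
    return [max(x, 0.0) for x in a], [max(x, 0.0) for x in b]


# Toy weights and input (illustrative values, not from the paper).
A = [[1.0, 2.0], [0.0, -1.0]]
B = [[0.5, 0.0], [1.0, 1.0]]
a = [1.0, -1.0]
b = [2.0, 0.5]

real, imag = complex_linear(A, B, a, b)

# Cross-check against Python's built-in complex arithmetic.
W = [[complex(A[i][j], B[i][j]) for j in range(2)] for i in range(2)]
h = [complex(x, y) for x, y in zip(a, b)]
reference = [sum(w * z for w, z in zip(row, h)) for row in W]

act_real, act_imag = crelu(real, imag)
print(real, imag, act_real, act_imag)
```

Splitting the computation into four real matrix products is a common way to run complex-valued layers on real-valued hardware; whether C-Mixer is implemented this way is not stated in the summary above.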
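The self-supervised objective in the method overview, cross-entropy between the predictions for a masked anchor view and a masked target view, can be sketched as follows. This is only an illustration under several assumptions: `toy_encoder` is a hypothetical one-layer stand-in for C-Mixer, `mask_ratio` is an assumed hyperparameter, masking is done by zeroing pixels, and both views come from the same flattened toy image. None of these details are confirmed by the summary.

```python
import math
import random


def random_mask(pixels, mask_ratio, rng):
    """Zero out a random subset of positions (mask_ratio is an assumption)."""
    n_masked = int(len(pixels) * mask_ratio)
    masked_idx = set(rng.sample(range(len(pixels)), n_masked))
    return [0.0 if i in masked_idx else p for i, p in enumerate(pixels)]


def softmax(logits):
    """Numerically stable softmax producing class probabilities P(.)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_k p_k log q_k, with eps guarding against log(0)."""
    return -sum(pi * math.log(qi + eps) for pi, qi in zip(p, q))


def toy_encoder(pixels, weights):
    """Hypothetical stand-in for C-Mixer: one linear layer giving class logits."""
    return [sum(w * x for w, x in zip(row, pixels)) for row in weights]


rng = random.Random(0)
image = [rng.random() for _ in range(16)]  # flattened toy 4x4 image
weights = [[rng.uniform(-1, 1) for _ in range(16)] for _ in range(3)]

anchor = random_mask(image, 0.25, rng)  # masked anchor view
target = random_mask(image, 0.25, rng)  # masked target view

p = softmax(toy_encoder(anchor, weights))
q = softmax(toy_encoder(target, weights))
loss = cross_entropy(p, q)
print(f"H(p, q) = {loss:.4f}")
```

Minimizing this quantity pushes the two masked views toward the same predicted distribution, which is the consistency property the summary attributes to the self-supervised scheme.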

Complex Mixer for MedMNIST Classification Decathlon

MixFormer: a Mixed CNN-Transformer Backbone for Medical Image Segmentation

SMMix: Self-Motivated Image Mixing for Vision Transformers

Multi-Scale MLP-Mixer for image classification

AutoMO-Mixer: An automated multi-objective Mixer model for balanced, safe and robust prediction in medicine

Wide-field Imaging and Recognition Through Cascaded Complex Scattering Media

Adaptive Mix for Semi-Supervised Medical Image Segmentation

D2-MLP: Dynamic Decomposed MLP Mixer for Medical Image Segmentation

AutoMix: Unveiling the Power of Mixup for Stronger Classifiers

MixCL: Pixel label matters to contrastive learning

MiAMix: Enhancing Image Classification through a Multi-stage Augmented Mixed Sample Data Augmentation Method

ModelMix: A New Model-Mixup Strategy to Minimize Vicinal Risk across Tasks for Few-scribble based Cardiac Segmentation

PCLMix: Weakly Supervised Medical Image Segmentation via Pixel-Level Contrastive Learning and Dynamic Mix Augmentation

MixMix: All You Need for Data-Free Compression Are Feature and Data Mixing

MM-Mixing: Multi-Modal Mixing Alignment for 3D Understanding

OpenMixup: A Comprehensive Mixup Benchmark for Visual Classification

ConfidentMix: Confidence-Guided Mixup for Learning With Noisy Labels

CS-Mixer: A Cross-Scale Vision MLP Model with Spatial-Channel Mixing

Un-mix: Rethinking Image Mixtures for Unsupervised Visual Representation Learning

QMix: Quality-aware Learning with Mixed Noise for Robust Retinal Disease Diagnosis

TCAMixer: A lightweight Mixer based on a novel triple concepts attention mechanism for NLP