Abstract: Multimodal magnetic resonance imaging (MRI) provides complementary information for sub-region analysis of brain tumors. Many methods have been proposed for automatic brain tumor segmentation using four common MRI modalities, and they have achieved remarkable performance. In practice, however, it is common to have one or more modalities missing due to image corruption, artifacts, acquisition protocols, allergy to contrast agents, or simply cost. In this work, we propose a novel two-stage framework for brain tumor segmentation with missing modalities. In the first stage, a multimodal masked autoencoder (M3AE) is proposed, where both random modalities (i.e., modality dropout) and random patches of the remaining modalities are masked for a reconstruction task, for self-supervised learning of multimodal representations that are robust to missing modalities. To this end, we name our framework M3AE. Meanwhile, we employ model inversion to optimize a representative full-modal image at marginal extra cost, which is used to substitute for the missing modalities and boost performance during inference. Then, in the second stage, a memory-efficient self-distillation is proposed to distill knowledge between heterogeneous missing-modal situations while fine-tuning the model for supervised segmentation. Our M3AE belongs to the 'catch-all' genre, where a single model can be applied to all possible subsets of modalities, and is thus economical for both training and deployment. Extensive experiments on the BraTS 2018 and 2020 datasets demonstrate its superior performance over existing state-of-the-art methods with missing modalities, as well as the efficacy of its components. Our code is available at: https://github.com/ccarliu/m3ae.
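
The first-stage masking described in the abstract (modality dropout plus random patch masking of the remaining modalities) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `mask_modalities_and_patches`, the patch size, the masking ratio, the rule of always keeping at least one modality, and the zero fill value are all assumptions made for illustration (the abstract indicates that a learned full-modal image is used as the substitute at inference, not zeros).

```python
import numpy as np


def mask_modalities_and_patches(volume, patch=4, patch_ratio=0.5, rng=None):
    """Hypothetical masking sketch for a (modalities, D, H, W) volume.

    Assumes D, H and W are divisible by `patch`. Returns the masked volume,
    a boolean mask (True = masked voxel) and the dropped modality indices.
    """
    rng = np.random.default_rng() if rng is None else rng
    m, d, h, w = volume.shape

    # Modality dropout: drop between 0 and m-1 modalities (assumption:
    # at least one modality is always kept).
    n_drop = int(rng.integers(0, m))
    dropped = rng.choice(m, size=n_drop, replace=False)
    mask = np.zeros(volume.shape, dtype=bool)
    mask[dropped] = True

    # Patch masking on the remaining modalities.
    grid = (d // patch, h // patch, w // patch)
    n_patches = grid[0] * grid[1] * grid[2]
    n_mask = int(round(patch_ratio * n_patches))
    dropped_set = set(int(i) for i in dropped)
    for i in range(m):
        if i in dropped_set:
            continue
        flat = np.zeros(n_patches, dtype=bool)
        flat[rng.choice(n_patches, size=n_mask, replace=False)] = True
        coarse = flat.reshape(grid)
        # Expand the patch-level mask to voxel resolution.
        fine = (coarse.repeat(patch, axis=0)
                      .repeat(patch, axis=1)
                      .repeat(patch, axis=2))
        mask[i] = fine

    # Assumption: masked voxels are filled with zeros as a placeholder.
    masked = np.where(mask, 0.0, volume)
    return masked, mask, dropped
```

A reconstruction objective would then compare the network output against `volume` on the voxels where `mask` is true; the loss itself is omitted here because the abstract does not specify it.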

M3LA: A Novel Approach Based on Encoder-Decoder with Attention Framework for Multi-modal Multi-label Learning

Dual Enhancement for Multi-Label Learning with Missing Labels

Rethinking Modal-oriented Label Correlations for Multi-modal Multi-label Learning

Collaboration based multi-modal multi-label learning

PML-ED: A method of partial multi-label learning by using encoder-decoder framework and exploring label correlation

Multi-Modal Image Annotation with Multi-Instance Multi-Label LDA

Detached and Interactive Multimodal Learning

Weakly-Supervised Multi-view Multi-instance Multi-label Learning

M3AE: Multimodal Representation Learning for Brain Tumor Segmentation with Missing Modalities

Label distribution for multimodal machine learning

Semi-Supervised Multi-Modal Multi-Instance Multi-Label Deep Network with Optimal Transport

Multimodal Representation Learning by Alternating Unimodal Adaptation

Many Could Be Better Than All: A Novel Instance-Oriented Algorithm for Multi-modal Multi-label Problem

MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning

Multi-instance multi-label new label learning

M$^{3}$SA: Multimodal Sentiment Analysis Based on Multi-Scale Feature Extraction and Multi-Task Learning

CaMML: Context-Aware Multimodal Learner for Large Models

Relation-Aware Alignment Attention Network for Multi-view Multi-label Learning

Deep Mamba Multi-modal Learning

Complex Object Classification

MsCoa: Multi-Step Co-Attention Model for Multi-Label Classification