Abstract: Multimodal magnetic resonance imaging (MRI) provides complementary information for sub-region analysis of brain tumors. Many methods have been proposed for automatic brain tumor segmentation using four common MRI modalities, and they have achieved remarkable performance. In practice, however, one or more modalities are often missing due to image corruption, artifacts, acquisition protocols, allergy to contrast agents, or simply cost. In this work, we propose a novel two-stage framework for brain tumor segmentation with missing modalities. In the first stage, a multimodal masked autoencoder (M3AE) is proposed, in which both random modalities (i.e., modality dropout) and random patches of the remaining modalities are masked for a reconstruction task, enabling self-supervised learning of multimodal representations that are robust to missing modalities. Accordingly, we name our framework M3AE. Meanwhile, we employ model inversion to optimize a representative full-modal image at marginal extra cost, which is used to substitute for the missing modalities and boost performance during inference. Then, in the second stage, a memory-efficient self-distillation is proposed to distill knowledge between heterogeneous missing-modal situations while fine-tuning the model for supervised segmentation. Our M3AE belongs to the 'catch-all' genre, in which a single model can be applied to all possible subsets of modalities, and is thus economical for both training and deployment. Extensive experiments on the BraTS 2018 and 2020 datasets demonstrate its superior performance over existing state-of-the-art methods with missing modalities, as well as the efficacy of its components. Our code is available at: https://github.com/ccarliu/m3ae.
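
The first-stage masking described in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the authors' repository: the function names, the patch size, the patch mask ratio, and the choice to fill masked voxels with zeros are all assumptions made for illustration. It also enumerates the non-empty modality subsets that a single 'catch-all' model has to handle, assuming the four BraTS modalities T1, T1c, T2, and FLAIR.

```python
from itertools import combinations

import numpy as np

# Assumed modality names (the four BraTS MRI sequences).
MODALITIES = ("T1", "T1c", "T2", "FLAIR")


def modality_subsets(modalities=MODALITIES):
    """List every non-empty subset of modalities a 'catch-all' model may see."""
    return [
        subset
        for k in range(1, len(modalities) + 1)
        for subset in combinations(modalities, k)
    ]


def mask_modalities_and_patches(x, rng, patch_size=4, patch_mask_ratio=0.5):
    """Hypothetical masking step for one volume x of shape (M, D, H, W).

    Returns the masked volume, a boolean mask (True = masked voxel),
    and the indices of the dropped modalities.
    """
    m, d, h, w = x.shape
    assert d % patch_size == 0 and h % patch_size == 0 and w % patch_size == 0

    # Modality dropout: drop between 0 and M-1 modalities, so one always remains.
    n_drop = int(rng.integers(0, m))
    dropped = np.sort(rng.permutation(m)[:n_drop])
    dropped_set = set(dropped.tolist())

    mask = np.zeros(x.shape, dtype=bool)
    for i in range(m):
        if i in dropped_set:
            mask[i] = True  # the whole modality is masked
            continue
        # Patch masking: each cubic patch of a kept modality is masked
        # independently with probability patch_mask_ratio.
        grid = (d // patch_size, h // patch_size, w // patch_size)
        patch_mask = rng.random(grid) < patch_mask_ratio
        # Expand the patch-level mask to voxel resolution.
        mask[i] = (
            patch_mask.repeat(patch_size, axis=0)
            .repeat(patch_size, axis=1)
            .repeat(patch_size, axis=2)
        )

    # Assumption: masked voxels are filled with zeros in this sketch.
    masked = np.where(mask, 0.0, x)
    return masked, mask, dropped
```

In a reconstruction setup of this kind, the autoencoder would receive `masked` as input and be trained to recover `x`. The zero fill here is only a placeholder; the abstract states that a representative full-modal image obtained by model inversion substitutes for missing modalities at inference.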

MMAN-M2: Multiple Multi-head Attentions Network based on Encoder with Missing Modalities

Fine-Grained Cross-Modal Retrieval with Triple-Streamed Memory Fusion Transformer Encoder

What Makes for Robust Multi-Modal Models in the Face of Missing Modalities?

MSAF: Multimodal Split Attention Fusion

Tag-assisted Multimodal Sentiment Analysis under Uncertain Missing Modalities

CMCI: A Robust Multimodal Fusion Method for Spiking Neural Networks

A transformer-encoder-based multimodal multi-attention fusion network for sentiment analysis

Missing-modality Enabled Multi-modal Fusion Architecture for Medical Data

M3AE: Multimodal Representation Learning for Brain Tumor Segmentation with Missing Modalities

MDA: An Interpretable and Scalable Multi-Modal Fusion under Missing Modalities and Intrinsic Noise Conditions

Multi-modal Attention for Speech Emotion Recognition

MMANet: Margin-aware Distillation and Modality-aware Regularization for Incomplete Multimodal Learning

TeFNA: Text-centered Fusion Network with crossmodal Attention for multimodal sentiment analysis

MMTM: Multimodal Transfer Module for CNN Fusion

Multi-level Attention Map Network for Multimodal Sentiment Analysis

Multimodal Multi-loss Fusion Network for Sentiment Analysis

Multimodal Semantic Attention Network for Video Captioning

Multimodal Sentiment Analysis Based on a Cross-Modal Multihead Attention Mechanism

Multi-Feature Fusion Multi-Modal Sentiment Analysis Model Based on Cross-Attention Mechanism

Multimodal Fusion Method Based on Self-Attention Mechanism

Multimodal Sentiment Analysis Based on Cross-Modal Attention and Gated Cyclic Hierarchical Fusion Networks