Abstract: The missing modality issue is critical but non-trivial for multi-modal models to solve. Current methods that handle the missing modality problem in multi-modal tasks either deal with missing modalities only during evaluation or train separate models for specific missing modality settings. In addition, these models are designed for specific tasks, so, for example, classification models are not easily adapted to segmentation tasks and vice versa. In this paper, we propose the Shared-Specific Feature Modelling (ShaSpec) method, which is considerably simpler and more effective than competing approaches that address the issues above. ShaSpec is designed to take advantage of all available input modalities during training and evaluation by learning shared and specific features to better represent the input data. This is achieved through a strategy that relies on auxiliary tasks based on distribution alignment and domain classification, in addition to a residual feature fusion procedure. Also, the design simplicity of ShaSpec enables its easy adaptation to multiple tasks, such as classification and segmentation. Experiments are conducted on both medical image segmentation and computer vision classification, with results indicating that ShaSpec outperforms competing methods by a large margin. For instance, on BraTS2018, ShaSpec improves the SOTA by more than 3% for enhancing tumour, 5% for tumour core and 3% for whole tumour. The code repository is available at https://github.com/billhhh/ShaSpec/.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper aims to solve the problem of missing modalities in multi-modal learning. Specifically, current methods for dealing with missing modalities either handle them only at the evaluation stage or train separate models for specific missing-modality settings. These methods are also usually designed for specific tasks. For example, classification models are difficult to adapt to segmentation tasks and vice versa. The paper therefore proposes a new method, Shared-Specific Feature Modelling (ShaSpec). This method can use all available input modalities during both training and evaluation, and better represents the input data by learning shared and specific features. In addition, the design of ShaSpec is simple, so it adapts easily to multiple tasks such as classification and segmentation.

### Main contributions

1. **Proposed an extremely simple and effective multi-modal learning method**: Based on modelling and fusing shared and specific features, ShaSpec can handle missing modalities in training and evaluation, and supports both dedicated and non-dedicated training.
2. **Achieved multi-task adaptation for the first time**: As far as the authors know, ShaSpec is the first missing-modality multi-modal method that can easily adapt to both classification and segmentation tasks.

### Experimental results

The paper conducted experiments on computer vision classification and medical image segmentation benchmarks, and the results show that ShaSpec achieves state-of-the-art performance. In particular, on the BraTS2018 dataset, ShaSpec improves on recently proposed competing methods by more than 3% for enhancing tumour, 5% for tumour core and 3% for whole tumour.
### Method overview

#### 3.1 Overall architecture

The ShaSpec model consists of a shared encoder \( f_{\theta_{\text{sha}}} \), specific encoders \( f_{\theta_{\text{spec}}}^{(i)} \), a feature projection layer \( f_{\theta_{\text{proj}}} \) and a decoder \( f_{\theta_{\text{dec}}} \). For data \( M_j=\{x^{(i)}_j\}_{i = 1}^N \) with \( N \) modalities, each modality \( x^{(i)}_j\in X \) is passed through the shared encoder to obtain shared features \( r^{(i)} \) and through its own specific encoder to obtain specific features \( s^{(i)} \). The shared and specific features are then fused through a residual fusion process, and finally the decoder generates the prediction.

#### 3.2 Evaluation of complete and missing modalities

- **Complete modalities**:

\[ r^{(i)}=f_{\theta_{\text{sha}}}(x^{(i)}),\quad s^{(i)}=f_{\theta_{\text{spec}}}^{(i)}(x^{(i)}) \]

\[ f^{(i)}=f_{\theta_{\text{proj}}}(r^{(i)}, s^{(i)})+r^{(i)} \]

\[ \tilde{y}=f_{\theta_{\text{dec}}}(f^{(1)},\ldots,f^{(N)}) \]

- **Missing modalities**: Suppose the \( n \)-th modality is missing. The feature extraction process for the other modalities is unchanged, but the feature \( f^{(n)} \) of the missing modality is generated as the average of the shared features of the available modalities:

\[ f^{(n)}=\frac{1}{N - 1}\sum_{i = 1, i\neq n}^N r^{(i)} \]

#### 3.3 Training process

In addition to optimizing the main task (classification or segmentation), two auxiliary tasks are introduced to learn the specific and shared feature representations: domain classification and distribution alignment.

- **Domain classification objective**:

\[ \ell_{\text{dco}}(D,\theta_{\text{s}})
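
The feature extraction and fusion rules of Section 3.2 can be illustrated with a small NumPy sketch. It is not the authors' implementation: the encoders and the projection layer are replaced by hypothetical random linear maps with a `tanh` non-linearity, the sizes `N_MODALITIES`, `IN_DIM` and `FEAT_DIM` are arbitrary, and the function name `fuse_features` is invented for this example. When more than one modality is missing, the sketch averages the shared features of all available modalities. That is an assumed generalization of the single-missing-modality formula above, to which it reduces when exactly one modality is absent.

```python
import numpy as np

rng = np.random.default_rng(0)

N_MODALITIES = 4           # assumed, e.g. four MRI sequences
IN_DIM, FEAT_DIM = 8, 16   # toy sizes chosen for illustration

# Hypothetical stand-ins for f_sha, f_spec^(i) and f_proj (random linear maps).
W_sha = rng.standard_normal((IN_DIM, FEAT_DIM)) * 0.1
W_spec = [rng.standard_normal((IN_DIM, FEAT_DIM)) * 0.1
          for _ in range(N_MODALITIES)]
W_proj = rng.standard_normal((2 * FEAT_DIM, FEAT_DIM)) * 0.1


def fuse_features(inputs):
    """Return the fused features f^(1..N), stacked as an (N, FEAT_DIM) array.

    inputs: list of length N; each entry is an (IN_DIM,) array,
            or None when that modality is missing.
    """
    available = [i for i, x in enumerate(inputs) if x is not None]
    if not available:
        raise ValueError("at least one modality must be present")

    # Shared features r^(i): one shared encoder applied to every modality.
    shared = {i: np.tanh(inputs[i] @ W_sha) for i in available}
    # Substitute for a missing modality: mean of the available r^(i).
    mean_shared = np.mean([shared[i] for i in available], axis=0)

    fused = []
    for i, x in enumerate(inputs):
        if x is None:
            fused.append(mean_shared)
        else:
            # Specific features s^(i): a separate encoder per modality.
            specific = np.tanh(x @ W_spec[i])
            # Residual fusion: f^(i) = f_proj(r^(i), s^(i)) + r^(i).
            projected = np.concatenate([shared[i], specific]) @ W_proj
            fused.append(projected + shared[i])
    return np.stack(fused)


# Usage: all modalities present, then the third modality missing.
x = [rng.standard_normal(IN_DIM) for _ in range(N_MODALITIES)]
full = fuse_features(x)
partial = fuse_features([x[0], x[1], None, x[3]])
```

In the full model the stacked features would be passed to the decoder \( f_{\theta_{\text{dec}}} \). Because the substitute feature is built only from shared features, the decoder always receives \( N \) feature maps regardless of which modalities are present.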

Multi-modal Learning with Missing Modality via Shared-Specific Feature Modelling

Adapting the Segment Anything Model for Multi-modal Retinal Anomaly Detection and Localization

Mutual Information-Based Graph Co-Attention Networks for Multimodal Prior-Guided Magnetic Resonance Imaging Segmentation

M3AE: Multimodal Representation Learning for Brain Tumor Segmentation with Missing Modalities

Robust Semi-supervised Multimodal Medical Image Segmentation via Cross Modality Collaboration

PASSION: Towards Effective Incomplete Multi-Modal Medical Image Segmentation with Imbalanced Missing Rates

Missing Modality Robustness in Semi-Supervised Multi-Modal Semantic Segmentation

MDA: An Interpretable and Scalable Multi-Modal Fusion under Missing Modalities and Intrinsic Noise Conditions

Dealing with All-stage Missing Modality: Towards A Universal Model with Robust Reconstruction and Personalization

Multi-modal Learning with Missing Modality in Predicting Axillary Lymph Node Metastasis

MulModSeg: Enhancing Unpaired Multi-Modal Medical Image Segmentation with Modality-Conditioned Text Embedding and Alternating Training

Modality-aware Mutual Learning for Multi-modal Medical Image Segmentation

MultiSenseSeg: A Cost-Effective Unified Multimodal Semantic Segmentation Model for Remote Sensing

Enhancing Modality-Agnostic Representations via Meta-Learning for Brain Tumor Segmentation

Robust Divergence Learning for Missing-Modality Segmentation

Multi-modal Brain Tumor Segmentation via Missing Modality Synthesis and Modality-level Attention Fusion

MedMAP: Promoting Incomplete Multi-modal Brain Tumor Segmentation with Alignment

Toward Robust Multimodal Learning using Multimodal Foundational Models

Robust Multimodal Brain Tumor Segmentation via Feature Disentanglement and Gated Fusion

Decoupling Feature Representations of Ego and Other Modalities for Incomplete Multi-modal Brain Tumor Segmentation

Learning Unified Hyper-Network for Multi-Modal MR Image Synthesis and Tumor Segmentation With Missing Modalities