Abstract: Multimodal learning with incomplete input data (missing modality) is practical and challenging. In this work, we conduct an in-depth analysis of this challenge and find that modality dominance has a significant negative impact on model training, greatly degrading missing-modality performance. Motivated by Grad-CAM, we introduce a novel indicator, gradients, to monitor and reduce modality dominance, which is widespread in the missing-modality scenario. With the aid of this indicator, we present a novel Gradient-guided Modality Decoupling (GMD) method to decouple the dependency on dominating modalities. Specifically, GMD removes the conflicting gradient components from different modalities to achieve this decoupling, significantly improving performance. In addition, to flexibly handle modality-incomplete data, we design a parameter-efficient Dynamic Sharing (DS) framework that adaptively switches network parameters on or off based on whether each modality is available. We conduct extensive experiments on three popular multimodal benchmarks: BraTS 2018 for medical segmentation, and CMU-MOSI and CMU-MOSEI for sentiment analysis. The results show that our method significantly outperforms competing methods, demonstrating the effectiveness of the proposed solutions. Our code is released here: https://github.com/HaoWang420/Gradient-guided-Modality-Decoupling.
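The decoupling step described above removes conflicting gradient components from different modalities. A minimal sketch of that idea follows. It assumes a PCGrad-style projection, in which two gradients "conflict" when their dot product is negative. The conflicting component is then removed by projecting one gradient onto the plane orthogonal to the other. This is an illustrative assumption, not the authors' GMD implementation (the linked repository holds that). The names `remove_conflict` and `decouple`, and the plain-list vector representation, are hypothetical.

```python
from typing import List

Vector = List[float]  # a flattened gradient, as a plain list of floats


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def remove_conflict(g: Vector, ref: Vector) -> Vector:
    """Remove the component of g that opposes ref, but only if they conflict.

    Assumption (PCGrad-style): a conflict means dot(g, ref) < 0.
    """
    d = dot(g, ref)
    norm_sq = dot(ref, ref)
    if d >= 0 or norm_sq == 0.0:
        return list(g)  # no conflict: keep g unchanged
    scale = d / norm_sq
    # Subtract the projection of g onto ref, leaving g orthogonal to ref.
    return [gi - scale * ri for gi, ri in zip(g, ref)]


def decouple(grads: List[Vector]) -> Vector:
    """Hypothetical aggregation: deconflict each modality's gradient
    against every other (original) gradient, then sum the results."""
    if not grads:
        raise ValueError("need at least one gradient")
    total = [0.0] * len(grads[0])
    for i, g in enumerate(grads):
        projected = list(g)
        for j, other in enumerate(grads):
            if i != j:
                projected = remove_conflict(projected, other)
        total = [t + p for t, p in zip(total, projected)]
    return total
```

For example, take an image gradient `[1.0, 1.0]` and a text gradient `[-1.0, 0.0]`. Here `remove_conflict` returns `[0.0, 1.0]`, which no longer opposes the text gradient. `decouple` sums the two deconflicted gradients to `[-0.5, 1.5]`. In a real model the vectors would presumably be gradients of shared parameters computed from separate per-modality losses.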

What problem does this paper attempt to address?

The problem this paper addresses is how to improve the robustness and performance of multimodal models when the input data are incomplete, i.e., when some modalities are missing. Specifically, the paper focuses on the modality dominance problem: during training, the model relies too heavily on a few dominant modalities and neglects information from the others. As a result, its performance drops sharply when some modalities are missing.

### Main contributions of the paper:

1. **In-depth analysis of the modality dominance problem**:
   - Through experiments, the authors find that modality dominance is widespread in multimodal learning and has a significant negative impact on model performance.
   - They propose using gradients as an indicator of modality importance, so that modality dominance can be identified and mitigated by analyzing gradients.
2. **Gradient-guided Modality Decoupling (GMD)**:
   - GMD analyzes the gradients associated with different modalities, then identifies and removes conflicting gradient components. This balances the contribution of each modality and reduces modality dominance.
   - Concretely, when the gradients from two modalities conflict (for example, point in opposing directions), GMD removes the conflicting components. The model then uses the information from each modality more evenly during training.
3. **Dynamic Sharing (DS) framework**:
   - DS handles incomplete-modality data by adaptively switching network parameters on or off, which avoids introducing misleading padding values such as zero-padding.
   - This not only improves robustness but also reduces computational resource consumption.
4. **Extensive experimental verification**:
   - The authors run experiments on multiple multimodal benchmarks: medical image segmentation (BraTS 2018) and sentiment analysis (CMU-MOSI and CMU-MOSEI).
   - The results show that the proposed methods significantly outperform existing methods across various missing-modality settings, with particularly strong results in the most challenging cases.

### Conclusion:

By introducing the Gradient-guided Modality Decoupling (GMD) method and the Dynamic Sharing (DS) framework, this paper mitigates the modality dominance problem in multimodal learning. It also improves model robustness and performance when modalities are missing. The approach is novel and, on the benchmarks tested, delivers strong empirical gains.
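The Dynamic Sharing idea is to run only the parts of the network that correspond to available modalities, rather than feeding zero-padded inputs. A toy forward pass illustrates this. The sketch assumes one encoder per modality and mean fusion over the available features. The class `DynamicSharingSketch`, the modality keys, and the fusion rule are hypothetical and do not reproduce the paper's actual DS architecture.

```python
from typing import Callable, Dict, List, Optional

Feature = List[float]
Encoder = Callable[[Feature], Feature]


class DynamicSharingSketch:
    """Toy illustration (hypothetical): only the branches whose modality
    is present are executed; missing modalities are skipped entirely."""

    def __init__(self, encoders: Dict[str, Encoder]):
        self.encoders = encoders  # assumption: one encoder per modality

    def forward(self, inputs: Dict[str, Optional[Feature]]) -> Feature:
        # Switch branches "on" only for modalities that are actually provided.
        active = [name for name, x in inputs.items()
                  if x is not None and name in self.encoders]
        if not active:
            raise ValueError("at least one modality must be available")
        feats = [self.encoders[name](inputs[name]) for name in active]
        # Average over available modalities only: a missing modality adds
        # nothing, instead of contributing the encoding of a zero-padded input.
        dim = len(feats[0])
        return [sum(f[k] for f in feats) / len(feats) for k in range(dim)]
```

Consider encoders `{"t1": lambda x: [2.0 * v for v in x], "flair": lambda x: [v + 1.0 for v in x]}` with a missing `flair` input. The sketch returns `[2.0, 4.0]`, which is the `t1` feature alone. Feeding zeros to the `flair` encoder instead yields `[1.5, 2.5]`, showing how a padding value can leak into the fused representation.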

Gradient-Guided Modality Decoupling for Missing-Modality Robustness

Balanced Multimodal Learning via On-the-fly Gradient Modulation

Classifier-guided Gradient Modulation for Enhanced Multimodal Learning

On-the-fly Modulation for Balanced Multimodal Learning

MACO: A Modality Adversarial and Contrastive Framework for Modality-missing Multi-modal Knowledge Graph Completion

Dealing with All-stage Missing Modality: Towards A Universal Model with Robust Reconstruction and Personalization

Rethinking Missing Modality Learning: From a Decoding View

Rethinking Missing Modality Learning from a Decoding Perspective

Robust Multimodal Learning via Representation Decoupling

MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance

Deep Multimodal Learning with Missing Modality: A Survey

What Makes for Robust Multi-Modal Models in the Face of Missing Modalities?

Diagnosing and Re-learning for Balanced Multimodal Learning

Improving Multimodal Learning with Multi-Loss Gradient Modulation

M3AE: Multimodal Representation Learning for Brain Tumor Segmentation with Missing Modalities

Detached and Interactive Multimodal Learning

Multimodal Imbalance-Aware Gradient Modulation for Weakly-supervised Audio-Visual Video Parsing

Progressive Learning of a Multimodal Classifier Accounting for Different Modality Combinations

MMANet: Margin-aware Distillation and Modality-aware Regularization for Incomplete Multimodal Learning

Explaining and Mitigating the Modality Gap in Contrastive Multimodal Learning

SMIL: Multimodal Learning with Severely Missing Modality