Gradient-Guided Modality Decoupling for Missing-Modality Robustness

Hao Wang, Shengda Luo, Guosheng Hu, Jianguo Zhang
2024-02-26
Abstract: Multimodal learning with incomplete input data (missing modality) is practical and challenging. In this work, we conduct an in-depth analysis of this challenge and find that modality dominance has a significant negative impact on the model training, greatly degrading the missing modality performance. Motivated by Grad-CAM, we introduce a novel indicator, gradients, to monitor and reduce modality dominance which widely exists in the missing-modality scenario. In aid of this indicator, we present a novel Gradient-guided Modality Decoupling (GMD) method to decouple the dependency on dominating modalities. Specifically, GMD removes the conflicted gradient components from different modalities to achieve this decoupling, significantly improving the performance. In addition, to flexibly handle modal-incomplete data, we design a parameter-efficient Dynamic Sharing (DS) framework which can adaptively switch on/off the network parameters based on whether one modality is available. We conduct extensive experiments on three popular multimodal benchmarks, including BraTS 2018 for medical segmentation, CMU-MOSI, and CMU-MOSEI for sentiment analysis. The results show that our method can significantly outperform the competitors, showing the effectiveness of the proposed solutions. Our code is released here: https://github.com/HaoWang420/Gradient-guided-Modality-Decoupling.
Computer Vision and Pattern Recognition, Multimedia
What problem does this paper attempt to address?
The paper addresses how to keep a multimodal model robust and accurate when its input is incomplete, that is, when some modalities are missing. It focuses on the modality dominance problem: during training, the model depends too heavily on a few key modalities and ignores the information in the others, so its performance drops significantly when some modalities are missing.

### Main contributions of the paper:

1. **In-depth analysis of the modality dominance problem**:
   - The authors find through experiments that modality dominance is widespread in the missing-modality setting and has a significant negative impact on model performance.
   - Motivated by Grad-CAM, they propose using gradients as an indicator to monitor modality dominance and to reduce it.
2. **The Gradient-guided Modality Decoupling (GMD) method**:
   - GMD analyzes the gradients of different modalities and removes their conflicting components, which balances the contribution of each modality and reduces modality dominance.
   - Specifically, when the gradient directions of two modalities are opposite or differ significantly, GMD removes the conflicting components so that the model uses the information from each modality more evenly during training.
3. **The Dynamic Sharing (DS) framework**:
   - The DS framework handles modality-incomplete data by adaptively switching network parameters on or off depending on whether a modality is available, which avoids introducing misleading padding values (such as zero-padding).
   - The framework is parameter-efficient: it improves the model's robustness while reducing the consumption of computing resources.
4. **Extensive experimental verification**:
   - The authors run experiments on three multimodal benchmarks: BraTS 2018 for medical image segmentation, and CMU-MOSI and CMU-MOSEI for sentiment analysis.
   - The results show that the proposed methods significantly outperform existing methods across the incomplete-modality cases, with particularly strong results on the most challenging tasks.

### Conclusion:

By introducing the Gradient-guided Modality Decoupling (GMD) method and the Dynamic Sharing (DS) framework, the paper reduces modality dominance in multimodal learning and improves the model's robustness and performance when modalities are missing. The methods are supported both by the gradient-based analysis and by the results on the three benchmarks.
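
The conflict-removal step described for GMD can be illustrated with a small sketch. It is not the authors' implementation (their code is in the linked repository). It assumes that each modality contributes one flat gradient vector, that two gradients "conflict" when their dot product is negative, and that the conflicting component is removed by projecting it out, in the style of gradient-surgery methods. The names `remove_conflict` and `decouple_gradients` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def remove_conflict(g: Sequence[float], ref: Sequence[float]) -> Vector:
    """Return g without its component along ref, if the two conflict.

    Assumption: "conflict" means a negative dot product. In that case
    g is projected onto the plane orthogonal to ref; otherwise g is
    returned unchanged.
    """
    d = dot(g, ref)
    norm_sq = dot(ref, ref)
    if d >= 0.0 or norm_sq == 0.0:
        return list(g)
    scale = d / norm_sq
    return [gi - scale * ri for gi, ri in zip(g, ref)]


def decouple_gradients(grads: Sequence[Sequence[float]]) -> Vector:
    """Remove pairwise conflicts from each gradient, then sum the results."""
    adjusted = []
    for i, g in enumerate(grads):
        g_adj = list(g)
        for j, other in enumerate(grads):
            if i != j:
                # Project against the other modality's original gradient.
                g_adj = remove_conflict(g_adj, other)
        adjusted.append(g_adj)
    dim = len(grads[0])
    return [sum(g[k] for g in adjusted) for k in range(dim)]


# Two hypothetical modality gradients that point in partly opposite directions.
g_a = [1.0, 0.0]
g_b = [-1.0, 1.0]
update = decouple_gradients([g_a, g_b])
print(update)  # [0.5, 1.5]
```

In this toy case the plain sum of the two gradients would be `[0.0, 1.0]`, so the first coordinate of modality A is cancelled entirely by modality B. After projection, each adjusted gradient is orthogonal to the other modality's original gradient, and both modalities still contribute to the update. How the per-modality gradients are obtained in a real network is outside the scope of this sketch.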