Abstract: To overcome the imbalanced multimodal learning problem, where models favor the training of specific modalities, existing methods control the training of uni-modal encoders from different perspectives, taking the inter-modal performance discrepancy as the basis. However, the intrinsic limitation of modality capacity is ignored. Scarcely informative modalities can be recognized as ``worse-learnt'' ones, which could force the model to memorize more noise and thereby harm the ability of the multimodal model. Moreover, current modality modulation methods concentrate narrowly on the selected worse-learnt modalities, even suppressing the training of others. Hence, it is essential to consider the intrinsic limitation of modality capacity and to take all modalities into account during balancing. To this end, we propose the Diagnosing \& Re-learning method. The learning state of each modality is first estimated based on the separability of its uni-modal representation space, and then used to softly re-initialize the corresponding uni-modal encoder. In this way, over-emphasis on scarcely informative modalities is avoided. In addition, encoders of worse-learnt modalities are enhanced, while over-training of other modalities is simultaneously avoided. Accordingly, multimodal learning is effectively balanced and enhanced. Experiments covering multiple types of modalities and multimodal frameworks demonstrate the superior performance of our simple-yet-effective method for balanced multimodal learning. The source code and dataset are available at \url{https://github.com/GeWu-Lab/Diagnosing_Relearning_ECCV2024}.
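
The two steps named in the abstract (diagnosing a modality by the separability of its representation space, then softly re-initializing its encoder) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `separability_score` and `soft_reinit`, the variance-ratio separability measure, and the linear interpolation toward the initial weights are all assumptions made for illustration. The abstract does not specify how the learning state is mapped to a re-initialization strength, so `strength` is left as a free, hypothetical parameter.

```python
import numpy as np

def separability_score(features, labels):
    """Toy separability measure (an assumption, not the paper's metric):
    the fraction of total feature variance explained by class means.
    Returns a value in [0, 1]; higher means classes are better separated."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    overall_mean = features.mean(axis=0)
    total = ((features - overall_mean) ** 2).sum()
    if total == 0:
        return 0.0
    between = 0.0
    for c in np.unique(labels):
        cls = features[labels == c]
        between += len(cls) * ((cls.mean(axis=0) - overall_mean) ** 2).sum()
    return float(between / total)

def soft_reinit(current, initial, strength):
    """Hypothetical soft re-initialization: move the current weights part of
    the way back toward their initial values. strength=0 keeps the current
    weights, strength=1 fully resets them."""
    strength = float(np.clip(strength, 0.0, 1.0))
    current = np.asarray(current, dtype=float)
    initial = np.asarray(initial, dtype=float)
    return (1.0 - strength) * current + strength * initial

# Two perfectly separated classes in a 2-D representation space.
feats = np.array([[0.0, 0.0], [0.0, 0.0], [10.0, 10.0], [10.0, 10.0]])
labs = np.array([0, 0, 1, 1])
score = separability_score(feats, labs)

# Pull hypothetical encoder weights a quarter of the way back to initialization.
new_weights = soft_reinit(np.ones(3), np.zeros(3), strength=0.25)
```

In a real training loop, the score would be computed per modality from that modality's encoder outputs, and the strength for each encoder would be derived from it by whatever rule the method defines; that rule is deliberately left out here.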

Rethinking Missing Modality Learning: From a Decoding View

Rethinking Missing Modality Learning from a Decoding Perspective

Dealing with All-stage Missing Modality: Towards A Universal Model with Robust Reconstruction and Personalization

Learning Unseen Modality Interaction

Gradient-Guided Modality Decoupling for Missing-Modality Robustness

Reconstruct before Query: Continual Missing Modality Learning with Decomposed Prompt Collaboration

MACO: A Modality Adversarial and Contrastive Framework for Modality-missing Multi-modal Knowledge Graph Completion

Toward Robust Multimodal Learning using Multimodal Foundational Models

What Makes for Robust Multi-Modal Models in the Face of Missing Modalities?

Diagnosing and Re-learning for Balanced Multimodal Learning

One-stage Modality Distillation for Incomplete Multimodal Learning

Exploring Missing Modality in Multimodal Egocentric Datasets

SMIL: Multimodal Learning with Severely Missing Modality

MM-Align: Learning Optimal Transport-based Alignment Dynamics for Fast and Accurate Inference on Missing Modality Sequences

Borrowing Treasures from Neighbors: In-Context Learning for Multimodal Learning with Missing Modalities and Data Scarcity

Maximum Likelihood Estimation for Multimodal Learning with Missing Modality

MSH-Net: Modality-Shared Hallucination With Joint Adaptation Distillation for Remote Sensing Image Classification Using Missing Modalities

Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models

Multimodal Sentiment Analysis with Missing Modality: A Knowledge-Transfer Approach

Deep Multimodal Learning with Missing Modality: A Survey

Modality Invariant Multimodal Learning to Handle Missing Modalities: A Single-Branch Approach