Abstract: This paper explores a novel multi-modal alternating learning paradigm pursuing a reconciliation between the exploitation of uni-modal features and the exploration of cross-modal interactions. This is motivated by the fact that current paradigms of multi-modal learning tend to explore multi-modal features simultaneously. The resulting gradient prohibits further exploitation of the features in the weak modality, leading to modality competition, where the dominant modality overpowers the learning process. To address this issue, we study the modality-alternating learning paradigm to achieve reconcilement. Specifically, we propose a new method called ReconBoost to update a fixed modality each time. Herein, the learning objective is dynamically adjusted with a reconcilement regularization against competition with the historical models. By choosing a KL-based reconcilement, we show that the proposed method resembles Friedman's Gradient-Boosting (GB) algorithm, where the updated learner can correct errors made by others and help enhance the overall performance. The major difference with the classic GB is that we only preserve the newest model for each modality to avoid overfitting caused by ensembling strong learners. Furthermore, we propose a memory consolidation scheme and a global rectification scheme to make this strategy more effective. Experiments over six multi-modal benchmarks speak to the efficacy of the method. We release the code at
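To make the KL-based reconcilement and the boosting analogy more concrete, here is a minimal pure-Python sketch for a single sample. It assumes that the objective for the modality being updated is cross-entropy on that modality's prediction minus `lam` times KL(current ‖ historical). The abstract does not give the exact formula, so this form, the KL direction, and the helper names (`reconcile_objective`, `ce_grad_logits`, `lam`) are all hypothetical. `ce_grad_logits` illustrates a standard fact behind the boosting link. The gradient of softmax cross-entropy with respect to the logits is p − onehot(y). Gradient boosting fits each new learner to the negative of this gradient, which measures the errors the current ensemble still makes.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: List[float], label: int) -> float:
    """Negative log-likelihood of the true class index."""
    return -math.log(probs[label])


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for discrete distributions; q must be > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def ce_grad_logits(logits: List[float], label: int) -> List[float]:
    """Gradient of softmax cross-entropy w.r.t. the logits: p - onehot(label).

    Its negative, onehot(label) - p, is the residual that a new
    gradient-boosting stage is fitted to.
    """
    probs = softmax(logits)
    return [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]


def reconcile_objective(logits_m: List[float], hist_probs: List[float],
                        label: int, lam: float) -> float:
    """HYPOTHETICAL per-sample objective for the modality being updated.

    Assumed form: task loss on the current modality's prediction minus
    lam * KL(current || historical). Minimizing it favors predictions that
    are correct and also differ from the frozen historical models.
    """
    p_m = softmax(logits_m)
    return cross_entropy(p_m, label) - lam * kl_divergence(p_m, hist_probs)


if __name__ == "__main__":
    hist = [0.7, 0.2, 0.1]        # made-up frozen prediction of the other modalities
    logits = [1.0, 0.5, -0.5]     # made-up logits of the modality being updated
    for lam in (0.0, 0.5):
        print(lam, reconcile_objective(logits, hist, label=0, lam=lam))
    print(ce_grad_logits(logits, label=0))
```

With `lam = 0`, the sketch reduces to ordinary uni-modal training. A positive `lam` lowers the objective for predictions that diverge from the historical ones, which is one way to read "reconcilement regularization against competition with the historical models."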

What problem does this paper attempt to address?

This paper addresses the problem of modality competition in multimodal learning. Current multimodal learning paradigms usually adopt a joint learning strategy, which explores the features of all modalities simultaneously. Under this strategy, the gradients of weak modalities are neglected during updates. As a result, the dominant modality takes over the learning process and hampers the learning of the other modalities. The paper therefore proposes a new method, ReconBoost. It alternately updates the parameters of each modality and introduces a reconcilement regularization term that maximizes the diversity between the current model and the historical models. Together, these changes alleviate modality competition and improve overall performance.

### Main contributions

1. **The ReconBoost method**: Alternately updating the parameters of one modality at a time avoids modality competition and lets each modality fully optimize its own features.
2. **Reconcilement regularization**: A reconcilement regularization term dynamically adjusts the learning objective to further alleviate modality competition.
3. **Theoretical analysis**: The paper shows that when the KL divergence is chosen as the reconcilement term, ReconBoost realizes an alternating version of the gradient boosting algorithm.
4. **Experimental verification**: Experiments on six multimodal benchmark datasets show that ReconBoost outperforms existing multimodal learning methods.

### Problems solved

- **Modality competition**: In current multimodal learning paradigms, the dominant modality suppresses the learning of the other modalities, which degrades performance.
- **Under-optimization of weak modalities**: The gradients of weak modalities are neglected, so their potential is not fully exploited.
- **Limitations of joint learning**: On multimodal data, joint learning forces a performance compromise among modalities.

### Method overview

1. **Alternating updates**: Only the parameters of one modality are updated at a time, and the parameters of the other modalities are kept fixed.
2. **Reconcilement regularization**: A KL-divergence reconcilement term maximizes the diversity between the current update and the historical models.
3. **Memory consolidation**: A memory consolidation regularization term keeps each modality's learner from focusing only on correcting errors. This preserves its memory of modality-specific patterns.
4. **Global rectification**: After each update, the learners of all modalities are globally rectified to avoid getting stuck in local optima.

### Experimental results

- **Performance improvement**: ReconBoost significantly outperforms existing methods on multiple multimodal benchmark datasets.
- **Modality-specific encoder evaluation**: Encoders trained with ReconBoost improve in all modalities, most notably in the visual modality.
- **Modality competition analysis**: t-SNE visualizations indicate that ReconBoost reduces modality competition and improves feature quality.

In conclusion, ReconBoost alleviates modality competition in multimodal learning and offers a new, effective approach to multimodal learning.
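The alternating schedule in the method overview, together with the boosting view from the theoretical analysis, can be illustrated with a toy pure-Python sketch. The setup below is an assumption and does not come from the paper. It uses binary labels and one scalar feature per modality. Each modality gets a linear learner, the learners' logits are summed, and the model is trained with plain logistic loss. The reconcilement, memory consolidation, and global rectification terms are deliberately omitted, and all names (`alternating_fit`, `ensemble_logit`, etc.) are hypothetical. At each stage only one modality is trained while the others act as a frozen offset. The update for that modality is therefore driven by the residual y − σ(z) of the current ensemble, and its previous learner is overwritten rather than appended.

```python
import math
from typing import List, Sequence, Tuple

# (one scalar feature per modality, binary label); a toy assumption
Sample = Tuple[Sequence[float], int]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z: float) -> float:
    """Stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def ensemble_logit(weights: Sequence[float], x: Sequence[float]) -> float:
    """Sum of per-modality logits w_m * x_m (newest learner of each modality)."""
    return sum(w * xm for w, xm in zip(weights, x))


def logistic_loss(weights: Sequence[float], data: List[Sample]) -> float:
    """Mean binary cross-entropy of the summed logits: softplus(z) - y * z."""
    total = 0.0
    for x, y in data:
        z = ensemble_logit(weights, x)
        total += softplus(z) - y * z
    return total / len(data)


def alternating_fit(data: List[Sample], num_modalities: int,
                    rounds: int = 20, inner_steps: int = 10,
                    lr: float = 0.5) -> List[float]:
    """Toy alternating schedule: one modality is trained per stage, others frozen."""
    weights = [0.0] * num_modalities
    for t in range(rounds):
        m = t % num_modalities                  # modality updated at stage t
        for _ in range(inner_steps):
            grad = 0.0
            for x, y in data:
                z = ensemble_logit(weights, x)  # frozen modalities act as a fixed offset
                residual = y - sigmoid(z)       # negative gradient w.r.t. the ensemble logit
                grad -= residual * x[m]         # chain rule: dz/dw_m = x_m
            # Update in place: only the newest learner of modality m is kept,
            # unlike classic gradient boosting, which appends a new learner.
            weights[m] -= lr * grad / len(data)
    return weights


if __name__ == "__main__":
    # Made-up data: modality 0 separates the classes, modality 1 is weak.
    data = [([1.0, 0.5], 1), ([-1.0, -0.5], 0), ([0.8, -0.2], 1), ([-0.9, 0.3], 0)]
    w = alternating_fit(data, num_modalities=2)
    print("weights:", w, "loss:", logistic_loss(w, data))
```

Overwriting rather than appending mirrors the abstract's point that only the newest model per modality is preserved. In a neural setting, one would presumably replace each scalar weight with that modality's encoder and head, and add the regularization terms from the method overview to the per-stage loss.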

ReconBoost: Boosting Can Achieve Modality Reconcilement

Improving Multimodal Learning with Multi-Loss Gradient Modulation

Balanced Multimodal Learning via On-the-fly Gradient Modulation

Diagnosing and Re-learning for Balanced Multimodal Learning

Multimodal Representation Learning by Alternating Unimodal Adaptation

MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance

On-the-fly Modulation for Balanced Multimodal Learning

PMR: Prototypical Modal Rebalance for Multimodal Learning

Reconstruct before Query: Continual Missing Modality Learning with Decomposed Prompt Collaboration

Learning to Rebalance Multi-Modal Optimization by Adaptively Masking Subnetworks

The Balanced Multi-Modal Spiking Neural Networks with Online Loss Adjustment and Time Alignment

Multimodal Classification via Modal-Aware Interactive Enhancement

Gradient-Guided Modality Decoupling for Missing-Modality Robustness

Adapt and explore: Multimodal mixup for representation learning

Multimodal Fusion Balancing Through Game-Theoretic Regularization

Classifier-guided Gradient Modulation for Enhanced Multimodal Learning

On the Comparison between Multi-modal and Single-modal Contrastive Learning

Cross-Modal Knowledge Transfer via Inter-Modal Translation and Alignment for Affect Recognition

Rethinking Missing Modality Learning: From a Decoding View

Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception

Effective Multimodal Reinforcement Learning with Modality Alignment and Importance Enhancement