ReconBoost: Boosting Can Achieve Modality Reconcilement

Cong Hua, Qianqian Xu, Shilong Bao, Zhiyong Yang, Qingming Huang
2024-05-15
Abstract: This paper explores a novel multi-modal alternating learning paradigm pursuing a reconciliation between the exploitation of uni-modal features and the exploration of cross-modal interactions. This is motivated by the fact that current paradigms of multi-modal learning tend to explore multi-modal features simultaneously. The resulting gradient prohibits further exploitation of the features in the weak modality, leading to modality competition, where the dominant modality overpowers the learning process. To address this issue, we study the modality-alternating learning paradigm to achieve reconcilement. Specifically, we propose a new method called ReconBoost to update a fixed modality each time. Herein, the learning objective is dynamically adjusted with a reconcilement regularization against competition with the historical models. By choosing a KL-based reconcilement, we show that the proposed method resembles Friedman's Gradient-Boosting (GB) algorithm, where the updated learner can correct errors made by others and help enhance the overall performance. The major difference with the classic GB is that we only preserve the newest model for each modality to avoid overfitting caused by ensembling strong learners. Furthermore, we propose a memory consolidation scheme and a global rectification scheme to make this strategy more effective. Experiments over six multi-modal benchmarks speak to the efficacy of the method. We release the code at
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning, Multimedia
What problem does this paper attempt to address?
This paper addresses modality competition in multi-modal learning. Current multi-modal learning paradigms usually adopt a joint learning strategy, exploring the features of all modalities simultaneously. Under this strategy the gradients of the weak modalities are effectively neglected during updates, so the dominant modality takes over the learning process and the other modalities are under-trained. The paper therefore proposes a new method, ReconBoost. It alternately updates the parameters of one modality at a time and adds a reconcilement regularization term that maximizes the diversity between the current model and the historical models, which alleviates modality competition and improves overall performance.

### Main contributions

1. **Proposing the ReconBoost method**: Alternately updating the parameters of each modality avoids modality competition and lets each modality fully optimize its own features.
2. **Reconcilement regularization**: A reconcilement regularization term dynamically adjusts the learning objective and further alleviates modality competition.
3. **Theoretical analysis**: When the KL divergence is chosen as the reconcilement term, ReconBoost is shown to resemble an alternating version of the gradient boosting algorithm.
4. **Experimental verification**: Experiments on six multi-modal benchmark datasets indicate that ReconBoost outperforms existing multi-modal learning methods.

### Problems solved

- **Modality competition**: In the current multi-modal learning paradigm, the dominant modality suppresses the learning of the other modalities, which degrades performance.
- **Insufficient optimization of weak modalities**: The gradients of weak modalities are neglected, so their potential is not fully exploited.
- **Limitations of joint learning**: The joint learning strategy compromises performance when dealing with multi-modal data.

### Method overview

1. **Alternating update**: Update the parameters of only one modality at a time and keep the parameters of the other modalities fixed.
2. **Reconcilement regularization**: Using the KL divergence as the reconcilement term maximizes the diversity between the current update and the historical models.
3. **Memory consolidation**: A memory consolidation regularization term keeps each modality's learner from focusing too heavily on errors, so that it retains modality-specific patterns.
4. **Global rectification**: After each update, the learner parameters of all modalities are allowed to be rectified globally to avoid getting stuck in local optima.

### Experimental results

- **Performance improvement**: ReconBoost outperforms existing methods on multiple multi-modal benchmark datasets.
- **Modality-specific encoder evaluation**: The encoders trained by ReconBoost improve in all modalities, especially in the visual modality.
- **Modality competition analysis**: t-SNE visualization suggests that ReconBoost reduces modality competition and improves feature quality.

In conclusion, ReconBoost alleviates modality competition in multi-modal learning and offers a new and effective approach to multi-modal learning.
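The alternating update described in the method overview can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes one linear softmax learner per modality, and the function name `alternating_fit` and its parameters (`rounds`, `steps`, `lr`) are hypothetical. It also omits the KL-based reconcilement term, memory consolidation, and global rectification. It only shows the boosting-like core: in each round a single modality is trained on the error left by the other, frozen modalities, and exactly one learner is kept per modality.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def alternating_fit(xs, y, n_classes, rounds=20, steps=10, lr=0.5):
    """Alternating-update sketch (hypothetical helper, not the paper's code).

    xs: list of (n, d_k) feature arrays, one per modality.
    y:  (n,) integer class labels.
    Returns one weight matrix of shape (d_k, n_classes) per modality.
    """
    n = y.shape[0]
    onehot = np.eye(n_classes)[y]
    # Exactly one learner per modality; it is updated in place across rounds.
    weights = [np.zeros((x.shape[1], n_classes)) for x in xs]
    for r in range(rounds):
        k = r % len(xs)  # the single modality updated in this round
        # Logits contributed by all other modalities, held fixed this round.
        others = sum(xs[j] @ weights[j] for j in range(len(xs)) if j != k)
        for _ in range(steps):
            probs = softmax(others + xs[k] @ weights[k])
            # Cross-entropy gradient w.r.t. modality k only: the residual
            # (probs - onehot) is the error the ensemble still makes.
            grad = xs[k].T @ (probs - onehot) / n
            weights[k] -= lr * grad
    return weights
```

Because the other modalities' logits enter only as a fixed offset, the gradient seen by modality `k` is the residual of the current ensemble. This is the sense in which the updated learner corrects errors made by the others.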