Intra- and Inter-Modal Curriculum for Multimodal Learning

Yuwei Zhou,Xin Wang,Hong Chen,Xuguang Duan,Wenwu Zhu
DOI: https://doi.org/10.1145/3581783.3612468
2023-01-01
Abstract:Multimodal learning has been widely studied and applied due to its improvement over previous unimodal tasks and its effectiveness on emerging multimodal challenges. However, it has been reported that modal encoders are under-optimized in multimodal learning in contrast to unimodal learning, especially when some modalities are dominant over others. Existing solutions to this problem suffer from two limitations: i) they merely focus on inter-modal balance, failing to consider the influence of intra-modal data on each modality; ii) their implementations heavily rely on unimodal performances or losses, thus being suboptimal for the tasks requiring modal interactions (e.g., visual question answering). To tackle these limitations, we propose I2MCL, a generic Intra- and Inter-Modal Curriculum Learning framework which simultaneously considers both data difficulty and modality balance for multimodal learning. In the intra-modal curriculum, we adopt a pretrained teacher model to obtain knowledge distillation loss as the difficulty measurer, which determines the data weights within the corresponding modality. In the inter-modal curriculum, we utilize a Pareto optimization strategy to measure and compare the gradients from distillation loss and task loss across modalities, capable of determining whether a modality should learn from the task or its teacher. Empirical experiments on various tasks including multimodal classification, visual question answering and visual entailment demonstrate that our proposed I2MCL is able to tackle the under-optimized modality problem and bring consistent improvement to multimodal learning.
What problem does this paper attempt to address?