Multi-Modal Correction Network for Recommendation

Zengmao Wang, Yunzhen Feng, Xin Zhang, Renjie Yang, Bo Du
DOI: https://doi.org/10.1109/tkde.2024.3493374
IF: 9.235
2024-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: Multi-modal content has proven to be powerful knowledge for recommendation tasks. Most state-of-the-art multi-modal recommendation methods focus on aligning the semantic spaces of different modalities to enhance item representations and pay little attention to the knowledge within each modality that is relevant to recommendation. As a result, the positive effect of this relevant knowledge is reduced and the improvement in recommendation performance is limited. In this paper, we propose a multi-modal correction network, termed MMCN, that enhances the item representation with the important semantic knowledge in each modality through a residual structure with attention mechanisms and a hierarchical contrastive learning framework. The residual information is obtained through self-attention and cross-attention, which can effectively learn the relevant knowledge across different modalities. Hierarchical contrastive learning further captures the relevant knowledge not only at the feature level but also at the element-wise level with a matrix. Extensive experiments on three large-scale real-world datasets show the superiority of MMCN over state-of-the-art multi-modal recommendation methods.
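The abstract names two building blocks: a residual built from self-attention and cross-attention, and contrastive learning across modalities. The following is a minimal sketch of those generic ideas in plain Python, not the MMCN architecture itself. The function names (`attend`, `residual_correct`, `info_nce`), the unparameterized attention, the InfoNCE-style loss, and the temperature value are all assumptions made for illustration. The abstract does not specify any of them, and the element-wise level of the hierarchical contrastive objective is not covered here.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(queries, keys, values):
    # Scaled dot-product attention without learned projections (an assumption).
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def residual_correct(own, other):
    # Hypothetical residual correction: keep the modality's own features and
    # add self-attention (within the modality) and cross-attention (queries
    # from this modality, keys and values from the other one).
    # Both modalities are assumed to share the same feature dimension.
    self_out = attend(own, own, own)
    cross_out = attend(own, other, other)
    return [[x + s + c for x, s, c in zip(row, srow, crow)]
            for row, srow, crow in zip(own, self_out, cross_out)]

def info_nce(a, b, temperature=0.1):
    # Feature-level contrastive loss: row i of `a` should match row i of `b`
    # and differ from every other row of `b`. Rows must be non-zero vectors.
    def cos(u, v):
        return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))
    losses = []
    for i, u in enumerate(a):
        probs = softmax([cos(u, v) / temperature for v in b])
        losses.append(-math.log(probs[i]))
    return sum(losses) / len(losses)

# Toy features for two items in two modalities (made-up numbers).
visual = [[1.0, 0.0], [0.0, 1.0]]
textual = [[0.9, 0.1], [0.2, 0.8]]
corrected_visual = residual_correct(visual, textual)
alignment_loss = info_nce(visual, textual)
```

In this sketch the residual keeps the original features intact, so attention can only add information on top of them. The contrastive loss is small when each item's two modality vectors point in similar directions and large when they are mismatched.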
What problem does this paper attempt to address?