Cross-modal retrieval based on fusion lightweight network.

Ying Liu, Lu Chao, Yingying Guo, Jie Fang, Weidong Zhang
DOI: https://doi.org/10.1145/3573942.3574053
2022-01-01
Abstract: As data become more diverse, how to handle multiple semantics within a modality has become the key problem in cross-modal retrieval. Most existing cross-modal methods rely on large models with many parameters and run inefficiently. In this paper, we propose a cross-modal retrieval method based on fused lightweight networks to address these problems. We extract the features of the different modalities separately with two lightweight networks. We also propose a fusion strategy for local and global features that better preserves local fine-grained information. Finally, we use generative adversarial training to preserve the contextual semantic information between modalities and to improve the generalization ability of the model. Experiments show that the proposed algorithm achieves excellent cross-modal retrieval performance on the MSCOCO dataset.