Self-Attention based Fine-Grained Cross-Media Hybrid Network

Wei Shan, Dan Huang, Jiangtao Wang, Feng Zou, Suwen Li
DOI: https://doi.org/10.1016/j.patcog.2022.108748
IF: 8
2022-04-01
Pattern Recognition
Abstract: Due to the heterogeneity gap, the data representations of different types of media are inconsistent. It is challenging to measure the fine-grained gap between different media. To this end, we propose a self-attention-based hybrid network to learn the common representations of different media data. Specifically, we first utilize a local self-attention layer to learn the common attention space between different media data. Then we propose a similarity concatenation method to understand the content relationship between features. To further improve the robustness of the model, we also learn a local position encoding to capture the spatial relationships between features. Therefore, our proposed approach can effectively reduce the gap between different feature distributions on cross-media retrieval tasks. Extensive experiments and ablation studies demonstrate that our proposed method achieves state-of-the-art performance. The source code and models are publicly available at: https://github.com/NUST-Machine-Intelligence-Laboratory/SAFGCMHN.
Categories: Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic
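
The abstract's core idea, projecting features from different media into a shared attention space and comparing them there, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked in the abstract for that). It assumes plain single-head scaled dot-product self-attention rather than the paper's local self-attention layer, a random additive bias standing in for the learned local position encoding, mean pooling, and cosine similarity for retrieval scoring. All names, shapes, and random inputs (image_feats, text_feats, pos_bias, d_common) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(features, w_q, w_k, w_v, pos_bias):
    """Single-head scaled dot-product self-attention with an additive
    position bias (an assumed stand-in for a learned position encoding)."""
    q = features @ w_q
    k = features @ w_k
    v = features @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1]) + pos_bias
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


def cosine_similarity(a, b):
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T


rng = np.random.default_rng(0)
n_tokens, d_in, d_common = 6, 16, 8

# Hypothetical local features for one image and one text sample.
image_feats = rng.normal(size=(n_tokens, d_in))
text_feats = rng.normal(size=(n_tokens, d_in))

# Projection weights shared across media, so both land in one common space.
w_q, w_k, w_v = (rng.normal(size=(d_in, d_common)) * 0.1 for _ in range(3))
pos_bias = rng.normal(size=(n_tokens, n_tokens)) * 0.01

image_common, image_weights = self_attention(image_feats, w_q, w_k, w_v, pos_bias)
text_common, text_weights = self_attention(text_feats, w_q, w_k, w_v, pos_bias)

# Pool each sample to one vector and score the cross-media pair.
image_vec = image_common.mean(axis=0, keepdims=True)
text_vec = text_common.mean(axis=0, keepdims=True)
similarity = cosine_similarity(image_vec, text_vec)
print("cross-media similarity:", similarity[0, 0])
```

Sharing the projection weights between the two media is the design choice that makes the pooled vectors directly comparable. In a trained model those weights and the position bias would be learned from paired data rather than sampled at random.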