Embrace Smaller Attention: Efficient Cross-Modal Matching with Dual Gated Attention Fusion

Weikuo Guo, Xiangwei Kong
DOI: https://doi.org/10.1109/icassp49357.2023.10096438
2023-01-01
Abstract: Cross-modal matching is one of the most fundamental and widely studied tasks in the field of data science. To better capture the complicated correspondences between modalities, the powerful attention mechanism has been widely used in recent work. In this paper, we propose a novel Dual Gated Attention Fusion (DGAF) unit that relieves cross-modal matching of heavy attention computation. Specifically, the attention unit in the main information flow is replaced by a single-head, low-dimensional, lightweight attention bypass that serves as a gate to selectively discard noise in both modalities. To strengthen the interaction between modalities, an auxiliary memory unit is appended. A gated memory fusion unit is designed to fuse the memorized inter-modality information into both modality streams. Extensive experiments on two benchmark datasets show that the proposed DGAF achieves a good balance between efficiency and effectiveness.
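The gating idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no equations, so the projection matrices (`w_q`, `w_k`, `w_v`), the sigmoid gate, the dimensions, and the function name `gated_attention_bypass` are all assumptions chosen for illustration. The sketch covers only the single-head, low-dimensional attention bypass that gates one modality's features; the auxiliary memory unit and the gated memory fusion unit are not modeled.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_attention_bypass(x, y, w_q, w_k, w_v):
    """Hypothetical gate: x (n x d) is filtered by attention over y (m x d).

    Queries and keys live in a small dimension k (k << d), which is where
    the assumed saving over full multi-head attention would come from.
    """
    k = len(w_q[0])
    q = matmul(x, w_q)          # n x k, low-dimensional queries
    keys = matmul(y, w_k)       # m x k, low-dimensional keys
    values = matmul(y, w_v)     # m x d, values at full feature width
    keys_t = [list(col) for col in zip(*keys)]
    scores = matmul(q, keys_t)  # n x m
    attn = [softmax([s / math.sqrt(k) for s in row]) for row in scores]
    context = matmul(attn, values)  # n x d
    # Assumption: the attended context is squashed into (0, 1) and used as
    # an element-wise gate on the main-flow features, not added to them.
    gate = [[sigmoid(c) for c in row] for row in context]
    return [[g * xi for g, xi in zip(g_row, x_row)] for g_row, x_row in zip(gate, x)]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d, k = 8, 2  # illustrative feature width and bypass width
image_regions = random_matrix(4, d, rng)  # stand-in for image-region features
words = random_matrix(5, d, rng)          # stand-in for word features
w_q = random_matrix(d, k, rng)
w_k = random_matrix(d, k, rng)
w_v = random_matrix(d, d, rng)
gated = gated_attention_bypass(image_regions, words, w_q, w_k, w_v)
```

Because each gate value lies strictly between 0 and 1, the output keeps the shape of the input features and can only shrink each component, which is one plausible reading of "selectively discard noise". Applying the same function with the roles of the two inputs swapped would gate the other modality.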