DADAN: dual-path attention with distribution analysis network for text-image matching

Wenhao Li, Hongqing Zhu, Suyi Yang, Han Zhang
DOI: https://doi.org/10.1007/s11760-021-02020-2
2022-01-17
Abstract: Bidirectional visual-text retrieval has attracted considerable interest in the computer vision community. This paper presents an end-to-end trainable model built around a proposed dual-path attention with distribution analysis network (DADAN), designed to minimize misalignment caused by irrelevant matches. The architecture uses distribution analysis to split processing into separate paths, so that targeted attention mechanisms can be designed to capture the text-region pairs that truly contribute to a match. Specifically, the proposed row-wise attention and column-wise attention perform relative similarity analysis in the query modality and the retrieval modality simultaneously. In each retrieval direction, the significance of relevance can be assessed comprehensively alongside latent alignment inference. The method not only filters out irrelevant retrievals, which is the main aim of current studies, but also produces a more reasonable ordering of retrieval results. Experiments on public benchmarks show a noticeable improvement in text-image matching, especially in the text retrieval direction.
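The abstract does not give formulas for the row-wise and column-wise attention, so the following is only a minimal sketch of the general idea of normalizing a region-word similarity matrix along both axes. It is not the authors' implementation. The function names (`softmax`, `cosine`, `dual_axis_attention`), the `temperature` parameter, the use of cosine similarity, and the way the two attention maps are combined into a single score are all assumptions made for illustration. The distribution analysis step that splits the paths is not modeled here.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def dual_axis_attention(regions, words, temperature=0.1):
    """Hypothetical sketch: attend over a region-word similarity matrix
    along rows (each region over all words) and along columns
    (each word over all regions), then pool into one matching score."""
    n_regions, n_words = len(regions), len(words)

    # sim[i][j]: similarity between image region i and word j.
    sim = [[cosine(r, w) for w in words] for r in regions]

    # Row-wise attention: each row sums to 1.
    row_attn = [softmax(row, temperature) for row in sim]

    # Column-wise attention: each column sums to 1.
    columns = [
        softmax([sim[i][j] for i in range(n_regions)], temperature)
        for j in range(n_words)
    ]
    col_attn = [[columns[j][i] for j in range(n_words)] for i in range(n_regions)]

    # Assumed pooling: weight each pair by the product of both attentions,
    # so a pair counts only if it stands out in its row AND its column.
    weight_sum = 0.0
    weighted_sim = 0.0
    for i in range(n_regions):
        for j in range(n_words):
            w = row_attn[i][j] * col_attn[i][j]
            weight_sum += w
            weighted_sim += w * sim[i][j]
    score = weighted_sim / weight_sum

    return sim, row_attn, col_attn, score


if __name__ == "__main__":
    # Toy features: 2 image regions and 3 words in a shared 2-D space.
    regions = [[1.0, 0.0], [0.0, 1.0]]
    words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    sim, row_attn, col_attn, score = dual_axis_attention(regions, words)
    print("similarity:", sim)
    print("score:", score)
```

Weighting each pair by the product of the two attention maps is one simple way to suppress pairs that look relevant from only one modality's point of view; the paper may combine the two paths differently.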