GuessWhich? Visual Dialog with Attentive Memory Network.

Lei Zhao, Xinyu Lyu, Jingkuan Song, Lianli Gao
DOI: https://doi.org/10.1016/j.patcog.2021.107823
IF: 8
2021-01-01
Pattern Recognition
Abstract:
• We use a memory network in the cooperative ‘GuessWhich’ game between Q-BOT and A-BOT. It reduces repetition in the generated dialogs and makes image retrieval efficient.
• We propose a novel Attentive Memory Network that adds a fusion model to the memory network. The fusion model can effectively use the manually labeled caption and the image, so the generated dialogs and the predicted image representation can be visually grounded.
• Experiments conducted on the VisDial 1.0 dataset demonstrate that our generated dialogs are natural and precise, and that the results exceed those of state-of-the-art ‘GuessWhich’-based visual dialog algorithms. Extensive image retrieval experiments show that our method can also generate more accurate results than the benchmarks.