Semantic Relation-aware Difference Representation Learning for Change Captioning

Yunbin Tu, Tingting Yao, Liang Li, Jiedong Lou, Shengxiang Gao, Zhengtao Yu, Chenggang Yan
DOI: https://doi.org/10.18653/v1/2021.findings-acl.6
2021-01-01
Abstract: Change captioning aims to describe the difference between a pair of images with a natural language sentence. In this task, distractors such as illumination or viewpoint changes make it very challenging to learn the difference representation. In this paper, we propose a semantic relation-aware difference representation learning network that explicitly learns the difference representation in the presence of distractors. Specifically, we introduce a self-semantic relation embedding block to explore the underlying changed objects, and design a cross-semantic relation measuring block to localize the real change and learn a discriminative difference representation. In addition, relying on the part-of-speech (POS) of words, we devise an attention-based visual switch to dynamically use visual information for caption generation. Extensive experiments show that our method achieves state-of-the-art performance on the CLEVR-Change and Spot-the-Diff datasets.