Bidirectional difference locating and semantic consistency reasoning for change captioning

Yaoqi Sun, Liang Li, Tingting Yao, Tongyv Lu, Bolun Zheng, Chenggang Yan, Hua Zhang, Yongjun Bao, Guiguang Ding, Gregory Slabaugh
DOI: https://doi.org/10.1002/int.22821
IF: 8.993
2022-01-19
International Journal of Intelligent Systems
Abstract: Change captioning is an emerging task that describes the changes between a pair of images. The main difficulty in this task is discovering the differences between the two images. Several methods have recently been proposed to address this problem, but they all employ unidirectional difference localization to identify the changes, which can lead to ambiguity about the nature of the changes. Instead, we propose a framework with bidirectional difference localization and semantic consistency reasoning to describe image changes. First, we locate the changes in the two images by capturing bidirectional differences. Then we design a decoder with spatial-channel attention to generate the change caption. Finally, we introduce semantic consistency reasoning to constrain our bidirectional difference localization module and spatial-channel attention module. Extensive experiments on three public data sets show that our proposed model outperforms the state-of-the-art change captioning models by a large margin.
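The abstract does not give the equations of the bidirectional difference localization module. As a minimal sketch of the general idea only, assuming each image has already been encoded into one feature vector per spatial location, the snippet below locates "what appeared" and "what disappeared" separately instead of relying on a single difference. The function names (`directional_difference`, `locate_bidirectional`) and the ReLU-difference, L2-norm and softmax scoring are hypothetical choices for illustration, not the authors' design, which presumably uses learned layers.

```python
import math
from typing import List, Tuple

# Hypothetical representation: one C-dimensional feature vector per spatial
# location (H*W locations flattened into a list).
Feature = List[List[float]]


def directional_difference(src: Feature, ref: Feature) -> Feature:
    """Per-location features present in `src` but weaker or absent in `ref` (ReLU(src - ref))."""
    return [[max(s - r, 0.0) for s, r in zip(sv, rv)] for sv, rv in zip(src, ref)]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def l2_norms(vectors: Feature) -> List[float]:
    """L2 norm of each per-location vector, used as a change score."""
    return [math.sqrt(sum(v * v for v in vec)) for vec in vectors]


def locate_bidirectional(before: Feature, after: Feature) -> Tuple[List[float], List[float]]:
    """Return two attention maps over locations: 'appeared' (after vs. before)
    and 'disappeared' (before vs. after). Assumed scoring, for illustration only."""
    appeared = directional_difference(after, before)
    disappeared = directional_difference(before, after)
    return softmax(l2_norms(appeared)), softmax(l2_norms(disappeared))


if __name__ == "__main__":
    # Toy 2x2 grid (4 locations, 2 channels): an object at location 1 "moves"
    # to location 3 and changes its channel signature.
    before = [[0.0, 0.0], [5.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
    after = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 5.0]]
    appeared_map, disappeared_map = locate_bidirectional(before, after)
    print(max(range(4), key=lambda i: appeared_map[i]))     # 3
    print(max(range(4), key=lambda i: disappeared_map[i]))  # 1
```

In this toy example, an unsigned difference magnitude |after - before| would score locations 1 and 3 equally (both 5.0), leaving it unclear whether something was added, removed or moved; keeping the two directions separate makes that distinction explicit, which is the kind of ambiguity the abstract attributes to unidirectional localization.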
computer science, artificial intelligence