I3N: Intra- and Inter-representation Interaction Network for Change Captioning

Shengbin Yue, Yunbin Tu, Liang Li, Ying Yang, Shengxiang Gao, Zhengtao Yu
DOI: https://doi.org/10.1109/tmm.2023.3242142
IF: 7.3
2023-01-01
IEEE Transactions on Multimedia
Abstract: Change captioning aims to describe the differences between a pair of images in a natural-language sentence. Compared with single-image captioning, it requires not only understanding the fine-grained information in each image, but also determining whether a change has occurred and representing the differences between the two images. Although much progress has been made, precisely representing the difference under the distraction of viewpoint change remains a severe challenge, especially when the difference is tiny. In this paper, we propose a novel Intra- and Inter-representation Interaction Network (I3N) to learn a fine-grained difference representation that is robust to viewpoint change. In the Intra-representation Interaction stage, we design Geometry-Semantic Interaction Refining (GSIR) to explore the positional and semantic interactions within each image, which serve as prior knowledge for withstanding viewpoint change and reinforce the recognition of semantic change. In the Inter-representation Interaction stage, to endow the model with the capability of pinpointing the latent difference under viewpoint change, Hierarchical Representation Interaction (HRI) models the difference from coarse to fine representations through the Semantic Matcher and Change Amplifier modules. The proposed approach outperforms state-of-the-art methods on existing change captioning benchmarks. Our code is available at https://github.com/yueshengbin/I3N.
computer science, information systems, telecommunications, software engineering