Reasoning Guided by a Manual: Context-Aware Image Captioning with Novel Objects

Peiyao Hua, Haifeng Sun, Jiachang Hao, Cong Liu, Jingyu Wang, Qi, Jianxin Liao
DOI: https://doi.org/10.3233/faia230381
2023-01-01
Abstract: The novel object captioning task aims to describe objects that are absent from the training data. Because novel objects are scarce, it is challenging to use external data to improve a model’s reasoning ability. While previous methods all follow a pure deep learning approach, we boost novel object captioning by incorporating reasoning into a traditional deep learning framework. We build a manual from dictionaries that provides our model with sufficient and accurate external information on novel objects. We propose the Manual-guided Context-aware Novel Object Captioning model (MC-NOC), which uses image and caption context to generate novel object captions. It contains a Manual-Guided Novel Object Reasoning module, which reasons about novel objects based on the other objects in the given image, and a Caption Reconstruction module, which incorporates novel objects into generated captions according to the caption context. MC-NOC achieves state-of-the-art performance on the challenging Held-out COCO and nocaps datasets, leading their leaderboards. In particular, we improve the CIDEr metric by 6.4 points on the Held-out COCO dataset. Comprehensive experiments demonstrate our model’s reasoning capability and the quality of the generated captions.