Image Captioning with Partially Rewarded Imitation Learning.

Xintong Yu, Tszhang Guo, Kun Fu, Lei Li, Changshui Zhang, Jianwei Zhang
DOI: https://doi.org/10.1109/ijcnn.2019.8851721
2019-01-01
Abstract: Current state-of-the-art image captioning algorithms have made great progress via reinforcement learning or generative adversarial networks, using hand-crafted metrics such as CIDEr as the reward for the former and signals from adversarial discriminator networks for the latter. Although these methods yield high metric scores or improved diversity, respectively, the former produces captions that differ noticeably from human-written sentences and the latter lowers metric scores. In this paper, we propose a novel training objective for image captioning that consists of two parts, representing explicit and implicit knowledge respectively. Optimizing the new reward partially with imitation learning, we devise an algorithm in which the caption generator is trained to maximize a combination of CIDEr and the predictions of an adversarial discriminator. Experiments on the MSCOCO dataset demonstrate that the proposed method integrates the strengths of both state-of-the-art approaches, producing more human-like captions while maintaining comparable performance on traditional metrics.
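
As a rough illustration of the combined objective described in the abstract, the sketch below mixes a CIDEr score with a discriminator prediction into a single scalar reward. The abstract does not state how the two parts are weighted, so the convex weighting, the function names (`combined_reward`, `advantages`), and the batch-mean baseline are all assumptions made for illustration, not the paper's formulation.

```python
from typing import List, Sequence


def combined_reward(cider: float, disc_prob: float, weight: float = 0.5) -> float:
    """Mix a metric score with a discriminator probability.

    Hypothetical form: a convex combination controlled by `weight`.
    The paper's actual combination may differ.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * cider + (1.0 - weight) * disc_prob


def advantages(rewards: Sequence[float]) -> List[float]:
    """Subtract a batch-mean baseline from each reward.

    This is a common variance-reduction step in REINFORCE-style
    training; it is assumed here, not taken from the paper.
    """
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]


if __name__ == "__main__":
    # Two sampled captions with made-up CIDEr scores and discriminator outputs.
    rewards = [combined_reward(1.0, 0.5), combined_reward(0.5, 1.0)]
    print(rewards, advantages(rewards))
```

In a policy-gradient setup, each advantage would scale the log-probability gradient of its sampled caption, so captions that score well on both terms are reinforced.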