Adversarial Learning-Based Automatic Evaluator for Image Captioning

Qianxia Ma, Duo Wang, Pinjie Li, Jingyan Song, Tao Zhang
DOI: https://doi.org/10.1109/cac53003.2021.9728653
2021-01-01
Abstract: Image captioning imitates the way humans describe the visual world in natural language. Evaluating image captions remains a challenging task. Current methods mainly measure the similarity between the generated description and the reference texts, neglecting the direct relevance of the caption to the corresponding image. In this paper, we propose an adversarial learning-based evaluator. The evaluator is built on Conditional Generative Adversarial Networks and trained with Proximal Policy Optimization. Because it operates directly on the relationship between images and descriptions, our model can take advantage of the references of other similar images in the dataset. It gives more reliable judgments than state-of-the-art methods and takes the variability of natural language into account for better evaluation.