Slot based Image Captioning with WGAN

Ziyu Xue, Lei Wang, Peiyu Guo
DOI: https://doi.org/10.1109/icis46139.2019.8940218
2019-06-01
Abstract: Existing image captioning methods are often constrained by word-level or syntactic rules, and they tend to produce a single sentence with a limited vocabulary. This paper introduces a novel framework for image captioning that reconciles slot-filling approaches with neural network approaches. Our approach first generates a sentence template with many slot locations using a Wasserstein Generative Adversarial Network (WGAN). The slots associated with visual regions are then filled by object detectors. Our model consists of a structured sentence generator and a multi-level sentence discriminator. Extensive experiments are conducted on three benchmark datasets (Microsoft COCO, Flickr8k, and Flickr30k), and the results on standard image captioning and novel object captioning tasks corroborate the efficacy of our method.
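The second stage described in the abstract, filling template slots with detector outputs, can be illustrated with a minimal sketch. This is not the authors' implementation: the `<slot_i>` token format, the `fill_slots` helper, and the dictionary mapping region indices to labels are all assumptions made for illustration, and the template is hard-coded rather than produced by a WGAN generator.

```python
import re

# Hypothetical slot token format: <slot_0>, <slot_1>, ...
SLOT_PATTERN = re.compile(r"<slot_(\d+)>")


def fill_slots(template, detections):
    """Replace each <slot_i> token with the label detected for region i.

    `detections` maps a region index to an object label (an assumed
    stand-in for the output of an object detector).
    """
    def replace(match):
        region = int(match.group(1))
        # Leave the slot token untouched if no detection exists for it.
        return detections.get(region, match.group(0))

    return SLOT_PATTERN.sub(replace, template)


# A hard-coded template stands in for the generator's output.
template = "a <slot_0> sitting on a <slot_1>"
detections = {0: "cat", 1: "sofa"}
print(fill_slots(template, detections))  # prints "a cat sitting on a sofa"
```

Keeping the template and the detector labels separate is what lets a caption mention objects the sentence generator never saw during training: only the detector's label set has to change.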
Computer Science