Towards Controllable Image Descriptions with Semi-Supervised VAE.

Nikolai Zakharov, Hang Su, Jun Zhu, Jan Glaescher
DOI: https://doi.org/10.1016/j.jvcir.2019.102574
IF: 2.887
2019-01-01
Journal of Visual Communication and Image Representation
Abstract: Image captioning models successfully describe the visual contents of images using natural language. To generate more natural and diverse descriptions, a model must learn style-specific patterns, which requires collecting style-specific datasets, a time-consuming process. To address this issue, we propose a semi-supervised deep generative model, the Semi-supervised Conditional Variational Auto-Encoder (SCVAE). Our model is capable of leveraging more data, both labelled and unlabelled, within the generative model framework. Extensive empirical results demonstrate that, compared with state-of-the-art models, our proposed method generates more accurate image captions in a wider range of styles. © 2019 Elsevier Inc. All rights reserved.