Multilingual Image Description with Neural Sequence Models

Desmond Elliott, Stella Frank, Eva Hasler
DOI: https://doi.org/10.48550/arXiv.1510.04709
2015-11-19
Abstract: In this paper we present an approach to multi-language image description bringing together insights from neural machine translation and neural image description. To create a description of an image for a given target language, our sequence generation models condition on feature vectors from the image, the description from the source language, and/or a multimodal vector computed over the image and a description in the source language. In image description experiments on the IAPR-TC12 dataset of images aligned with English and German sentences, we find significant and substantial improvements in BLEU4 and Meteor scores for models trained over multiple languages, compared to a monolingual baseline.
Computation and Language, Computer Vision and Pattern Recognition, Machine Learning, Neural and Evolutionary Computing