Improvement of image description using bidirectional LSTM

Vahid Chahkandi, Mohammad Javad Fadaeieslam, Farzin Yaghmaee
DOI: https://doi.org/10.1007/s13735-018-0158-y
2018-07-19
International Journal of Multimedia Information Retrieval
Abstract: As a high-level technique, automatic image description combines linguistic and visual information to produce an appropriate caption for an image. In this paper, we propose a method based on a recurrent neural network that synthesizes descriptions in a multimodal space. The novelty of this work lies in generating sentences of variable length with novel structures. To achieve this, we apply a bidirectional LSTM (Bi-LSTM) network. The method uses the inner product as the common space, which reduces computational cost and improves results. We evaluate the proposed method on two benchmark datasets, Flickr8K and Flickr30K. The results show that the Bi-LSTM model performs better than the unidirectional model.
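The abstract names two ingredients without detailing them. The first is a bidirectional LSTM, which reads a sequence both left-to-right and right-to-left. The second is an inner product that scores how well an image and a sentence match in a shared space. The following pure-Python sketch illustrates both ideas under stated assumptions and is not the authors' implementation. Several details are hypothetical: the function names, the dimensions, the random weight initialization, building the sentence vector by concatenating the final forward and backward states, and assuming the image feature has already been projected to the same size as that vector.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(m, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def init_lstm(input_dim, hidden_dim, seed):
    # Hypothetical small random weights: one (W, U, b) triple per gate.
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {gate: (mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim), [0.0] * hidden_dim)
            for gate in ("i", "f", "o", "g")}

def lstm_step(x, h, c, p):
    # Standard LSTM cell: input, forget, output gates and candidate cell state.
    def pre(gate):
        W, U, b = p[gate]
        return [wx + uh + bb for wx, uh, bb in zip(matvec(W, x), matvec(U, h), b)]
    i = [sigmoid(z) for z in pre("i")]
    f = [sigmoid(z) for z in pre("f")]
    o = [sigmoid(z) for z in pre("o")]
    g = [math.tanh(z) for z in pre("g")]
    c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
    h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
    return h_new, c_new

def run_lstm(seq, p, hidden_dim):
    # Run one direction from a zero state; return the hidden state at every step.
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    states = []
    for x in seq:
        h, c = lstm_step(x, h, c, p)
        states.append(h)
    return states

def bilstm(seq, p_fwd, p_bwd, hidden_dim):
    # Forward pass over the sequence, backward pass over its reverse,
    # then re-align the backward states and concatenate per time step.
    fwd = run_lstm(seq, p_fwd, hidden_dim)
    bwd = run_lstm(seq[::-1], p_bwd, hidden_dim)[::-1]
    return [hf + hb for hf, hb in zip(fwd, bwd)]

def sentence_vector(states, hidden_dim):
    # Assumption: final forward state (last position) + final backward state (position 0).
    return states[-1][:hidden_dim] + states[0][hidden_dim:]

def inner_product_score(image_vec, sent_vec):
    # Compatibility in the common space is a plain dot product.
    return sum(a * b for a, b in zip(image_vec, sent_vec))

if __name__ == "__main__":
    D, H = 4, 3  # hypothetical word-embedding and hidden sizes
    rng = random.Random(0)
    words = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(5)]
    image = [rng.uniform(-1, 1) for _ in range(2 * H)]  # assumed already projected to 2*H
    states = bilstm(words, init_lstm(D, H, seed=1), init_lstm(D, H, seed=2), H)
    print(inner_product_score(image, sentence_vector(states, H)))
```

Because the score is a single dot product with no extra learned layer, ranking one image against many candidate sentences costs only one multiply-add per dimension per candidate once the vectors are computed. This fits the abstract's point that the inner-product space keeps computation cheap.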
computer science, artificial intelligence, software engineering
What problem does this paper attempt to address?