End-to-End Deep Memory Network for Visual-Textual Sentiment Analysis

Hang Miao, Ruifang Liu, Sheng Gao, Xin Zhou, Xiaoxin He
DOI: https://doi.org/10.1109/icnidc.2018.8525751
2018-01-01
Abstract: We propose an end-to-end multimodal deep memory network that integrates image and text information for visual-textual sentiment analysis. The model focuses on the vital regions of an image, guided by the corresponding text representation. The image regions are stored as memory cells, and an attention mechanism retrieves the visual regions relevant to the text. For feature fusion, we use a Convolutional Neural Network (CNN) to combine visual and textual information, which is more efficient at learning the joint representation. Experimental results on the IMDB dataset demonstrate the effectiveness of our approach.
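The abstract does not give the paper's exact equations, so the following is only a minimal sketch of the described pipeline under stated assumptions: text-guided dot-product attention with a softmax over image-region memory cells, several memory hops where the query is updated additively (an assumption borrowed from end-to-end memory networks), and a hypothetical CNN fusion step that slides 2 x `width` kernels over the stacked text and visual vectors, followed by ReLU and max-pooling. The function names (`attend`, `multi_hop`, `conv_fuse`) and all parameters are illustrative, not the authors' code, and the sketch assumes text and region vectors already share one dimension.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, memory: List[Vector]) -> Tuple[Vector, Vector]:
    """One memory hop: score each region (memory cell) against the query,
    then read out the attention-weighted sum of the regions."""
    weights = softmax([dot(query, cell) for cell in memory])
    dim = len(memory[0])
    read = [sum(w * cell[j] for w, cell in zip(weights, memory)) for j in range(dim)]
    return weights, read


def multi_hop(text_vec: Vector, regions: List[Vector], hops: int = 2) -> Vector:
    """Stacked hops (assumed update rule): the query starts as the text vector
    and adds each hop's read-out; the last read-out is the attended visual vector."""
    query = list(text_vec)
    read = [0.0] * len(regions[0])
    for _ in range(hops):
        _, read = attend(query, regions)
        query = [q + r for q, r in zip(query, read)]
    return read


def conv_fuse(text_vec: Vector, visual_vec: Vector,
              filters: List[List[Vector]], width: int) -> Vector:
    """Hypothetical CNN fusion: each filter is a 2 x width kernel sliding along the
    feature axis of the stacked [text; visual] matrix, then ReLU and max-pooling."""
    rows = [text_vec, visual_vec]
    dim = len(text_vec)
    features = []
    for kernel in filters:
        activations = []
        for start in range(dim - width + 1):
            s = sum(kernel[r][k] * rows[r][start + k]
                    for r in range(2) for k in range(width))
            activations.append(max(0.0, s))  # ReLU
        features.append(max(activations))  # max-pool over positions
    return features
```

In a trained model the text vector would come from a learned text encoder, the regions from a CNN feature map of the image, and the fused features would feed a sentiment classifier. Each kernel in `conv_fuse` spans both modality rows, so every filter response mixes textual and visual features rather than processing them separately.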