MultiSentiNet: A Deep Semantic Network for Multimodal Sentiment Analysis

Nan Xu, W. Mao
DOI: https://doi.org/10.1145/3132847.3133142
2017-11-06
Abstract: With the prevalence of more diverse and multiform user-generated content on social networking sites, multimodal sentiment analysis has become an increasingly important research topic in recent years. Previous work on multimodal sentiment analysis directly extracts feature representations of each modality and fuses these features for classification. Consequently, some detailed semantic information for sentiment analysis and the correlation between image and text have been ignored. In this paper, we propose a deep semantic network, namely MultiSentiNet, for multimodal sentiment analysis. We first identify object and scene as salient detectors to extract deep semantic features of images. We then propose a visual feature guided attention LSTM model to extract words that are important for understanding the sentiment of the whole tweet and aggregate the representation of those informative words with the visual semantic features, object and scene. Experiments on two publicly available sentiment datasets verify the effectiveness of our MultiSentiNet model and show that our extracted semantic features demonstrate high correlations with human sentiments.
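The visual feature guided attention step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the additive scoring form, the concatenation used for aggregation, the single vector standing in for the object and scene features, and every name and dimension below (`visual_guided_attention`, `W_h`, `W_v`, `w`) are assumptions made for illustration. Random vectors stand in for the LSTM hidden states.

```python
import math
import random

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def visual_guided_attention(hidden_states, visual_feat, W_h, W_v, w):
    """Weight word states by their relevance to the visual features.

    hidden_states: T vectors, stand-ins for per-word LSTM outputs.
    visual_feat:   one vector, stand-in for object + scene features.
    The additive scoring and the concatenation are assumptions.
    """
    v_proj = matvec(W_v, visual_feat)
    scores = []
    for h in hidden_states:
        h_proj = matvec(W_h, h)
        joint = [math.tanh(a + b) for a, b in zip(h_proj, v_proj)]
        scores.append(sum(wi * ji for wi, ji in zip(w, joint)))
    alphas = softmax(scores)  # one attention weight per word
    dim = len(hidden_states[0])
    # Attention-weighted sum of the word states.
    text_vec = [sum(a * h[i] for a, h in zip(alphas, hidden_states))
                for i in range(dim)]
    # Aggregate text and visual representations by concatenation.
    return alphas, text_vec + list(visual_feat)

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Hypothetical sizes: 5 words, 4-dim word states, 3-dim visual vector.
rng = random.Random(0)
T, d_h, d_v, d_a = 5, 4, 3, 6
hidden_states = random_matrix(T, d_h, rng)
visual_feat = [rng.uniform(0.0, 1.0) for _ in range(d_v)]
W_h = random_matrix(d_a, d_h, rng)
W_v = random_matrix(d_a, d_v, rng)
w = [rng.uniform(-0.5, 0.5) for _ in range(d_a)]

alphas, fused = visual_guided_attention(hidden_states, visual_feat, W_h, W_v, w)
```

In this sketch `alphas` holds one weight per word and sums to 1, and `fused` is the vector a downstream sentiment classifier would consume. In a trained model the projection weights would be learned rather than sampled at random.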
Computer Science