Learning Visual Emotion Distributions via Multi-Modal Features Fusion

Sicheng Zhao, Guiguang Ding, Yue Gao, Jungong Han
DOI: https://doi.org/10.1145/3123266.3130858
2017-01-01
Abstract: Current image emotion recognition works mainly classify images into one dominant emotion category, or regress images to average dimensional values, by assuming that the emotions perceived by different viewers are highly consistent with each other. However, due to the influence of various personal and situational factors, such as cultural background and social interactions, different viewers may have totally different emotional reactions to the same image. In this paper, we propose to formulate the image emotion recognition task as a probability distribution learning problem. Motivated by the fact that image emotions can be conveyed through different visual features, such as aesthetics and semantics, we present a novel framework that fuses multi-modal features to tackle this problem. In detail, a weighted multi-modal conditional probability neural network (WMMCPNN) is designed as the learning model to associate the visual features with emotion probabilities. By jointly exploring the complementarity of different modality features and learning their optimal combination coefficients, WMMCPNN can effectively utilize the representation ability of each uni-modal feature. We conduct extensive experiments on three publicly available benchmarks, and the results demonstrate that the proposed method significantly outperforms state-of-the-art approaches for emotion distribution prediction.
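The abstract does not spell out the internals of WMMCPNN. As a rough illustration of two ideas it names (treating an image's emotion label as a probability distribution, and learning combination coefficients across modalities), the sketch below fuses fixed per-modality emotion distributions with learned weights. It is not the authors' model: the per-modality predictions are toy inputs rather than outputs of a conditional probability neural network, the fusion is assumed to be a softmax-weighted convex combination, and KL divergence is assumed as the training loss. All function names and numbers are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax; turns unconstrained logits into weights summing to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(target: Sequence[float], pred: Sequence[float], eps: float = 1e-12) -> float:
    """KL(target || pred); classes with zero target probability contribute nothing."""
    return sum(t * math.log(t / max(p, eps)) for t, p in zip(target, pred) if t > 0)


def fuse(modality_dists: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Assumed fusion rule: convex combination of per-modality emotion distributions.
    Because the weights sum to 1, the result is still a valid distribution."""
    n = len(modality_dists[0])
    return [sum(w * d[k] for w, d in zip(weights, modality_dists)) for k in range(n)]


def learn_fusion_weights(modality_dists, target, steps: int = 500, lr: float = 0.5,
                         eps: float = 1e-12) -> List[float]:
    """Gradient descent on fusion logits alpha to minimise KL(target || fused).
    Hypothetical helper; only the combination coefficients are learned here."""
    alphas = [0.0] * len(modality_dists)  # start from equal weights
    for _ in range(steps):
        weights = softmax(alphas)
        fused = fuse(modality_dists, weights)
        # dL/dw_m = -sum_k t_k * q_{m,k} / p_k  (derivative of KL w.r.t. each weight)
        g = [-sum(t * q / max(p, eps) for t, q, p in zip(target, d, fused) if t > 0)
             for d in modality_dists]
        # Chain rule through softmax: dL/dalpha_j = w_j * (g_j - sum_m w_m * g_m)
        mean_g = sum(w * gm for w, gm in zip(weights, g))
        alphas = [a - lr * w * (gj - mean_g) for a, w, gj in zip(alphas, weights, g)]
    return softmax(alphas)


# Toy example over four emotion categories (made-up numbers, not data from the paper).
aesthetic = [0.5, 0.3, 0.1, 0.1]  # hypothetical "aesthetic" modality prediction
semantic = [0.1, 0.1, 0.4, 0.4]   # hypothetical "semantic" modality prediction
target = [0.5, 0.3, 0.1, 0.1]     # hypothetical ground-truth emotion distribution
weights = learn_fusion_weights([aesthetic, semantic], target)
# The modality whose prediction better matches the target ends up with the larger weight.
```

Learning logits and passing them through a softmax keeps the coefficients positive and summing to one without explicit constraints; a full model would train the per-modality predictors jointly with these coefficients rather than holding them fixed.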
What problem does this paper attempt to address?