SCEP—A New Image Dimensional Emotion Recognition Model Based on Spatial and Channel-Wise Attention Mechanisms

Bo Li, Hui Ren, Xuekun Jiang, Fang Miao, Feng Feng, Libiao Jin
DOI: https://doi.org/10.1109/access.2021.3057373
IF: 3.9
2021-01-01
IEEE Access
Abstract: Images are an important carrier of emotional expression. Humans can understand the emotions in an image easily and quickly, whereas extracting accurate emotions is a very challenging task for machines. In this study, we propose a novel spatial and channel-wise attention-based emotion prediction model, SCEP, to help computers recognize the emotions of images more accurately. SCEP integrates both spatial attention and channel-wise weight mechanisms into a classical convolutional neural network (CNN) layer structure to predict image emotions, on the grounds that the spatial attention mechanism can enhance the contrast between salient regions and potentially irrelevant regions, and that the channel-wise weight mechanism can emphasize informative features while suppressing less useful ones. The SCEP model outputs emotion values in a continuous 2-D valence-arousal space, so it can express more emotions than a discrete classification of emotions can. To validate the effectiveness of our model, we test it on an existing image dataset with a wide emotion distribution. Extensive experiments show that, compared with base models (i.e., VGG and ResNet) without spatial attention or channel-wise mechanisms, SCEP improves the accuracy of emotion prediction (evaluated by the concordance correlation coefficient) by approximately 3%-5% in the arousal domain and by approximately 3%-6% in the valence domain. We therefore conclude that SCEP yields higher accuracy in emotion prediction.
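The abstract names the concordance correlation coefficient (CCC) as the evaluation metric but does not spell it out. The sketch below uses the standard definition, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), with population (biased) variances. The function name `concordance_cc` and the sample valence values are hypothetical, and the paper's exact evaluation code may differ.

```python
import math
from statistics import fmean


def concordance_cc(pred, target):
    """Concordance correlation coefficient between two equal-length sequences.

    Hypothetical helper: uses population (divide-by-n) variance and covariance,
    which is one common convention; the paper may use another.
    """
    pred, target = list(pred), list(target)
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("need two sequences of equal length >= 2")

    mean_p, mean_t = fmean(pred), fmean(target)
    var_p = fmean([(p - mean_p) ** 2 for p in pred])
    var_t = fmean([(t - mean_t) ** 2 for t in target])
    cov = fmean([(p - mean_p) * (t - mean_t) for p, t in zip(pred, target)])

    denom = var_p + var_t + (mean_p - mean_t) ** 2
    if denom == 0.0:
        # Both sequences are constant and identical; CCC is undefined here.
        return math.nan
    return 2.0 * cov / denom


# Illustrative values only (not data from the paper).
predicted_valence = [0.1, 0.4, 0.35, 0.8]
annotated_valence = [0.0, 0.5, 0.30, 0.9]
print(concordance_cc(predicted_valence, annotated_valence))
```

Unlike Pearson correlation, CCC also penalizes a constant offset or scale mismatch between predictions and annotations, which suits continuous valence and arousal outputs.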
computer science, information systems; telecommunications; engineering, electrical & electronic