Progressive Visual Content Understanding Network for Image Emotion Classification

Jicai Pan, Shangfei Wang
DOI: https://doi.org/10.1145/3581783.3612186
2023-01-01
Abstract: Most existing methods for image emotion classification extract features directly from images, supervised only by a single emotion label. This approach suffers from the affective gap: the extracted features do not always align with the emotions users actually perceive, which limits their usefulness. To bridge the affective gap, this paper proposes a visual content understanding network inspired by the staged way humans perceive emotion. The network consists of three perception modules that extract information at different levels. First, an entity perception module extracts entities from the image. Second, an attribute perception module extracts the attribute content of each entity. Third, an emotion perception module extracts emotion features based on both the entity and the attribute information. We generate pseudo-labels for entities and attributes using image segmentation and vision-language models, and use them as auxiliary supervision during training. This progressive understanding of entities and attributes enables the network to hierarchically extract semantic-level features for emotion analysis. Extensive experiments demonstrate that the progressive learning network achieves superior performance on several benchmark datasets for image emotion classification.
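The staged design described in the abstract (entities, then attributes, then emotion, with pseudo-labels as auxiliary supervision) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. Everything here is an assumption made for illustration: the image is represented by a precomputed feature vector `feat` from some backbone; entities and attributes are treated as image-level multi-label outputs (the paper extracts attributes per entity, which this sketch simplifies); each stage is conditioned on earlier stages by simple concatenation; and the names `ProgressiveNet`, `total_loss`, `lam_ent`, and `lam_attr` and the weighted-sum loss are hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cross_entropy(logits: list[float], target: int) -> float:
    # Single-label softmax cross-entropy computed via log-sum-exp.
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return lse - logits[target]


def bce_with_logits(logits: list[float], targets: list[int]) -> float:
    # Mean multi-label binary cross-entropy, stable form:
    # max(z, 0) - z*y + log(1 + exp(-|z|)).
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


class Linear:
    """Toy fully connected layer with fixed random weights (no training)."""

    def __init__(self, n_in: int, n_out: int, rng: random.Random):
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [0.0] * n_out

    def __call__(self, x: list[float]) -> list[float]:
        return [sum(wi * xi for wi, xi in zip(row, x)) + b for row, b in zip(self.w, self.b)]


class ProgressiveNet:
    """Hypothetical entity -> attribute -> emotion heads; each stage sees earlier outputs."""

    def __init__(self, feat_dim, n_entities, n_attributes, n_emotions, seed=0):
        rng = random.Random(seed)
        self.entity_head = Linear(feat_dim, n_entities, rng)
        self.attribute_head = Linear(feat_dim + n_entities, n_attributes, rng)
        self.emotion_head = Linear(feat_dim + n_entities + n_attributes, n_emotions, rng)

    def forward(self, feat):
        ent = self.entity_head(feat)                    # stage 1: entity logits
        ent_p = [sigmoid(z) for z in ent]
        attr = self.attribute_head(feat + ent_p)        # stage 2: conditioned on entities
        attr_p = [sigmoid(z) for z in attr]
        emo = self.emotion_head(feat + ent_p + attr_p)  # stage 3: conditioned on both
        return ent, attr, emo


def total_loss(outputs, emotion_label, entity_pseudo, attribute_pseudo,
               lam_ent=0.5, lam_attr=0.5):
    # Emotion loss plus auxiliary pseudo-label losses; the weights are assumed.
    ent, attr, emo = outputs
    return (cross_entropy(emo, emotion_label)
            + lam_ent * bce_with_logits(ent, entity_pseudo)
            + lam_attr * bce_with_logits(attr, attribute_pseudo))


if __name__ == "__main__":
    net = ProgressiveNet(feat_dim=4, n_entities=3, n_attributes=5, n_emotions=8)
    outs = net.forward([0.1, 0.2, 0.3, 0.4])
    print(total_loss(outs, emotion_label=2,
                     entity_pseudo=[1, 0, 1], attribute_pseudo=[0, 1, 0, 0, 1]))
```

Setting `lam_ent` and `lam_attr` to zero reduces the objective to plain single-label emotion classification, which corresponds to the baseline setup the abstract argues against. The auxiliary terms are what push the intermediate stages toward entity- and attribute-level semantics.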