FindingEmo: An Image Dataset for Emotion Recognition in the Wild

Laurent Mertens, Elahe' Yargholi, Hans Op de Beeck, Jan Van den Stock, Joost Vennekens
2024-06-05
Abstract: We introduce FindingEmo, a new image dataset containing annotations for 25k images, specifically tailored to Emotion Recognition. Contrary to existing datasets, it focuses on complex scenes depicting multiple people in various naturalistic, social settings, with images being annotated as a whole, thereby going beyond the traditional focus on faces or single individuals. Annotated dimensions include Valence, Arousal and Emotion label, with annotations gathered using Prolific. Together with the annotations, we release the list of URLs pointing to the original images, as well as all associated source code.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses a gap in emotion recognition: existing datasets concentrate on faces or single individuals, whereas FindingEmo targets complex scenes showing multiple people in naturalistic social settings. It introduces FindingEmo as an image dataset with annotations for 25k images, tailored to emotion recognition. Each image is annotated as a whole rather than per face or per person. The annotations cover three dimensions (Valence, Arousal, and Emotion label) and were collected through the crowdsourcing platform Prolific.

The paper also evaluates deep learning models on this dataset for emotion classification and for valence and arousal regression. It reports that a traditional CNN architecture such as VGG16 outperforms modern ViT models on emotion classification, while ViT models do better on valence and arousal prediction. It further tries to improve performance by combining features from different models; adding facial expression features yields some improvement, but the gain remains limited.

Overall, the goal is to provide a dataset that better reflects complex real-world scenarios, for use by both AI researchers and psychologists in research on emotion recognition and social cognition.
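The idea of combining features from different models can be illustrated with a minimal sketch. It assumes, purely for illustration, that each model yields one fixed-length feature vector per image and that the vectors are simply concatenated before being passed to a downstream classifier or regressor. The function name `fuse_features` and the toy values are hypothetical and are not taken from the paper or its released code; the paper's actual fusion method may differ.

```python
from itertools import chain


def fuse_features(*feature_sets):
    """Concatenate per-image feature vectors from several models.

    Each argument is a list with one feature vector (a list of floats)
    per image, in the same image order. This is a hypothetical helper,
    not the paper's implementation.
    """
    lengths = {len(features) for features in feature_sets}
    if len(lengths) != 1:
        raise ValueError("every feature set must cover the same images")
    # zip(*feature_sets) groups the vectors belonging to one image;
    # chain.from_iterable joins them into a single longer vector.
    return [list(chain.from_iterable(rows)) for rows in zip(*feature_sets)]


# Toy placeholder values: two images, a 3-dim "scene" vector and a
# 1-dim "facial expression" vector per image.
scene_feats = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
face_feats = [[1.0], [0.0]]

fused = fuse_features(scene_feats, face_feats)
```

The fused vectors would then serve as input to whatever emotion classifier or valence/arousal regressor is being trained; that step is left out here because the summary does not specify it.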