EEV: A Large-Scale Dataset for Studying Evoked Expressions from Video

Jennifer J. Sun, Ting Liu, Alan S. Cowen, Florian Schroff, Hartwig Adam, Gautam Prasad
DOI: https://doi.org/10.48550/arXiv.2001.05488
2021-02-23
Abstract: Videos can evoke a range of affective responses in viewers. The ability to predict evoked affect from a video, before viewers watch the video, can help in content creation and video recommendation. We introduce the Evoked Expressions from Videos (EEV) dataset, a large-scale dataset for studying viewer responses to videos. Each video is annotated at 6 Hz with 15 continuous evoked expression labels, corresponding to the facial expression of viewers who reacted to the video. We use an expression recognition model within our data collection framework to achieve scalability. In total, there are 36.7 million annotations of viewer facial reactions to 23,574 videos (1,700 hours). We use a publicly available video corpus to obtain a diverse set of video content. We establish baseline performance on the EEV dataset using an existing multimodal recurrent model. Transfer learning experiments show an improvement in performance on the LIRIS-ACCEDE video dataset when pre-trained on EEV. We hope that the size and diversity of the EEV dataset will encourage further explorations in video understanding and affective computing. A subset of EEV is released at https://github.com/google-research-datasets/eev.
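The dataset figures quoted in the abstract can be cross-checked with simple arithmetic. The sketch below assumes one annotation per 6 Hz timestep across all 1,700 hours, which is an interpretation of the abstract rather than a statement from the authors; the function and variable names are illustrative only.

```python
# Figures taken from the abstract.
ANNOTATION_RATE_HZ = 6      # annotations per second of video
TOTAL_HOURS = 1_700         # total video duration (rounded in the abstract)
NUM_VIDEOS = 23_574         # number of annotated videos


def expected_annotations(hours: float, rate_hz: float) -> int:
    """Number of annotated timesteps, assuming one per sample at rate_hz."""
    seconds = hours * 3600
    return int(seconds * rate_hz)


# Assumption: every second of the 1,700 hours carries 6 annotated timesteps.
total = expected_annotations(TOTAL_HOURS, ANNOTATION_RATE_HZ)

# Average video length implied by the totals, in minutes.
mean_minutes_per_video = TOTAL_HOURS * 60 / NUM_VIDEOS

print(f"expected annotations: {total:,}")
print(f"mean video length: {mean_minutes_per_video:.2f} min")
```

Under that assumption the estimate comes to about 36.7 million annotated timesteps, in line with the reported total, and the implied average video length is a little over four minutes.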
Computer Vision and Pattern Recognition