Implicit Media Tagging and Affect Prediction from video of spontaneous facial expressions, recorded with depth camera

Daniel Hadar
DOI: https://doi.org/10.48550/arXiv.1701.05248
2017-01-19
Abstract: We present a method that automatically evaluates emotional response from spontaneous facial activity recorded by a depth camera. The automatic evaluation of emotional response, or affect, is a fascinating challenge with many applications, including human-computer interaction, media tagging and human affect prediction. Our approach to this problem is based on the inferred activity of facial muscles over time, as captured by a depth camera recording an individual's facial activity. Our contribution is two-fold. First, we constructed a database of publicly available short video clips that elicit a strong emotional response in a consistent manner across different individuals. Each video was tagged by its characteristic emotional response along four scales: Valence, Arousal, Likability and Rewatch (the desire to watch again). The second contribution is a two-step, learning-based prediction method, which was trained and tested using this database of tagged video clips. Our method successfully predicted the aforementioned four-dimensional representation of affect and identified the period of strongest emotional response in the viewing recordings, in a manner that is blind to the video clip being watched, revealing significantly high agreement between the recordings of independent viewers.
Human-Computer Interaction