Learning Grimaces by Watching TV

Samuel Albanie, Andrea Vedaldi
DOI: https://doi.org/10.48550/arXiv.1610.02255
2016-10-07
Abstract: Unlike computer vision systems, which require explicit supervision, humans can learn facial expressions by observing people in their environment. In this paper, we look at how similar capabilities could be developed in machine vision. As a starting point, we consider the problem of relating facial expressions to objectively measurable events occurring in videos. In particular, we consider a gameshow in which contestants play to win significant sums of money. We extract events affecting the game and the corresponding facial expressions objectively and automatically from the videos, obtaining large quantities of labelled data for our study. We also develop, using benchmarks such as FER and SFEW 2.0, state-of-the-art deep neural networks for facial expression recognition, showing that pre-training on face verification data can be highly beneficial for this task. Then, we extend these models to use facial expressions to predict events in videos and learn nameable expressions from them. The dataset and emotion recognition models are available at http://www.robots.ox.ac.uk/~vgg/data/facevalue
Computer Vision and Pattern Recognition