Learning Universal Multi-View Age Estimator Using Video Context
Zheng Song, Bingbing Ni, Dong Guo, Terence Sim, Shuicheng Yan
DOI: https://doi.org/10.1109/iccv.2011.6126248
2011-01-01
Abstract: Many existing techniques for analyzing face images assume that the faces are nearly frontal. Generalizing to non-frontal faces is often difficult, owing both to a dearth of ground truth for non-frontal faces and to the inherent challenges of handling pose variations. In this work, we investigate how to learn a universal multi-view age estimator by harnessing 1) unlabeled web videos, 2) a publicly available labeled frontal face corpus, and 3) zero or more non-frontal faces with age labels. First, a large and diverse corpus of videos involving people is collected from an online video-sharing website. Then, multi-view face detection and tracking are performed to build a large set of frontal-vs-profile face bundles, each drawn from a single tracking sequence and thus exhibiting the same age. These unlabeled face bundles constitute the so-called video context, and the parametric multi-view age estimator is trained by 1) enforcing the face-to-age relation for the partially labeled faces, 2) imposing consistency between the predicted ages of the non-frontal and frontal faces within each face bundle, and 3) mutually constraining the multi-view age models with spatial correspondence priors derived from the face bundles. Our multi-view age estimator performs well on a realistic evaluation dataset that contains faces under varying poses and whose ground-truth ages were manually annotated.
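The way labeled frontal faces and unlabeled face bundles combine in training can be illustrated with a toy sketch. This is not the authors' formulation. It assumes one linear age regressor per view over made-up 2-D feature vectors, squared losses, and plain gradient descent. It keeps only the first two terms (fitting known ages of labeled frontal faces, and agreement between frontal and profile predictions within each bundle) and omits the spatial-correspondence prior. The function `train_two_view`, the weight `lam`, and the data layout are hypothetical.

```python
# Toy sketch (NOT the paper's method): two linear age regressors, one per view,
# trained by gradient descent on
#   sum over labeled frontal faces   (w_f . x - y)^2
#   + lam * sum over face bundles    (w_f . x_f - w_p . x_p)^2
# The paper's spatial-correspondence prior between view models is omitted.


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_two_view(labeled, bundles, dim, lam=1.0, lr=0.01, steps=10000):
    """labeled: list of (frontal_features, age) pairs.
    bundles: list of (frontal_features, profile_features) pairs taken from one
    tracking sequence, so both faces share an (unknown) age. Hypothetical format."""
    w_f = [0.0] * dim  # frontal-view weights
    w_p = [0.0] * dim  # profile-view weights
    for _ in range(steps):
        g_f = [0.0] * dim
        g_p = [0.0] * dim
        # Term 1: fit the known ages of the labeled frontal faces.
        for x, y in labeled:
            r = dot(w_f, x) - y
            for i in range(dim):
                g_f[i] += 2.0 * r * x[i]
        # Term 2: within a bundle, frontal and profile predictions should agree.
        for xf, xp in bundles:
            d = dot(w_f, xf) - dot(w_p, xp)
            for i in range(dim):
                g_f[i] += 2.0 * lam * d * xf[i]
                g_p[i] -= 2.0 * lam * d * xp[i]
        w_f = [w - lr * g for w, g in zip(w_f, g_f)]
        w_p = [w - lr * g for w, g in zip(w_p, g_p)]
    return w_f, w_p


# Synthetic example: "true" age = 2*x0 + 3*x1 on frontal features, and the
# profile features are the frontal features swapped, so no profile face has a label.
labeled = [((1.0, 0.0), 2.0), ((0.0, 1.0), 3.0), ((1.0, 1.0), 5.0)]
frontal = [(1.0, 2.0), (2.0, 1.0), (0.5, 1.5), (1.0, 1.0)]
bundles = [(x, (x[1], x[0])) for x in frontal]

w_f, w_p = train_two_view(labeled, bundles, dim=2)
print(round(dot(w_p, (3.0, 1.0)), 2))  # profile view of the frontal face (1, 3)
```

In this toy setup the profile model is learned without any profile labels. The bundle-consistency term transfers supervision from the frontal view, so `w_p` converges to roughly `[3, 2]` and the profile prediction for features `(3, 1)` approaches the frontal age of 11.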