MUGGLE: MUlti-Stream Group Gaze Learning and Estimation

Ning Zhuang, Bingbing Ni, Yi Xu, Xiaokang Yang, Wenjun Zhang, Zefan Li, Wen Gao
DOI: https://doi.org/10.1109/tcsvt.2019.2940479
IF: 5.859
2020-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Accurately predicting the common gaze point of a group of people is of particular interest for precision marketing and automatic assessment of group attention. Group gaze estimation is challenging because faces and heads often appear small in the image and some observers are outliers whose gaze departs from the group's. To address these challenges, we propose a novel framework called Multi-stream Group Gaze Learning and Estimation (MUGGLE). MUGGLE has two inference streams: 1) a holistic stream, which feeds a fused attention map into a global deep convolutional network to exploit the global geometric configuration and context of the persons of interest in the scene; and 2) an aggregative stream, which robustly aggregates individual gazes through a recurrent structure (e.g., an LSTM) to obtain an outlier-tolerant estimate. The two streams are combined by a fusion network. Extensive experiments are performed on a fully annotated, publicly releasable group gaze image dataset with more than 8,000 images and 100,000 faces. The results demonstrate the effectiveness of the proposed MUGGLE framework for group gaze estimation.
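The two-stream idea in the abstract can be illustrated with a toy, non-learned sketch. It assumes each observer's individual gaze point on the image plane has already been estimated by some per-person gaze model (not shown). The holistic stream is approximated by the arg-max of a fused attention map built from one Gaussian per observer, where MUGGLE instead feeds such a map to a deep CNN; the aggregative stream swaps the paper's recurrent network (e.g., LSTM) for a plain coordinate-wise median; and the learned fusion network is replaced by a fixed convex weight `alpha`. All function names and the parameters `sigma` and `alpha` are hypothetical, chosen only for illustration.

```python
import math
from statistics import median
from typing import List, Tuple

Point = Tuple[float, float]


def fused_attention_map(points: List[Point], width: int, height: int,
                        sigma: float = 5.0) -> List[List[float]]:
    """Sum one isotropic Gaussian per observer, centred on that observer's
    estimated gaze point (hypothetical stand-in for MUGGLE's fused map)."""
    grid = [[0.0] * width for _ in range(height)]
    for px, py in points:
        for y in range(height):
            for x in range(width):
                d2 = (x - px) ** 2 + (y - py) ** 2
                grid[y][x] += math.exp(-d2 / (2 * sigma ** 2))
    return grid


def holistic_stream(points: List[Point], width: int, height: int,
                    sigma: float = 5.0) -> Point:
    """Stand-in for the holistic stream: return the grid cell with the
    highest fused attention (the paper feeds the map to a deep CNN)."""
    grid = fused_attention_map(points, width, height, sigma)
    _, best_x, best_y = max((grid[y][x], x, y)
                            for y in range(height) for x in range(width))
    return float(best_x), float(best_y)


def aggregative_stream(points: List[Point]) -> Point:
    """Outlier-tolerant aggregation of individual gaze points.
    Swapped-in technique: coordinate-wise median instead of an LSTM."""
    return (float(median(p[0] for p in points)),
            float(median(p[1] for p in points)))


def fuse(holistic: Point, aggregative: Point, alpha: float = 0.5) -> Point:
    """Convex combination of the two stream outputs (assumption: a fixed
    weight alpha replaces the paper's learned fusion network)."""
    return (alpha * holistic[0] + (1 - alpha) * aggregative[0],
            alpha * holistic[1] + (1 - alpha) * aggregative[1])


def estimate_group_gaze(points: List[Point], width: int = 64,
                        height: int = 64, alpha: float = 0.5) -> Point:
    """Run both toy streams and fuse their estimates."""
    return fuse(holistic_stream(points, width, height),
                aggregative_stream(points), alpha)


if __name__ == "__main__":
    # Four observers look near (20, 30); the last one is an outlier.
    pts = [(20.0, 30.0), (22.0, 31.0), (21.0, 29.0), (19.0, 30.0), (60.0, 5.0)]
    print(estimate_group_gaze(pts))
```

In this example both toy streams stay near the cluster: the median gives (21.0, 30.0), the attention map peaks at cell (20, 30), and the fused estimate is (20.5, 30.0), whereas a plain mean of the five gaze points would be pulled to (28.4, 25.0) by the single outlier. This is only meant to show why combining a map-based view with a robust aggregate tolerates outlier observers; it says nothing about how the trained MUGGLE networks behave.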
What problem does this paper attempt to address?