Proprioception-Driven Wearer Pose Estimation for Egocentric Video

Wei Su, Yuehu Liu, Shasha Li, Zerun Cai
DOI: https://doi.org/10.1109/icpr56361.2022.9956092
2022-01-01
Abstract: Perceiving proprioception from egocentric video to estimate the 3D pose of the wearer is a visual task that has attracted considerable attention. Yet the invisibility of the wearer's body and the complexity of the motion modality make it challenging to perceive self-motion within the human visual span. In this work, a data processing framework is designed to convert a raw egocentric video stream into a 3D wearer pose sequence. Critically, a generic and lightweight Self-Perception Excitation (SPE) module is proposed to enhance motion modeling and calibrate spatial correlation in the temporal dimension. Using ResNet50 with embedded SPE modules as the backbone, a two-stream proprioception representation pipeline is proposed to learn proprioception behaviors from RGB streams and motion streams. The proprioception is then incorporated as an additional key control signal in a deep reinforcement learning (DeepRL) based motion imitation policy for estimating the multi-modal wearer pose. By considering proprioception, we show for the first time that it is possible to understand the self from an egocentric view and to translate this into a higher-level understanding of wearer motion. The experimental results demonstrate that the proposed framework outperforms state-of-the-art methods by a large margin on the MoCup dataset and produces highly identifiable proprioception behaviors.