Convnets-Based Action Recognition From Depth Maps Through Virtual Cameras And Pseudocoloring

Pichao Wang, Wanqing Li, Zhimin Gao, Chang Tang, Jing Zhang, Philip Ogunbona
DOI: https://doi.org/10.1145/2733373.2806296
2015-01-01
Abstract: In this paper, we propose to adopt ConvNets to recognize human actions from depth maps on relatively small datasets based on Depth Motion Maps (DMMs). In particular, three strategies are developed to effectively leverage the capability of ConvNets in mining discriminative features for recognition. Firstly, different viewpoints are mimicked by rotating virtual cameras around the subject, which is represented by the 3D points of the captured depth maps. This not only synthesizes more data from the captured ones, but also makes the trained ConvNets view-tolerant. Secondly, DMMs are constructed and further enhanced for recognition by encoding them into Pseudo-RGB images, turning the spatial-temporal motion patterns into textures and edges. Lastly, through transfer learning from models originally trained on ImageNet for image classification, three ConvNets are trained independently on the color-coded DMMs constructed in three orthogonal planes. The proposed algorithm was extensively evaluated on the MSRAction3D, MSRAction3DExt and UTKinect-Action datasets and achieved state-of-the-art results on these datasets.
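The first two strategies in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that a virtual camera can be mimicked by rotating the 3D points about the vertical axis, that a DMM is the accumulated absolute difference between consecutive depth frames, and that a simple three-channel color ramp stands in for the paper's pseudocoloring. The function names `rotate_about_y`, `depth_motion_map`, and `pseudo_rgb` are hypothetical.

```python
import numpy as np


def rotate_about_y(points, theta_deg):
    """Rotate an (N, 3) array of 3D points about the vertical (y) axis.

    Assumption: this stands in for moving a virtual camera around the subject.
    """
    t = np.deg2rad(theta_deg)
    rot = np.array([
        [np.cos(t), 0.0, np.sin(t)],
        [0.0, 1.0, 0.0],
        [-np.sin(t), 0.0, np.cos(t)],
    ])
    return points @ rot.T


def depth_motion_map(frames):
    """Sum absolute differences between consecutive frames of shape (T, H, W).

    Assumption: a commonly used DMM definition, not necessarily the paper's.
    """
    frames = np.asarray(frames, dtype=np.float64)
    return np.abs(np.diff(frames, axis=0)).sum(axis=0)


def pseudo_rgb(dmm):
    """Map a 2D motion map to an (H, W, 3) uint8 image.

    Assumption: a hypothetical blue-to-red ramp, not the paper's color coding.
    """
    span = dmm.max() - dmm.min()
    norm = (dmm - dmm.min()) / span if span > 0 else np.zeros_like(dmm)
    r = norm                              # strong motion maps to red
    g = 1.0 - np.abs(2.0 * norm - 1.0)    # mid-range motion maps to green
    b = 1.0 - norm                        # weak motion maps to blue
    return (np.stack([r, g, b], axis=-1) * 255).astype(np.uint8)


# Usage: a toy 3-frame sequence with one pixel that changes and then resets.
frames = np.zeros((3, 2, 2))
frames[1, 0, 0] = 5.0
dmm = depth_motion_map(frames)
image = pseudo_rgb(dmm)
rotated = rotate_about_y(np.array([[1.0, 0.0, 0.0]]), 90.0)
```

In a fuller pipeline, the rotated points would be re-projected onto three orthogonal planes before the motion maps are computed, and each resulting pseudo-RGB image would be fed to its own ConvNet.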