Pose-Guided Robust Action Recognition for Outdoor Internet of Things
Jiahui Yu, Xu Cheng, Hang Chen, Yingke Xu
DOI: https://doi.org/10.1109/TCE.2024.3384974
2024-01-01
IEEE Transactions on Consumer Electronics
Abstract: Skeleton-based human recognition is a key technology for visual feedback, which can help Internet of Things (IoT) devices interact with humans outdoors in a non-contact manner. Graph Convolutional Networks (GCNs), which offer an intuitive representation of the human skeleton, have received increasing attention. Although current GCN-based works explore how to model unlinked body parts, they still show weak robustness to noisy data, such as joint or frame loss, which often occurs in outdoor IoT settings. Towards robust visual feedback, in this paper we propose robust skeleton-based action recognition neural networks (Robust-SAR), a new cost-efficient approach for recognizing activities in noisy outdoor scenarios. Instead of estimating 2D or 3D skeleton coordinates, we first extract heatmaps of human poses from videos. We propose S-pose to learn from the heatmaps at multiple levels, i.e., joint, joint-scale, and part-scale learning, which strengthens the learning of higher-order motion patterns. Additionally, we propose T-pose to adaptively use the features of previous frames to enhance the current frame, further improving the robustness of the spatiotemporal human representation. Experimentally, Robust-SAR achieves state-of-the-art recognition results on four benchmarks: NTU-60, NTU-120, NUCLA, and Kinetics-400 (outdoor datasets). Furthermore, in noise-filled outdoor conditions, the performance of Robust-SAR drops by only about 0.5%, while other state-of-the-art methods drop by about 2%.
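
The idea of using previous frames to compensate for a lost joint can be illustrated with a small sketch. This is not the paper's T-pose module, whose details the abstract does not give. It is a hypothetical confidence-weighted blend, assuming each joint is represented by a 2D heatmap and that a lost joint appears as an all-zero map. The function names (`gaussian_heatmap`, `temporal_enhance`, and the helpers) are illustrative assumptions.

```python
import math


def gaussian_heatmap(cx, cy, size=8, sigma=1.0):
    """Render one joint as a size x size Gaussian heatmap peaked at (cx, cy)."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]


def empty_heatmap(size=8):
    """All-zero heatmap, used here to simulate a lost joint (assumption)."""
    return [[0.0] * size for _ in range(size)]


def peak(hm):
    """Highest response in the heatmap, treated as the joint's confidence."""
    return max(max(row) for row in hm)


def argmax(hm):
    """(x, y) location of the highest response."""
    best_xy, best_v = (0, 0), float("-inf")
    for y, row in enumerate(hm):
        for x, v in enumerate(row):
            if v > best_v:
                best_xy, best_v = (x, y), v
    return best_xy


def temporal_enhance(frames):
    """Blend each frame with the previous enhanced frame.

    Hypothetical rule: out_t = c_t * h_t + (1 - c_t) * out_{t-1}, where c_t is
    the current frame's peak confidence clipped to [0, 1]. A confident frame
    is kept as-is; a lost joint falls back to the previous estimate.
    """
    enhanced, prev = [], None
    for hm in frames:
        c = min(1.0, max(0.0, peak(hm)))
        if prev is None:
            out = [row[:] for row in hm]
        else:
            out = [
                [c * v + (1.0 - c) * p for v, p in zip(row, prev_row)]
                for row, prev_row in zip(hm, prev)
            ]
        enhanced.append(out)
        prev = out
    return enhanced
```

For example, given three frames for one joint where the middle frame is lost, `temporal_enhance([gaussian_heatmap(2, 3), empty_heatmap(), gaussian_heatmap(4, 3)])` carries the first frame's heatmap into the middle frame and leaves the confident third frame unchanged. A learned module would replace the fixed blending rule with adaptive weights.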