InfPose: Real-Time Infrared Multi-Human Pose Estimation for Edge Devices Based on Encoder-Decoder CNN Architecture

Xin Xu, Xinchao Wei, Yuelei Xu, Zhaoxiang Zhang, Kun Gong, Huafeng Li, Leibing Xiao
DOI: https://doi.org/10.1109/lra.2023.3303070
IF: 5.2
2024-01-01
IEEE Robotics and Automation Letters
Abstract: Despite its remarkable performance, RGB-based Multi-human Pose Estimation (MPE) technology has many practical limitations, for example in nighttime and smoggy environments. Infrared imaging is a viable substitute in these scenarios, but it needs an efficient and fast MPE method. This letter designs InfPose, an infrared MPE model based on an Encoder-Decoder CNN architecture that can run in real time on edge devices. We first built a lightweight Encoder-Decoder CNN backbone from hardware-friendly inverted residual blocks. Second, we used three methods to improve the capability of InfPose: a decoupling associative embedding head, multi-scale supervision, and cross-modal knowledge distillation. In addition, we collected an in-the-wild infrared human pose dataset to train and evaluate our methods. Experimental results show that the proposed model is more robust and has lower inference latency on edge GPU platforms than the prevailing mainstream models. The inference time of InfPose on Xavier NX was 27.7 ms (approximately 37 fps), while accuracy remained sufficient for practical use. This research can be applied to human-machine interaction for autonomous vehicles or intelligent robots at night or in other scenes with poor visibility.