Through-Wall Human Pose Estimation by Mutual Information Maximizing Deeply-Supervised Nets

Zhijie Zheng, Jun Pan, Diankun Zhang, Xiao Liang, Xiaojun Liu, Guangyou Fang
DOI: https://doi.org/10.1109/jiot.2023.3294955
IF: 10.6
2024-01-01
IEEE Internet of Things Journal
Abstract: This article proposes a 3-D human pose estimation method using through-wall radar (TWR) systems, extending the range of applications in the era of the Internet of Things (IoT). A TWR system can penetrate nonmetallic obstacles and perceive wall-occluded human targets, but the physical characteristics of radio frequency (RF) signals, such as poor imaging resolution and the specularity effect, make the pose estimation process highly ill-posed. In this work, we propose a mutual information maximizing deeply supervised network (MIMDSN), which aims to extract accurate and robust 3-D human skeletons from TWR images. Inspired by past works, an optical system is attached to the TWR system to provide cross-modal pseudo labels. Following a design principle for convolutional neural network depth that respects radar resolution constraints, we design a resolution-guided pose estimation network for keypoint coordinate regression. Supervising only the network output is insufficient to alleviate the ill-posedness, so cross-modal supervision is applied not only to the predictions but also to the hidden-layer features of the network. With the help of information theory, the mutual information between features and pseudo labels is maximized for feature alignment and discriminability enhancement. Experiments show that the method performs competitively against state-of-the-art RF-based human pose estimation methods and can reconstruct accurate 3-D skeletons in multitarget, low-visibility, and wall-occluded scenes.
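As a rough illustration of what maximizing mutual information between hidden-layer features and pseudo labels can look like, the sketch below computes the InfoNCE lower bound on mutual information for a batch of paired embeddings. The abstract does not state which estimator the authors use, so InfoNCE is swapped in here only as one common choice. The function name, the `temperature` parameter, and the assumption that features and pseudo labels have already been projected to the same dimension are all hypothetical.

```python
import numpy as np

def info_nce_lower_bound(features, labels, temperature=0.1):
    """InfoNCE lower bound on I(features; labels), in nats.

    features: (N, D) hidden-layer embeddings (hypothetical projection).
    labels:   (N, D) pseudo-label embeddings (hypothetical projection).
    Row i of both arrays is assumed to come from the same sample.
    """
    # L2-normalize so the dot product is a cosine similarity.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    y = labels / np.linalg.norm(labels, axis=1, keepdims=True)
    logits = f @ y.T / temperature  # (N, N) pairwise scores
    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Matched (feature, label) pairs sit on the diagonal.
    n = features.shape[0]
    return float(np.log(n) + np.mean(np.diag(log_softmax)))

# Toy check: features aligned with their labels score higher than unrelated ones.
rng = np.random.default_rng(0)
pseudo = rng.normal(size=(64, 16))
aligned = info_nce_lower_bound(pseudo + 0.01 * rng.normal(size=pseudo.shape), pseudo)
unrelated = info_nce_lower_bound(rng.normal(size=(64, 16)), pseudo)
print(aligned > unrelated)
```

In a training loop, the negative of this bound would be added to the regression loss so that gradient descent pushes the hidden features toward the pseudo labels; the bound itself can never exceed log N for a batch of N pairs.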