Enhancing Skeletal Pose Estimation from mmWave Point Clouds Through Uncertainty Reduction

Hsin-Che Chiang, Guan-Hua Li, Fan Wang, Shervin Shirmohammadi, Cheng-Hsin Hsu
DOI: https://doi.org/10.1145/3688865.3689479
2024-01-01
Abstract: Human pose estimation is vital for a variety of applications, including surveillance, sports, and healthcare. Millimeter-wave (mmWave) radar offers significant advantages over traditional vision-based and wearable sensors, such as enhanced privacy and reduced intrusiveness. However, mmWave point clouds are challenging to work with because their inherent noise introduces aleatoric uncertainty. This study aims to enhance the mmWave-based Skeletal Pose Estimator (SPE) by reducing that uncertainty. We propose a series of SPE models: (i) CSPE+, a CNN-based model that uses features from nearby frames to reduce the uncertainty of the baseline SPE, which uses only a single frame; (ii) TSPE+, an enhancement of CSPE+ that replaces the CNN with the Multiscale Vision Transformer (MViT) for better temporal modeling; and (iii) CSPE++/TSPE++, further refined models trained with a two-stage process that applies a Heteroscedastic Loss in the second stage. Evaluations on two different datasets, one of food intake activities and one of driver activities, showed that our proposed models significantly improve pose estimation accuracy. On the Food Intake Activity Dataset, where the baseline SPE had a Mean Per Joint Position Error (MPJPE) of 64.63 mm, CSPE+ reduced the error by 56.41%, TSPE+ by 64.23%, and CSPE++ by 62.22%; TSPE++ performed best with a 68.51% reduction, achieving an MPJPE of 20.35 mm. Similarly, on the Driver Activity Dataset, where the baseline SPE had an MPJPE of 123.04 mm, CSPE+ reduced the error by 60.47%, TSPE+ by 74.82%, and CSPE++ by 75.19%; TSPE++ again performed best with a 78.47% reduction, achieving an MPJPE of 26.49 mm. These results demonstrate the effectiveness of our proposed models across heterogeneous mmWave radar datasets.
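The reported figures can be checked with a little arithmetic. The following is a minimal sketch, not the authors' evaluation code: it assumes MPJPE is the mean Euclidean distance between predicted and ground-truth 3D joints (the usual definition), and the function names `mpjpe` and `error_after_reduction` as well as the toy joint coordinates are hypothetical.

```python
import math


def mpjpe(pred, truth):
    """Mean Euclidean distance between matching predicted and true joints.

    Assumes the common MPJPE definition; joints are (x, y, z) tuples in mm.
    """
    if not pred or len(pred) != len(truth):
        raise ValueError("pred and truth must be non-empty and equally long")
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(pred)


def error_after_reduction(baseline_mm, reduction_pct):
    """Error that remains after reducing a baseline error by a percentage."""
    return baseline_mm * (1.0 - reduction_pct / 100.0)


# Toy two-joint skeleton (made-up values): joint errors are 0 mm and 5 mm.
toy_pred = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
toy_truth = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(mpjpe(toy_pred, toy_truth))  # 2.5

# Baselines and TSPE++ reductions quoted in the abstract.
print(round(error_after_reduction(64.63, 68.51), 2))   # 20.35 (food intake)
print(round(error_after_reduction(123.04, 78.47), 2))  # 26.49 (driver)
```

Both results match the TSPE++ MPJPE values stated in the abstract, so the percentages are reductions relative to the single-frame baseline SPE.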