Exploiting Temporal Correlations for 3D Human Pose Estimation

Ruibin Wang, Xianghua Ying, Bowei Xing
DOI: https://doi.org/10.1109/tmm.2023.3323874
IF: 7.3
2024-01-01
IEEE Transactions on Multimedia
Abstract: Exploiting the rich temporal information in human pose sequences to facilitate 3D pose estimation has garnered particular attention. While various learning architectures have been designed to exploit temporal information, they are usually trained with a 3D pose loss imposed independently on every single frame, without explicit temporal signals introduced for supervision. This inevitably makes temporal information harder to exploit, since the network must first infer the meaningful temporal information from non-temporal, single-frame supervision. Only then can the network use this information to guide sequence modeling. Recently, some works have introduced temporal smoothness as an explicit supervision signal, which lets the network obtain temporal information more directly from the supervision signal and thus improves its use of that information. However, temporal smoothness only roughly measures short-term temporal properties between adjacent frame pairs. In this work, we propose to generalize the supervision of temporal smoothness to temporal correlations, letting the network precisely consider more comprehensive temporal properties in sequences. We contribute two novel correlation-based loss functions, which adopt different strategies to regularize the encoder and decoder sides of the network, respectively. In addition, we design a pre-training scheme to ensure the general convergence of existing pose estimators under our correlation losses. Experiments on three benchmarks demonstrate that our method is compatible with different networks and improves their ability to exploit temporal information, yielding more accurate and robust pose estimates.
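The distinction the abstract draws between single-frame supervision, adjacent-frame smoothness, and sequence-wide correlation can be illustrated with a minimal sketch. This is not the paper's formulation: the abstract does not define its two correlation losses, so the function names, the choice of cosine similarity between mean-centred frames, and the absolute-difference penalty below are all assumptions made purely for illustration. A pose sequence is assumed to be a nested list of shape frames x joints x 3.

```python
import math

def per_frame_loss(pred, target):
    """Mean joint position error, computed on each frame independently."""
    dists = [math.dist(jp, jt)
             for fp, ft in zip(pred, target)
             for jp, jt in zip(fp, ft)]
    return sum(dists) / len(dists)

def smoothness_loss(pred, target):
    """Error in frame-to-frame displacement; only adjacent frames interact.

    Requires at least two frames.
    """
    errs = []
    for t in range(1, len(pred)):
        for j in range(len(pred[t])):
            vel_p = [a - b for a, b in zip(pred[t][j], pred[t - 1][j])]
            vel_t = [a - b for a, b in zip(target[t][j], target[t - 1][j])]
            errs.append(math.dist(vel_p, vel_t))
    return sum(errs) / len(errs)

def correlation_matrix(seq, eps=1e-8):
    """T x T cosine similarity between temporally mean-centred frames.

    Hypothetical stand-in for a 'temporal correlation': every pair of
    frames contributes, not only neighbours.
    """
    frames = [[c for joint in frame for c in joint] for frame in seq]
    n = len(frames)
    mean = [sum(col) / n for col in zip(*frames)]
    centred = [[v - m for v, m in zip(f, mean)] for f in frames]
    norms = [math.sqrt(sum(v * v for v in f)) for f in centred]
    return [[sum(a * b for a, b in zip(centred[i], centred[k]))
             / (norms[i] * norms[k] + eps)
             for k in range(n)] for i in range(n)]

def correlation_loss(pred, target):
    """Mean absolute difference between the two T x T correlation matrices."""
    cp, ct = correlation_matrix(pred), correlation_matrix(target)
    n = len(cp)
    return sum(abs(cp[i][k] - ct[i][k])
               for i in range(n) for k in range(n)) / (n * n)
```

In this sketch the smoothness term only ever compares frame t with frame t-1, whereas the correlation term compares every frame with every other frame in the sequence, which is one simple way to read the abstract's move from short-term to more comprehensive temporal properties.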