SO-Net: Model-Agnostic Sequential Hand Pose Optimization Framework

Yuanyuan Gao, Pengfei Ren, Mingen Shu, Rui Chu, Jubiao Li, Jing Jin, Wei Li
DOI: https://doi.org/10.1109/icassp48485.2024.10445741
2024-01-01
Abstract: Hand Pose Estimation (HPE) is a crucial technique for perception in human-computer interaction. Recent works have shown that leveraging temporal information significantly improves the stability of HPE systems. However, existing sequential optimization methods are mainly designed for specific frameworks, which limits their applicability and generalization. To address these problems, we propose a model-agnostic Sequential hand pose Optimization Network (SO-Net) that can be applied to various HPE methods. Specifically, SO-Net first uses a feature-heterogeneous pre-embedding module that unifies multiple types of features in a coherent manner and facilitates effective interaction among them. It then adopts a transformer-based spatial-temporal network to capture long-range information across frames. Finally, to enhance the robustness of the model, a dense refinement process is employed for further optimization. Our framework outperforms state-of-the-art methods on the NYU and DexYCB datasets and also yields improvements when applied to various HPE frameworks.
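The three stages named in the abstract (pre-embedding, spatial-temporal transformer, dense refinement) can be illustrated with a deliberately tiny, pure-Python sketch. It is not the authors' implementation. The feature types (3D joint coordinates plus a per-joint confidence), the single-head self-attention over all (frame, joint) tokens, and the reading of "dense refinement" as a residual offset predicted for every joint in every frame are assumptions made for illustration; the random weights stand in for trained parameters, and the class name `SequencePoseRefinerSketch` is hypothetical.

```python
import math
import random


def linear(x, w, b):
    """Affine map y = W x + b; x is a vector, w a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a real model would learn these."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class SequencePoseRefinerSketch:
    """Hypothetical three-stage refiner (assumed design, NOT SO-Net itself)."""

    def __init__(self, coord_dim=3, extra_dim=1, d_model=8, seed=0):
        rng = random.Random(seed)
        self.d = d_model
        self.zeros = [0.0] * d_model
        # Stage 1 (pre-embedding, assumed form): one projection per feature
        # type, summed into a shared d_model space so the types can interact.
        self.w_coord = rand_matrix(d_model, coord_dim, rng)
        self.w_extra = rand_matrix(d_model, extra_dim, rng)
        # Stage 2 (spatial-temporal, assumed form): single-head self-attention
        # over every (frame, joint) token, i.e. across joints and across frames.
        self.w_q = rand_matrix(d_model, d_model, rng)
        self.w_k = rand_matrix(d_model, d_model, rng)
        self.w_v = rand_matrix(d_model, d_model, rng)
        # Stage 3 (refinement, assumed form): residual offset per joint per frame.
        self.w_out = rand_matrix(coord_dim, d_model, rng)
        self.b_out = [0.0] * coord_dim

    def embed(self, coord, extra):
        a = linear(coord, self.w_coord, self.zeros)
        b = linear(extra, self.w_extra, self.zeros)
        return [x + y for x, y in zip(a, b)]

    def attend(self, tokens):
        q = [linear(t, self.w_q, self.zeros) for t in tokens]
        k = [linear(t, self.w_k, self.zeros) for t in tokens]
        v = [linear(t, self.w_v, self.zeros) for t in tokens]
        scale = 1.0 / math.sqrt(self.d)
        out = []
        for qi, ti in zip(q, tokens):
            weights = softmax([scale * sum(a * b for a, b in zip(qi, kj)) for kj in k])
            mixed = [sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(self.d)]
            out.append([x + y for x, y in zip(ti, mixed)])  # residual connection
        return out

    def refine(self, poses, extras):
        """poses: T x J x 3 estimates from any base HPE model; extras: T x J x extra_dim."""
        T, J = len(poses), len(poses[0])
        tokens = [self.embed(poses[t][j], extras[t][j]) for t in range(T) for j in range(J)]
        tokens = self.attend(tokens)
        refined = []
        for t in range(T):
            frame = []
            for j in range(J):
                delta = linear(tokens[t * J + j], self.w_out, self.b_out)
                frame.append([p + d for p, d in zip(poses[t][j], delta)])
            refined.append(frame)
        return refined


# Usage: 5 frames, 4 joints, each with a 3D estimate and a confidence of 1.0.
poses = [[[float(t), float(j), 0.0] for j in range(4)] for t in range(5)]
conf = [[[1.0] for _ in range(4)] for _ in range(5)]
model = SequencePoseRefinerSketch()
out = model.refine(poses, conf)  # 5 frames x 4 joints x 3 coords
```

Because the sketch only consumes per-joint poses (plus optional side features) and outputs residual corrections, it never touches the internals of the network that produced the estimates; this is one simple way, under the assumptions above, for a refiner to stay model-agnostic.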