A Multi-task Interaction Mechanism for 3D Hand Pose Estimation from RGB Image

Jingyi Sun, Shaoxiang Guo, Junyu Dong, Shu Zhang
DOI: https://doi.org/10.1109/swc57546.2023.10448989
2023-01-01
Abstract: With the development of computer vision and increasing market demand, various hand pose estimation algorithms have emerged. However, some obstacles remain. Many previous works solve a single task independently and treat different tasks as separate entities. When these tasks are performed simultaneously to serve a common goal, this approach does not fully exploit the information in the feature map or its impact on the tasks. To address this challenge and further improve both the performance of the different tasks and the connection between them, we present a Multi-task Interaction Mechanism framework for 3D hand pose estimation, which consists of two main stages. The first stage is a Multi-task Learning module, in which three different tasks are selected to obtain semantic features. In the second stage, we focus on the interaction between multiple tasks and propose a contrastive learning-based module. We design a novel Semantic Aware Correction (SAC) module to perform feature interaction, which enables stronger interaction among the three tasks than other methods. We use only one Transformer layer; compared to other methods of the same type, our approach reaches the same accuracy with fewer layers. Experiments demonstrate that our approach can estimate 3D hand pose in challenging situations.
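The idea of letting three task-specific features interact through a single attention layer can be illustrated with a minimal sketch. This is not the authors' SAC module: the abstract does not specify its internals, so the sketch assumes one feature vector per task, identity query/key/value projections (no learned weights), a single attention head, and a residual connection. The names `attention_interaction`, `task_a`, `task_b`, and `task_c` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_interaction(task_features):
    """One pass of scaled dot-product self-attention across task features.

    Assumption: identity Q/K/V projections and a residual connection.
    A real Transformer layer would add learned projections, layer
    normalization, and a feed-forward block.
    """
    dim = len(task_features[0])
    scale = math.sqrt(dim)
    mixed = []
    for query in task_features:
        # How strongly this task attends to every task (itself included).
        weights = softmax([dot(query, key) / scale for key in task_features])
        # Weighted sum of all task features: the cross-task context.
        context = [
            sum(w * value[i] for w, value in zip(weights, task_features))
            for i in range(dim)
        ]
        # Residual connection keeps the task's own feature dominant.
        mixed.append([q + c for q, c in zip(query, context)])
    return mixed


# Hypothetical 4-dimensional semantic features, one per task.
task_a = [1.0, 0.0, 0.0, 0.0]
task_b = [0.0, 1.0, 0.0, 0.0]
task_c = [0.0, 0.0, 1.0, 0.0]
mixed_features = attention_interaction([task_a, task_b, task_c])
```

After the pass, each task's feature carries a weighted share of the other two, which is the kind of cross-task exchange the second stage is described as providing.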