RetinaHand: Towards Accurate Single-Stage Hand Pose Estimation

Zilong Xiao, Luojun Lin, Yuanxi Yang, Yuanlong Yu
DOI: https://doi.org/10.1007/978-3-031-20738-9_73
2023-01-01
Abstract: Because hands have highly flexible joints and deform substantially, hand pose estimation is a particularly challenging detection task. To ensure prediction accuracy, recent work has proposed two-stage algorithms, which require large, redundant model structures and are difficult to deploy end to end. In this paper, we propose a novel dynamic single-stage CNN (RetinaHand), based on RetinaNet, for end-to-end 2D hand pose estimation from RGB images. RetinaHand first extracts image features through a backbone with dynamic convolutional layers. In the neck module, we propose a Context Path Aggregation Network (CPANet) that fuses features at different scales and expands context information to improve performance. In addition, following the idea of multi-task learning, we add a keypoint heatmap regression branch alongside the existing classification and bounding box regression branches, and train the model with a multi-task loss. Experimental results on the Eric.Lee and Panoptic datasets consistently show that RetinaHand achieves performance comparable to existing hand pose estimation methods while running inference more efficiently.
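The multi-task training objective mentioned in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the loss terms or their weights, so everything here is an assumption: focal loss for classification (the standard RetinaNet choice), smooth L1 for box regression, mean squared error for the keypoint heatmaps, and the hypothetical weights `w_cls`, `w_box`, and `w_kpt`. For brevity each term is averaged over all elements rather than normalized by the number of positive anchors.

```python
import numpy as np


def focal_loss(probs, targets, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss, averaged over all elements (a simplification)."""
    probs = np.clip(probs, eps, 1.0 - eps)
    # p_t is the predicted probability of the true class.
    p_t = np.where(targets == 1, probs, 1.0 - probs)
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))


def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss: quadratic for small errors, linear for large ones."""
    diff = np.abs(pred - target)
    loss = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return float(np.mean(loss))


def heatmap_mse(pred, target):
    """Mean squared error between predicted and target keypoint heatmaps."""
    return float(np.mean((pred - target) ** 2))


def multi_task_loss(cls_probs, cls_targets, box_pred, box_target,
                    hm_pred, hm_target, w_cls=1.0, w_box=1.0, w_kpt=1.0):
    """Weighted sum of the three branch losses (weights are assumptions)."""
    parts = {
        "cls": focal_loss(cls_probs, cls_targets),
        "box": smooth_l1(box_pred, box_target),
        "kpt": heatmap_mse(hm_pred, hm_target),
    }
    total = w_cls * parts["cls"] + w_box * parts["box"] + w_kpt * parts["kpt"]
    return total, parts
```

In this sketch the three branches share one scalar objective, so a single backward pass would update the shared backbone and neck; in practice the weights would be tuned so that no single branch dominates training.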