Attention-based hand pose estimation with voting and dual modalities
Dinh-Cuong Hoang, Anh-Nhat Nguyen, Thu-Uyen Nguyen, Ngoc-Anh Hoang, Van-Duc Vu, Duy-Quang Vu, Phuc-Quan Ngo, Khanh-Toan Phan, Duc-Thanh Tran, Van-Thiep Nguyen, Quang-Tri Duong, Ngoc-Trung Ho, Cong-Trinh Tran, Van-Hiep Duong, Anh-Truong Mai
DOI: https://doi.org/10.1016/j.engappai.2024.109526
IF: 8
2024-11-11
Engineering Applications of Artificial Intelligence
Abstract: Hand pose estimation has recently emerged as a compelling topic in the robotics research community because of its usefulness for learning from human demonstration and for safe human–robot interaction. Although deep learning-based methods have been introduced for this task and have shown promise, it remains a challenging problem. To address this, we propose a novel end-to-end architecture for hand pose estimation using red-green-blue (RGB) and depth (D) data (RGB-D). Our approach processes the two data sources separately and uses a dense fusion network with an attention module to extract discriminative features. The extracted features capture both spatial information and geometric constraints and are fused to vote for the hand pose. We demonstrate that combining the voting mechanism with the attention mechanism is particularly effective when hands are heavily occluded by objects or self-occluded. Experimental results on benchmark datasets show that our approach outperforms state-of-the-art methods by a significant margin.
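The abstract does not specify the network layers, so the following is only a minimal NumPy sketch of the general idea it names (per-point dense fusion of appearance and geometry features, attention weighting, and keypoint voting), not the authors' implementation. Assumptions: per-point RGB and geometric features, 3D point coordinates, and per-point offset predictions are given as arrays in place of learned networks; the attention score is a hypothetical linear projection `w` followed by a softmax over points; and the function names `dense_fuse`, `attention_weights` and `vote_keypoints` are illustrative.

```python
import numpy as np


def dense_fuse(rgb_feat: np.ndarray, geo_feat: np.ndarray) -> np.ndarray:
    """Per-point (dense) fusion by concatenation: (N, Cr) + (N, Cg) -> (N, Cr + Cg)."""
    return np.concatenate([rgb_feat, geo_feat], axis=1)


def attention_weights(fused: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Softmax over points of a linear score; `w` stands in for a learned
    attention module (hypothetical). Returns (N,) weights that sum to 1."""
    scores = fused @ w                 # (N,) one score per point
    scores = scores - scores.max()     # subtract max for numerical stability
    e = np.exp(scores)
    return e / e.sum()


def vote_keypoints(points: np.ndarray, offsets: np.ndarray,
                   weights: np.ndarray) -> np.ndarray:
    """Each point i casts the vote points[i] + offsets[i, k] for keypoint k;
    votes are combined by an attention-weighted average.
    points: (N, 3), offsets: (N, K, 3), weights: (N,) -> (K, 3)."""
    votes = points[:, None, :] + offsets          # (N, K, 3)
    return np.einsum("n,nkd->kd", weights, votes)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.normal(size=(6, 3))                    # toy hand points
    gt = np.array([[0.0, 0.0, 0.0], [0.1, 0.2, 0.3]])   # two toy keypoints
    offsets = gt[None, :, :] - points[:, None, :]       # perfect offset predictions
    fused = dense_fuse(rng.normal(size=(6, 8)), rng.normal(size=(6, 4)))
    a = attention_weights(fused, rng.normal(size=fused.shape[1]))
    print(np.allclose(vote_keypoints(points, offsets, a), gt))  # True
```

With perfect offsets every vote lands on the true keypoint, so any attention weighting recovers it; with noisy offsets, the weights decide which points' votes dominate. That is where an attention module can down-weight occluded or unreliable points.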
automation & control systems,computer science, artificial intelligence,engineering, electrical & electronic, multidisciplinary