CAMInterHand: Cooperative Attention for Multi-View Interactive Hand Pose and Mesh Reconstruction

Han Guwen, Ye Qi, Chen Anjun, Chen Jiming
DOI: https://doi.org/10.1109/icra57147.2024.10610469
2024-01-01
Abstract: Interactive hand mesh reconstruction from single-view images is challenging because of the severe occlusion and depth ambiguity inherent in interactive hand gestures. Recent approaches that employ probabilistic models and token-pruned techniques have shown decent results in multi-view human body reconstruction. Nevertheless, these methods do not fully utilize multi-scale semantic information from multi-view images and are not applicable to scenarios involving severe occlusion during dual-hand interactions. Meanwhile, current single-view methods reconstruct the left and right hands independently, which is ineffective for modeling the interaction between the two hands. To address these challenges, we propose CAMInterHand, a cooperative attention-based method for multi-view interactive hand pose and mesh reconstruction. Specifically, CAMInterHand extracts local pyramid features and global vertex features from multi-scale feature maps of multi-view images, enabling the exploration of rich local semantic information and facilitating effective feature alignment. Furthermore, CAMInterHand employs a cooperative attention fusion module to fuse all features from the multi-view images, enhancing interactions among the vertices of both hands within global and local contexts. We conduct extensive experiments on the large-scale multi-view dataset InterHand2.6M, and CAMInterHand achieves a substantial performance improvement over existing methods for multi-view and single-view interactive hand reconstruction.
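
The abstract does not specify how the cooperative attention fusion module is built, so the following is only a rough illustration of the general idea it describes: fusing per-view vertex features and letting the vertices of each hand attend to the vertices of both hands. Everything here is an assumption made for illustration, not the authors' architecture: the function names `cross_attention` and `fuse_hands` are hypothetical, views are fused by a plain mean, and the attention uses no learned projections.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context):
    """Scaled dot-product attention without learned projections.

    queries: (n_q, dim), context: (n_ctx, dim).
    Returns the attended features (n_q, dim) and the weights (n_q, n_ctx).
    """
    dim = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(dim)
    weights = softmax(scores, axis=-1)
    return weights @ context, weights

def fuse_hands(left_views, right_views):
    """Hypothetical fusion step (an assumption, not the paper's module).

    left_views, right_views: (n_views, n_vertices, dim) vertex features.
    """
    # Assumption: fuse the views by simple averaging.
    left = left_views.mean(axis=0)
    right = right_views.mean(axis=0)
    # Each hand's vertices attend to the vertices of both hands.
    both = np.concatenate([left, right], axis=0)
    left_ctx, _ = cross_attention(left, both)
    right_ctx, _ = cross_attention(right, both)
    # Residual connection keeps each hand's own features.
    return left + left_ctx, right + right_ctx

rng = np.random.default_rng(0)
left_views = rng.normal(size=(4, 6, 8))   # 4 views, 6 vertices, 8 channels
right_views = rng.normal(size=(4, 6, 8))
left_fused, right_fused = fuse_hands(left_views, right_views)
print(left_fused.shape, right_fused.shape)
```

A real model would replace the mean over views and the raw dot products with learned layers; the sketch only shows how attending over the concatenated vertex set lets one hand's features depend on the other's.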