Abstract: 3D hand pose estimation is a crucial problem in computer vision. Recently, researchers have transformed a single depth image into depth images from multiple virtual views by converting it into a point cloud and reprojecting that point cloud from each virtual viewpoint. Using the depth images of all these views together for hand pose estimation effectively improves estimation accuracy. However, current methods suffer from distortion in the generated depth images, insufficient use of the depth image of each view, and high computational overhead. To overcome these problems, we introduce a multi-virtual view scoring network (MVSN), which consists of a single virtual view estimation module, a virtual view feature encoding module, and a virtual view scoring module. To generate an intermediate feature map suitable for virtual view scoring, the single virtual view estimation module uses a feature map offset loss function and enhances information interaction between channels in the backbone network. The virtual view feature encoding module adopts a two-branch structure that captures, from the intermediate feature map, information about all joints and about individual joints, respectively. This structure improves the model's sensitivity to each view, better integrates information from the virtual views, and yields a more appropriate scoring feature for each view. The virtual view scoring module scores each view based on its scoring feature, assigning higher scores to views whose estimates are more accurate. We also propose a dynamic virtual view removal strategy that discards poor-quality views during training. On the NYU and ICVL datasets, our model achieves mean joint errors of 6.21 mm and 4.53 mm, respectively, giving better estimation accuracy than existing methods.
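The abstract does not state how the per-view scores are turned into a final pose. The sketch below assumes one common choice: softmax-normalize the scores and take a score-weighted average of the per-view 3D joint estimates, which are assumed to be already mapped back into the original camera frame. It then evaluates the result with the mean joint error, the average per-joint Euclidean distance in millimetres reported for NYU and ICVL. The functions `softmax`, `fuse_views`, and `mean_joint_error` are hypothetical illustrations, not MVSN's implementation.

```python
import math
from typing import List, Sequence

Joint = Sequence[float]  # (x, y, z) in millimetres, original camera frame


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax that turns raw view scores into weights summing to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_views(view_preds: Sequence[Sequence[Joint]], scores: Sequence[float]) -> List[List[float]]:
    """Assumed fusion rule: weighted average of per-view joint estimates.

    view_preds[v][j] is the (x, y, z) estimate of joint j from virtual view v.
    Views with higher scores contribute more to the fused pose.
    """
    weights = softmax(scores)
    num_joints = len(view_preds[0])
    fused = []
    for j in range(num_joints):
        fused.append([
            sum(w * view[j][k] for w, view in zip(weights, view_preds))
            for k in range(3)
        ])
    return fused


def mean_joint_error(pred: Sequence[Joint], gt: Sequence[Joint]) -> float:
    """Average Euclidean distance (mm) between predicted and ground-truth joints."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


# Toy example: two joints, two virtual views.
gt = [[0.0, 0.0, 500.0], [10.0, 0.0, 500.0]]
view_a = [[1.0, 0.0, 500.0], [11.0, 0.0, 500.0]]  # 1 mm off in x
view_b = [[5.0, 0.0, 500.0], [15.0, 0.0, 500.0]]  # 5 mm off in x

fused = fuse_views([view_a, view_b], scores=[2.0, 0.0])  # view_a scored higher
print(mean_joint_error(fused, gt))
```

Because view_a receives the higher score, the fused error falls between the two single-view errors and lies much closer to view_a's. With equal scores, the rule reduces to a plain average of the views.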

Multi-virtual View Scoring Network for 3D Hand Pose Estimation from a Single Depth Image

3D Hand Pose Estimation in Everyday Egocentric Images

Efficient Virtual View Selection for 3D Hand Pose Estimation

A Survey on 3D Hand Pose Estimation: Cameras, Methods, and Datasets

Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data

Joint Hand Detection and Rotation Estimation by Using CNN

Tracking and Reconstructing Hand Object Interactions from Point Cloud Sequences in the Wild.

A 3D Hand Attitude Estimation Method for Fixed Hand Posture Based on Dual-View RGB Images

Recurrent 3D Hand Pose Estimation Using Cascaded Pose-Guided 3D Alignments

PCIE_EgoHandPose Solution for EgoExo4D Hand Pose Challenge

1st Place Solution to the 8th HANDS Workshop Challenge -- ARCTIC Track: 3DGS-based Bimanual Category-agnostic Interaction Reconstruction

NETWORKS EFFECTIVELY UTILIZING 2D SPATIAL INFORMATION FOR ACCURATE 3D HAND POSE ESTIMATION

Depth-Based 3D Hand Pose Estimation: from Current Achievements to Future Goals

Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects

Cross-View Person Identification Based on Confidence-Weighted Human Pose Matching

Cross-View Person Identification by Matching Human Poses Estimated with Confidence on Each Body Joint

Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS

The 1st-place Solution for ECCV 2022 Multiple People Tracking in Group Dance Challenge

Multi-View Matching (MVM): Facilitating Multi-Person 3D Pose Estimation Learning with Action-Frozen People Video