Abstract: Human pose estimation is a critical component of autonomous driving and parking, enhancing safety by predicting human actions. Traditional frame-based cameras and videos are commonly used, yet they become less reliable under high dynamic range or heavy motion blur. Event cameras, in contrast, remain robust in these challenging conditions. Most existing methods bring event cameras into learning frameworks by accumulating events into event frames. However, such methods tend to neglect the intrinsically asynchronous, high-temporal-resolution nature of events, and so discard temporal information that is essential for safety-critical tasks involving dynamic human activities. To address this issue and to unlock the 3D potential of event information, we introduce two 3D event representations: the Rasterized Event Point Cloud (RasEPC) and the Decoupled Event Voxel (DEV). RasEPC aggregates events that fall at the same position within short temporal slices, preserving their 3D attributes through statistical cues while markedly reducing memory and computational demands. DEV discretizes events into voxels and projects them onto three orthogonal planes, using decoupled event attention to retrieve 3D cues from the 2D planes. Furthermore, we develop and release EV-3DPW, a synthetic event-based dataset built to support training and quantitative analysis in outdoor scenes. On the public real-world DHP19 dataset, our event point cloud technique excels in real-time mobile predictions, while the decoupled event voxel method achieves the highest accuracy. Experiments show that our proposed 3D representations generalize better than traditional RGB images and event-frame techniques. Our code and dataset are available at https://github.com/MasterHow/EventPointPose.
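The two representations described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `rasterize_events` and `project_voxels`, the `(x, y, t, polarity)` event layout, the choice of statistics (event count, mean timestamp, mean polarity), and the use of plain event counts for the plane projections are all assumptions made for illustration, and the decoupled event attention step is not shown.

```python
from collections import defaultdict


def rasterize_events(events, slice_duration):
    """Merge events sharing a pixel and temporal slice into one point.

    Each event is an (x, y, t, polarity) tuple (assumed layout). Every output
    point is (x, y, slice_index, count, mean_t, mean_polarity); these
    statistics are an assumption, not taken from the paper.
    """
    groups = defaultdict(list)
    for x, y, t, p in events:
        slice_index = int(t // slice_duration)
        groups[(x, y, slice_index)].append((t, p))

    points = []
    for (x, y, slice_index), items in sorted(groups.items()):
        n = len(items)
        mean_t = sum(t for t, _ in items) / n
        mean_p = sum(p for _, p in items) / n
        points.append((x, y, slice_index, n, mean_t, mean_p))
    return points


def project_voxels(events, width, height, num_bins, duration):
    """Count events per voxel and collapse the grid onto xy, xt and yt planes.

    Summing counts along the collapsed axis is an assumed projection rule.
    """
    xy = [[0] * width for _ in range(height)]    # indexed [y][x]
    xt = [[0] * width for _ in range(num_bins)]  # indexed [t_bin][x]
    yt = [[0] * height for _ in range(num_bins)] # indexed [t_bin][y]
    for x, y, t, _ in events:
        # Clamp so an event at t == duration lands in the last bin.
        t_bin = min(int(t / duration * num_bins), num_bins - 1)
        xy[y][x] += 1
        xt[t_bin][x] += 1
        yt[t_bin][y] += 1
    return xy, xt, yt


# Toy input: four events on a 2x1 sensor over 20 ms.
events = [(0, 0, 0.001, 1), (0, 0, 0.002, -1), (1, 0, 0.002, 1), (0, 0, 0.012, 1)]
points = rasterize_events(events, slice_duration=0.01)
xy, xt, yt = project_voxels(events, width=2, height=1, num_bins=2, duration=0.02)
```

In this toy input, the two events at pixel (0, 0) in the first 10 ms slice collapse into a single point, which is the source of the memory saving the abstract attributes to rasterization; the three planes keep a coarse spatial and temporal footprint without storing the full voxel grid.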

Proprioception-Driven Wearer Pose Estimation for Egocentric Video

Rethinking Human Pose Estimation for Autonomous Driving with 3D Event Representations

Efficient Multi-person Hierarchical 3D Pose Estimation for Autonomous Driving

Seeing Invisible Poses: Estimating 3D Body Pose from Egocentric Video

Live Stream Temporally Embedded 3D Human Body Pose and Shape Estimation

SelfPose: 3D Egocentric Pose Estimation from a Headset Mounted Camera

Embodied Scene-aware Human Pose Estimation

Ego-Body Pose Estimation via Ego-Head Pose Estimation

Kinematics-Guided Reinforcement Learning for Object-Aware 3D Ego-Pose Estimation

3D Human Pose Perception from Egocentric Stereo Videos

EgoPoser: Robust Real-Time Egocentric Pose Estimation from Sparse and Intermittent Observations Everywhere

4D Human Body Capture from Egocentric Video via 3D Scene Grounding

Scene-aware Egocentric 3D Human Pose Estimation

xR-EgoPose: Egocentric 3D Human Pose from an HMD Camera

SelfRecon: Self Reconstruction Your Digital Avatar from Monocular Video

Social EgoMesh Estimation

Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data

Mo2Cap2: Real-time Mobile 3D Motion Capture with a Cap-mounted Fisheye Camera

You2Me: Inferring Body Pose in Egocentric Video via First and Second Person Interactions

3D Human pose estimation from video via multi-scale multi-level spatial temporal features

Hybrid 3D Human Pose Estimation with Monocular Video and Sparse IMUs