Abstract: Human pose estimation is a critical component of autonomous driving and parking, enhancing safety by predicting human actions. Traditional frame-based cameras and videos are commonly used, yet they become less reliable under high dynamic range or heavy motion blur. Event cameras, in contrast, offer a robust solution in these challenging conditions. Predominant methods incorporate event cameras into learning frameworks by accumulating events into event frames. However, such methods tend to neglect the intrinsic asynchronous nature and high temporal resolution of events. This discards essential temporal information that is crucial for safety-critical tasks involving dynamic human activities. To address this issue and to unlock the 3D potential of event information, we introduce two 3D event representations: the Rasterized Event Point Cloud (RasEPC) and the Decoupled Event Voxel (DEV). The RasEPC aggregates events that fall within short temporal slices at the same position, preserving their 3D attributes through statistical cues while markedly reducing memory and computational demands. The DEV representation discretizes events into voxels and projects them onto three orthogonal planes, using decoupled event attention to retrieve 3D cues from the 2D planes. Furthermore, we develop and release EV-3DPW, a synthetic event-based dataset crafted to facilitate training and quantitative analysis in outdoor scenes. On the public real-world DHP19 dataset, our event point cloud technique excels in real-time mobile predictions, while the decoupled event voxel method achieves the highest accuracy. Experiments show that our proposed 3D representations generalize better than traditional RGB images and event-frame techniques. Our code and dataset are available at https://github.com/MasterHow/EventPointPose.
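To make the two representations more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes events arrive as rows of (x, y, t, p) with polarity p in {-1, +1}. The function names (`time_bins`, `rasterize_events`, `voxel_plane_projections`) are hypothetical, and so is the choice of per-pixel statistics (mean timestamp, polarity sum, event count). The dense voxel grid is built only for clarity, and the decoupled event attention step is omitted.

```python
import numpy as np


def time_bins(t, num_bins):
    """Map timestamps to integer bin indices in [0, num_bins - 1]."""
    span = max(float(t.max() - t.min()), 1e-9)  # avoid division by zero
    idx = ((t - t.min()) / span * num_bins).astype(np.int64)
    return np.minimum(idx, num_bins - 1)  # the latest event goes in the last bin


def rasterize_events(events, num_slices):
    """Merge events sharing a pixel and a time slice into one 3D point.

    Returns rows of (x, y, mean_t, polarity_sum, event_count).
    The statistics kept here are an illustrative assumption.
    """
    events = np.asarray(events, dtype=np.float64)
    x = events[:, 0].astype(np.int64)
    y = events[:, 1].astype(np.int64)
    t, p = events[:, 2], events[:, 3]
    # One key per event: (slice, y, x). Equal keys are merged below.
    keys = np.stack([time_bins(t, num_slices), y, x], axis=1)
    uniq, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    count = np.bincount(inverse, minlength=len(uniq))
    t_mean = np.bincount(inverse, weights=t, minlength=len(uniq)) / count
    p_sum = np.bincount(inverse, weights=p, minlength=len(uniq))
    return np.column_stack([uniq[:, 2], uniq[:, 1], t_mean, p_sum, count])


def voxel_plane_projections(events, num_bins, height, width):
    """Count events in a (t, y, x) voxel grid and project onto three planes.

    Returns (xy, xt, yt) with shapes (H, W), (T, W), and (T, H).
    """
    events = np.asarray(events, dtype=np.float64)
    x = events[:, 0].astype(np.int64)
    y = events[:, 1].astype(np.int64)
    b = time_bins(events[:, 2], num_bins)
    voxels = np.zeros((num_bins, height, width))
    np.add.at(voxels, (b, y, x), 1.0)  # unbuffered add handles repeated indices
    return voxels.sum(axis=0), voxels.sum(axis=1), voxels.sum(axis=2)
```

Calling `rasterize_events(events, num_slices=2)` on a short event window yields at most one point per pixel per slice, which is where the memory saving comes from. `voxel_plane_projections` returns three 2D maps that a standard 2D backbone could consume.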

End-to-End 6dof Pose Estimation from Monocular RGB Images

6D-Vnet: End-To-End 6dof Vehicle Pose Estimation from Monocular RGB Images

3D Point-to-Keypoint Voting Network for 6D Pose Estimation

Efficient Multi-person Hierarchical 3D Pose Estimation for Autonomous Driving

Rethinking Human Pose Estimation for Autonomous Driving with 3D Event Representations

Recurrent Volume-based 3D Feature Fusion for Real-time Multi-view Object Pose Estimation

Vehicle Global 6-Dof Pose Estimation under Traffic Surveillance Camera

Depth-aware Imbalance Learning for Monocular 6dof Vehicle Pose Estimation

SilhoNet: An RGB Method for 6D Object Pose Estimation

GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation

Attention Guided 6D Object Pose Estimation with Multi-constraints Voting Network

W6DNet: Weakly Supervised Domain Adaptation for Monocular Vehicle 6-D Pose Estimation With 3-D Priors and Synthetic Data

6DoF Pose Estimation of Transparent Object from a Single RGB-D Image

6D pose estimation of 3D objects in scenes with mutual similarities and occlusions

6-DoF grasp estimation method that fuses RGB-D data based on external attention

PANet: A Pixel-Level Attention Network for 6D Pose Estimation With Embedding Vector Features

RDPN6D: Residual-based Dense Point-wise Network for 6Dof Object Pose Estimation Based on RGB-D Images

Occlusion-Aware Self-Supervised Monocular 6D Object Pose Estimation

Uni6D: A Unified CNN Framework Without Projection Breakdown for 6D Pose Estimation

GSNet: Joint Vehicle Pose and Shape Reconstruction with Geometrical and Scene-aware Supervision

PVNet: Pixel-Wise Voting Network for 6dof Object Pose Estimation