Abstract: Human pose estimation is a critical component in autonomous driving and parking, enhancing safety by predicting human actions. Traditional frame-based cameras and videos are commonly used, yet they become less reliable in scenes with high dynamic range or heavy motion blur. In contrast, event cameras offer a robust solution in these challenging conditions. Predominant methods incorporate event cameras into learning frameworks by accumulating events into event frames. However, such methods tend to neglect the intrinsic asynchronous, high-temporal-resolution nature of events. This leads to a loss of temporal information that is crucial for safety-critical tasks involving dynamic human activities. To address this issue and to unlock the 3D potential of event information, we introduce two 3D event representations: the Rasterized Event Point Cloud (RasEPC) and the Decoupled Event Voxel (DEV). RasEPC aggregates events that fall at the same position within a short temporal slice, preserving their 3D attributes through statistical cues while markedly reducing memory and computational demands. DEV discretizes events into voxels and projects them onto three orthogonal planes, using decoupled event attention to retrieve 3D cues from the 2D planes. Furthermore, we develop and release EV-3DPW, a synthetic event-based dataset built to facilitate training and quantitative analysis in outdoor scenes. On the public real-world DHP19 dataset, our event point cloud technique excels in real-time mobile predictions, while the decoupled event voxel method achieves the highest accuracy. Experiments show that our proposed 3D representations generalize better than traditional RGB images and event-frame techniques. Our code and dataset are available at https://github.com/MasterHow/EventPointPose.
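
The two representations named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the `(x, y, t, p)` event layout, and the specific statistics kept per rasterized point (mean timestamp, summed polarity, event count) are assumptions made for illustration, and the decoupled event attention module is left out entirely. The first function merges events that share a pixel within a temporal slice into a single point; the second builds a voxel grid and sums it along each axis to obtain three orthogonal 2D projections.

```python
import numpy as np


def _time_bins(t, num_bins):
    """Map timestamps to integer bin indices in [0, num_bins - 1]."""
    t0, t1 = t.min(), t.max()
    span = max(t1 - t0, 1e-12)  # avoid division by zero for a single timestamp
    bins = ((t - t0) / span * num_bins).astype(np.int64)
    return np.minimum(bins, num_bins - 1)  # the latest event goes in the last bin


def rasterize_events(events, num_slices, width, height):
    """Merge events at the same pixel within the same temporal slice.

    events: (N, 4) array with columns x, y, t, p (p assumed in {-1, +1}).
    Returns an (M, 5) array with columns x, y, mean t, summed p, event count.
    """
    if len(events) == 0:
        return np.zeros((0, 5))
    x = events[:, 0].astype(np.int64)
    y = events[:, 1].astype(np.int64)
    t = events[:, 2].astype(np.float64)
    p = events[:, 3].astype(np.float64)
    s = _time_bins(t, num_slices)
    # One integer key per (slice, y, x) cell, so equal keys mean "same point".
    key = (s * height + y) * width + x
    uniq, inverse, counts = np.unique(key, return_inverse=True, return_counts=True)
    t_sum = np.bincount(inverse, weights=t, minlength=len(uniq))
    p_sum = np.bincount(inverse, weights=p, minlength=len(uniq))
    px = uniq % width
    py = (uniq // width) % height
    return np.stack([px, py, t_sum / counts, p_sum, counts], axis=1)


def voxel_projections(events, num_bins, width, height):
    """Count events in a (T, H, W) voxel grid and project onto three planes."""
    voxel = np.zeros((num_bins, height, width))
    if len(events) > 0:
        x = events[:, 0].astype(np.int64)
        y = events[:, 1].astype(np.int64)
        b = _time_bins(events[:, 2].astype(np.float64), num_bins)
        np.add.at(voxel, (b, y, x), 1.0)  # unbuffered add handles repeated cells
    xy = voxel.sum(axis=0)  # shape (H, W): collapse time
    xt = voxel.sum(axis=1)  # shape (T, W): collapse y
    yt = voxel.sum(axis=2)  # shape (T, H): collapse x
    return xy, xt, yt
```

Under these assumptions, calling `rasterize_events(events, num_slices=4, width=346, height=260)` on an `(N, 4)` event array yields at most one point per pixel per slice, which is where the memory saving comes from. The sensor size used here is only an example value.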

GS-CPR: Efficient Camera Pose Refinement via 3D Gaussian Splatting

Rethinking Human Pose Estimation for Autonomous Driving with 3D Event Representations

GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization

ZeroGS: Training 3D Gaussian Splatting from Unposed Images

GGRt: Towards Pose-free Generalizable 3D Gaussian Splatting in Real-time

GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting

HGSLoc: 3DGS-based Heuristic Camera Pose Refinement

GS-Pose: Generalizable Segmentation-based 6D Object Pose Estimation with 3D Gaussian Splatting

Visual SLAM with 3D Gaussian Primitives and Depth Priors Enabling Novel View Synthesis

USP-Gaussian: Unifying Spike-based Image Reconstruction, Pose Correction and Gaussian Splatting

IncEventGS: Pose-Free Gaussian Splatting from a Single Event Camera

CaRtGS: Computational Alignment for Real-Time Gaussian Splatting SLAM

GS2Pose: Two-stage 6D Object Pose Estimation Guided by Gaussian Splatting

TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Autonomous Driving

MGSO: Monocular Real-time Photometric SLAM with Efficient 3D Gaussian Splatting

GauLoc: 3D Gaussian Splatting‐based Camera Relocalization

GSLoc: Visual Localization with 3D Gaussian Splatting

Accurate Dynamic SLAM Using CRF-Based Long-Term Consistency

CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field