Abstract: This paper presents a 3D object detection algorithm for Bird's Eye View (BEV) scenarios that improves detection by integrating spatial and temporal features. The core of our approach is a spatial-temporal alignment module that processes information across time steps and spatial locations to improve the precision and robustness of detection. A temporal self-attention mechanism captures object motion by correlating features across time steps, which helps the model identify and track moving objects. A spatial cross-attention mechanism focuses on spatial features within regions of interest and promotes interaction between features extracted from the camera views and the BEV queries. The method also uses temporal feature integration to stabilize detection of fast-moving objects and multi-scale feature fusion to capture context at multiple scales. After alignment, the enriched feature set is used for 3D bounding box prediction, which determines the position, dimensions, and orientation of each object. We conducted experiments on two public autonomous driving datasets, nuScenes and the Waymo Open Dataset, and the results show that our method outperforms the previous BEVFormer and other state-of-the-art methods in detection accuracy and robustness. The paper concludes with potential future directions for optimizing the BEVFormer model's performance and for applying it to broader scenarios and tasks.
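
To make the two attention steps in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes plain dense scaled dot-product attention in place of whatever attention variant the paper actually uses, omits learned projections, residual connections, and multi-scale fusion, and uses hypothetical names (`attend`, `align_bev`, `bev_queries`, `prev_bev`, `camera_feats`) chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention.

    Each query returns a weighted average of `values`, with weights
    given by the softmax of its scaled dot products with `keys`.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def align_bev(bev_queries, prev_bev, camera_feats):
    """Hypothetical spatial-temporal alignment step (illustrative only)."""
    # Temporal self-attention: current BEV queries attend to the previous
    # BEV features together with themselves, picking up motion cues.
    temporal_ctx = prev_bev + bev_queries
    q = attend(bev_queries, temporal_ctx, temporal_ctx)
    # Spatial cross-attention: the temporally updated queries attend to
    # features extracted from the camera views.
    return attend(q, camera_feats, camera_feats)


# Toy usage: two BEV cells, two previous BEV cells, three camera features.
bev = [[1.0, 0.0], [0.0, 1.0]]
prev = [[0.5, 0.5], [0.2, 0.8]]
cams = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
aligned = align_bev(bev, prev, cams)
```

In this sketch every aligned BEV feature is a convex combination of the camera features, so a detection head placed on top of `aligned` would see camera evidence weighted by both the current and the previous BEV state.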

Cross-Modal 3D Object Detection and Tracking for Auto-Driving

Probabilistic 3D Multi-Modal, Multi-Object Tracking for Autonomous Driving

A Multi-view 3D Vehicle Detection Method Based On Novel 3D Proposal Generation Method

3D Dynamic Multi-target Detection Algorithm Based on Cross-view Feature Fusion

3D Multi-Object Tracking Based on Radar-Camera Fusion

Enhancing 3D object detection through multi-modal fusion for cooperative perception

CC-3DT: Panoramic 3D Object Tracking via Cross-Camera Fusion

3D Multiple Object Tracking with Multi-modal Fusion of Low-cost Sensors for Autonomous Driving

3D Object Detection for Point Cloud in Virtual Driving Environment

Enhanced 3D object detection for autonomous driving: A spatial-temporal alignment approach in Bird's Eye View scenarios

Joint Multi-Object Detection and Tracking with Camera-LiDAR Fusion for Autonomous Driving

Influence of Camera-LiDAR Configuration on 3D Object Detection for Autonomous Driving

Lightweight Map-Enhanced 3D Object Detection and Tracking for Autonomous Driving

Dynamic Object Tracking for Self-Driving Cars Using Monocular Camera and LIDAR

Multi-View 3D Object Detection Network for Autonomous Driving

AutoAlign: Pixel-Instance Feature Aggregation for Multi-Modal 3D Object Detection

Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds

Joint Monocular 3D Vehicle Detection and Tracking

Tracking Objects with 3D Representation from Videos