Abstract: To accurately identify occluded targets and infer the motion state of objects, we propose a Bird's-Eye View Object Detection Network based on Temporal-Spatial feature fusion (TS-BEV), which replaces previous multi-frame sampling methods with cyclic propagation of instance information from historical frames. We design a new Temporal-Spatial feature fusion attention module, which fully integrates temporal information and spatial features and improves inference and training speed. To realize multi-frame feature fusion across multiple scales and views, we propose an efficient Temporal-Spatial deformable aggregation module, which performs feature sampling and weighted summation over multiple feature maps of historical and current frames, and makes full use of the parallel computing capabilities of GPUs and AI chips to further improve efficiency. Furthermore, temporal-spatial fusion BEV features lack global inference, and instance features distributed in different locations cannot fully interact. To address this, we design a BEV self-attention module that operates on features globally, which enhances global inference ability and lets instance features interact fully. We have carried out extensive experiments on the challenging nuScenes dataset for BEV object detection. Quantitative results show that our method achieves 61.5% mAP and 68.5% NDS in camera-only 3D object detection, and qualitative results show that TS-BEV can effectively handle 3D object detection against complex traffic backgrounds in low-light night scenes, with good robustness and scalability.
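The "feature sampling and weighted summation" step of the deformable aggregation module can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `bilinear_sample` and `aggregate`, the tensor layouts, the normalized point coordinates, and the softmax over weights are all assumptions made for illustration. The abstract does not specify how sampling points or weights are produced, so here they are simply passed in.

```python
import numpy as np

def bilinear_sample(feat, xy):
    """Bilinearly sample a (C, H, W) feature map at K normalized points.

    xy has shape (K, 2) with coordinates in [0, 1]; returns (K, C).
    (Hypothetical helper; layout and normalization are assumptions.)
    """
    _, h, w = feat.shape
    # Map normalized coordinates onto the pixel grid and clamp to the map.
    x = np.clip(xy[:, 0] * (w - 1), 0, w - 1)
    y = np.clip(xy[:, 1] * (h - 1), 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = x - x0
    wy = y - y0
    # Interpolate along x on the two neighboring rows, then along y.
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx  # (C, K)
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return (top * (1 - wy) + bottom * wy).T  # (K, C)

def aggregate(feature_maps, points, logits):
    """Weighted sum of sampled features over M feature maps.

    feature_maps: list of M arrays of shape (C, H_m, W_m), e.g. different
                  scales, views, or frames (historical and current).
    points:       (K, 2) normalized sampling locations shared by all maps.
    logits:       (M, K) unnormalized weights (assumed to come from a
                  learned layer in a real model).
    Returns a single (C,) aggregated feature vector.
    """
    # Softmax over all M*K samples so the weights sum to 1 (an assumption).
    flat = logits.reshape(-1)
    weights = np.exp(flat - flat.max())
    weights = (weights / weights.sum()).reshape(logits.shape)
    out = np.zeros(feature_maps[0].shape[0])
    for m, feat in enumerate(feature_maps):
        sampled = bilinear_sample(feat, points)  # (K, C)
        out += (weights[m][:, None] * sampled).sum(axis=0)
    return out
```

In a real detector each map would typically get its own projected sampling locations and the loop would be fused into a single GPU kernel; the sketch keeps one shared set of points only to stay short.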

A Streamlined Framework for BEV-Based 3D Object Detection with Prior Masking

Towards Efficient 3D Object Detection in Bird's-Eye-View Space for Autonomous Driving: A Convolutional-Only Approach

PreBEV: Leveraging Predictive Flow for Enhanced Bird's-Eye View 3D Dynamic Object Detection

BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View

SA-BEV: Generating Semantic-Aware Bird's-Eye-View Feature for Multi-view 3D Object Detection

OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for Multi-Camera 3D Object Detection

A Versatile Multi-View Framework for LiDAR-based 3D Object Detection with Guidance from Panoptic Segmentation

OCBEV: Object-Centric BEV Transformer for Multi-View 3D Object Detection

Enhanced 3D object detection for autonomous driving: A spatial-temporal alignment approach in Bird's Eye View scenarios

BEV-MAE: Bird's Eye View Masked Autoencoders for Point Cloud Pre-training in Autonomous Driving Scenarios

Delving into the Secrets of BEV 3D Object Detection in Autonomous Driving: A Comprehensive Survey

Instance-aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning

MaskBEV: Towards A Unified Framework for BEV Detection and Map Segmentation

BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving

BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

SimpleBEV: Improved LiDAR-Camera Fusion Architecture for 3D Object Detection

BEVHeight: A Robust Framework for Vision-based Roadside 3D Object Detection

BEVHeight++: Toward Robust Visual Centric 3D Object Detection

Enhance the 3D Object Detection With 2D Prior

TS-BEV: BEV object detection algorithm based on temporal-spatial feature fusion