Abstract: Bird's-eye-view (BEV) perception plays a critical role in autonomous driving systems: it involves accurate and efficient detection and tracking of objects from a top-down perspective. Real-time decision-making in self-driving scenarios requires low-latency computation. Recent approaches to BEV detection have focused on improving detection precision using Lift-Splat-Shoot (LSS)-based or transformer-based schemas, but their substantial computational and memory burden increases the risk of system crashes when multiple on-vehicle tasks run simultaneously. There is little literature on efficient BEV detector paradigms, let alone on achieving realistic speedups. Unlike existing works that focus on reducing computation costs, this paper develops an efficient model design that prioritizes actual on-device latency. To this end, we propose a latency-aware design methodology that considers key hardware properties, such as memory access cost and degree of parallelism. Given the prevalence of GPUs as the main computation platform for autonomous driving systems, we develop a theoretical latency prediction model and introduce efficient building operators. By leveraging these operators and following an effective local-to-global visual modeling process, we propose a hardware-oriented backbone that is also optimized for strong feature capturing and fusing. Using these insights, we present HotBEV, a new hardware-oriented framework for efficient yet accurate camera-view BEV detectors. Experiments show that HotBEV achieves a 2% to 23% NDS gain and a 2% to 7.8% mAP gain with a 1.1× to 3.4× speedup compared to existing works on V100. On multiple GPU devices, such as the GTX 2080 and the low-end GTX 1080, HotBEV is 1.1× to 6.3× faster than other methods.
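
The abstract names memory access cost and degree of parallelism as the hardware properties behind the latency prediction model but does not give its form. As a rough illustration only, the sketch below uses a generic roofline-style estimate, which is an assumption and not the paper's model: an operator's latency is taken as the larger of its compute time and its memory-transfer time, with compute throughput scaled down when the work does not fill the GPU's parallel units evenly. The names `Gpu`, `Op`, `estimate_latency`, and all numeric values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    """Hypothetical device description (illustrative numbers, not a real GPU)."""
    peak_flops: float      # peak compute throughput, FLOP/s
    mem_bandwidth: float   # peak memory bandwidth, bytes/s
    num_units: int         # number of parallel compute units


@dataclass(frozen=True)
class Op:
    """Hypothetical operator description."""
    flops: float           # floating-point operations performed
    bytes_moved: float     # bytes read from and written to device memory
    num_blocks: int        # independent work blocks the operator launches


def estimate_latency(op: Op, gpu: Gpu) -> float:
    """Roofline-style latency estimate in seconds (an assumed, simplified model)."""
    # Blocks run in waves of at most `num_units`; a partly filled last wave
    # still takes a full wave, which lowers the effective utilization.
    waves = math.ceil(op.num_blocks / gpu.num_units)
    utilization = op.num_blocks / (waves * gpu.num_units)

    compute_time = op.flops / (gpu.peak_flops * utilization)
    memory_time = op.bytes_moved / gpu.mem_bandwidth

    # The operator is bound by whichever resource takes longer.
    return max(compute_time, memory_time)


if __name__ == "__main__":
    gpu = Gpu(peak_flops=1e12, mem_bandwidth=1e11, num_units=80)
    compute_bound = Op(flops=1e9, bytes_moved=1e6, num_blocks=80)
    memory_bound = Op(flops=1e6, bytes_moved=1e9, num_blocks=80)
    print(estimate_latency(compute_bound, gpu))
    print(estimate_latency(memory_bound, gpu))
```

Under this simplified model, two operators with the same FLOP count can differ widely in estimated latency if one moves far more data or launches a block count that leaves most units idle in the final wave. That is the kind of gap between computation cost and on-device latency the abstract refers to.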

An Acceleration Inference Implementation of BEVFusion with MQBench on Xavier

BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation

Fast-BEV: Towards Real-time On-vehicle Bird's-Eye View Perception

A 28nm 1.2GHz 5.27TOPS/W Scalable Vision/Point Cloud Deep Fusion Processor with CAM-based Universal Mapping Unit for BEVFusion Applications

QuadBEV: An Efficient Quadruple-Task Perception Framework via Bird's-Eye-View Representation

Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline

IRBEVF-Q: Optimization of Image-Radar Fusion Algorithm Based on Bird's Eye View Features

DeployFusion: A Deployable Monocular 3D Object Detection with Multi-Sensor Information Fusion in BEV for Edge Devices

Towards Efficient Architecture and Algorithms for Sensor Fusion

Radar and Camera Fusion for Multi-Task Sensing in Autonomous Driving

UniBEVFusion: Unified Radar-Vision BEVFusion for 3D Object Detection

HotBEV: Hardware-oriented Transformer-based Multi-View 3D Detector for BEV Perception

BEVFusion: A Simple and Robust LiDAR-Camera Fusion Framework

Optimizing Monocular Driving Assistance for Real-Time Processing on Jetson AGX Xavier

MLFNet: Multi-Level Fusion Network for Real-Time Semantic Segmentation of Autonomous Driving

EcoFusion: Energy-Aware Adaptive Sensor Fusion for Efficient Autonomous Vehicle Perception

BEVFusion4D: Learning LiDAR-Camera Fusion Under Bird's-Eye-View via Cross-Modality Guidance and Temporal Aggregation

QD-BEV : Quantization-aware View-guided Distillation for Multi-view 3D Object Detection

Real-Time Hybrid Multi-Sensor Fusion Framework for Perception in Autonomous Vehicles