Abstract:Recently, perception task based on Bird's-Eye View (BEV) representation has drawn more and more attention, and BEV representation is promising as the foundation for next-generation Autonomous Vehicle (AV) perception. However, most existing BEV solutions either require considerable resources to execute on-vehicle inference or suffer from modest performance. This paper proposes a simple yet effective framework, termed Fast-BEV , which is capable of performing faster BEV perception on the on-vehicle chips. Towards this goal, we first empirically find that the BEV representation can be sufficiently powerful without expensive transformer based transformation nor depth representation. Our Fast-BEV consists of five parts, We novelly propose (1) a lightweight deployment-friendly view transformation which fast transfers 2D image feature to 3D voxel space, (2) an multi-scale image encoder which leverages multi-scale information for better performance, (3) an efficient BEV encoder which is particularly designed to speed up on-vehicle inference. We further introduce (4) a strong data augmentation strategy for both image and BEV space to avoid over-fitting, (5) a multi-frame feature fusion mechanism to leverage the temporal information. Through experiments, on 2080Ti platform, our R50 model can run 52.6 FPS with 47.3% NDS on the nuScenes validation set, exceeding the 41.3 FPS and 47.5% NDS of the BEVDepth-R50 model and 30.2 FPS and 45.7% NDS of the BEVDet4D-R50 model. Our largest model (R101@900x1600) establishes a competitive 53.5% NDS on the nuScenes validation set. We further develop a benchmark with considerable accuracy and efficiency on current popular on-vehicle chips. The code is released at: <a class="link-external link-https" href="https://github.com/Sense-GVT/Fast-BEV" rel="external noopener nofollow">this https URL</a>.

HFT: Lifting Perspective Representations Via Hybrid Feature Transformation for BEV Perception.

Semi-Supervised Learning for Visual Bird's Eye View Semantic Segmentation

Semi-Supervised Learning for Visual Bird's Eye View Semantic Segmentation

Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline

FB-BEV: BEV Representation from Forward-Backward View Transformations

BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation

BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

BEVFusion4D: Learning LiDAR-Camera Fusion Under Bird's-Eye-View via Cross-Modality Guidance and Temporal Aggregation

Fast-BEV: Towards Real-time On-vehicle Bird's-Eye View Perception

BEV-CFKT: A LiDAR-camera cross-modality-interaction fusion and knowledge transfer framework with transformer for BEV 3D object detection

FedBEVT: Federated Learning Bird's Eye View Perception Transformer in Road Traffic Systems

Focus on BEV: Self-calibrated Cycle View Transformation for Monocular Birds-Eye-View Segmentation

WidthFormer: Toward Efficient Transformer-based BEV View Transformation

BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving

Parametric Depth Based Feature Representation Learning for Object Detection and Segmentation in Bird's Eye View

BEVSegFormer: Bird's Eye View Semantic Segmentation From Arbitrary Camera Rigs

CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers

F2BEV: Bird's Eye View Generation from Surround-View Fisheye Camera Images for Automated Driving