Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline

Yangguang Li, Bin Huang, Zeren Chen, Yufeng Cui, Feng Liang, Mingzhu Shen, Fenggang Liu, Enze Xie, Lu Sheng, Wanli Ouyang, Jing Shao
2024-07-10
Abstract: Recently, perception tasks based on Bird's-Eye View (BEV) representation have drawn increasing attention, and BEV representation is a promising foundation for next-generation Autonomous Vehicle (AV) perception. However, most existing BEV solutions either require considerable resources for on-vehicle inference or suffer from modest performance. This paper proposes a simple yet effective framework, termed Fast-BEV, which is capable of performing faster BEV perception on on-vehicle chips. Towards this goal, we first empirically find that the BEV representation can be sufficiently powerful without expensive transformer-based transformation or depth representation. Our Fast-BEV consists of five parts. We propose (1) a lightweight, deployment-friendly view transformation which quickly transfers 2D image features to 3D voxel space, (2) a multi-scale image encoder which leverages multi-scale information for better performance, and (3) an efficient BEV encoder which is particularly designed to speed up on-vehicle inference. We further introduce (4) a strong data augmentation strategy for both image and BEV space to avoid over-fitting, and (5) a multi-frame feature fusion mechanism to leverage temporal information. In experiments on a 2080Ti platform, our R50 model runs at 52.6 FPS with 47.3% NDS on the nuScenes validation set. This is faster than the BEVDepth-R50 model (41.3 FPS, 47.5% NDS) at comparable accuracy, and both faster and more accurate than the BEVDet4D-R50 model (30.2 FPS, 45.7% NDS). Our largest model (R101@900x1600) establishes a competitive 53.5% NDS on the nuScenes validation set. We further develop a benchmark on currently popular on-vehicle chips, where Fast-BEV shows considerable accuracy and efficiency. The code is released at https://github.com/Sense-GVT/Fast-BEV
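The abstract lists an efficient BEV encoder (3) and multi-frame fusion (5) without describing how they work. The sketch below is one plausible reading, under two stated assumptions: (a) the BEV encoder first folds the height axis of the voxel volume into channels ("space-to-channel"), so only cheap 2D convolutions are needed afterwards; and (b) temporal fusion concatenates the per-frame BEV maps channel-wise, with past frames already aligned to the current ego pose. The helper names `space_to_channel` and `fuse_frames` are hypothetical, and ego-motion alignment is not shown.

```python
from typing import Sequence

import numpy as np


def space_to_channel(voxel: np.ndarray) -> np.ndarray:
    """Fold the height axis into channels: (X, Y, Z, C) -> (X, Y, Z * C).

    With C-order memory, output channel z * C + c holds voxel[..., z, c].
    """
    x, y, z, c = voxel.shape
    return voxel.reshape(x, y, z * c)


def fuse_frames(voxel_frames: Sequence[np.ndarray]) -> np.ndarray:
    """Concatenate per-frame BEV maps along the channel axis (hypothetical helper).

    voxel_frames[0] is the current frame. Later entries are past frames that
    are ASSUMED to be already warped into the current ego frame (not shown).
    """
    bev_maps = [space_to_channel(v) for v in voxel_frames]
    return np.concatenate(bev_maps, axis=-1)
```

For three frames of an illustrative 8 × 8 × 4 grid with 16 channels (not the paper's configuration), `fuse_frames` returns an 8 × 8 × 192 map that a stack of ordinary 2D convolutions can consume. Folding height into channels replaces 3D convolutions with 2D ones, which embedded inference runtimes typically support far better.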
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the problem of fast, efficient, and accurate Bird's-Eye View (BEV) perception for autonomous vehicles (AVs). Most existing BEV solutions either require substantial resources for on-vehicle inference or deliver only modest accuracy. The authors therefore propose Fast-BEV, a simple yet effective framework for faster BEV perception on vehicle-mounted chips. Specifically, the paper addresses the following issues:

1. **Balancing resource consumption and performance**: Many existing BEV methods reach reasonable accuracy only with expensive hardware or complex model structures (such as transformer-based view transformation or explicit depth representation), which limits their use in cost-sensitive vehicles. Fast-BEV keeps accuracy high at low resource cost through its model structure and algorithm design.
2. **Fast on-vehicle inference**: Fast-BEV introduces a lightweight view transformation (the Fast-Ray transformation) and an efficient BEV encoder, among other components, so that the model runs fast enough on vehicle-mounted chips to meet real-time requirements.
3. **Deployment friendliness**: Beyond competitive accuracy, Fast-BEV is easy to deploy in practice and is suitable for chips and accelerators such as NVIDIA Xavier, Orin, and Tesla T4.
4. **Multi-task capability**: Fast-BEV can flexibly and efficiently handle a single task or several tasks jointly, such as 3D detection and segmentation.

Through these improvements, Fast-BEV performs well on academic benchmarks and also offers a practical solution for industrial applications, suggesting how future BEV perception systems can be deployed on platforms with limited compute.
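Item 2 names the Fast-Ray transformation but does not say how it works. Below is a minimal NumPy sketch under three explicit assumptions: (a) depth is treated as uniformly distributed along each camera ray, so every voxel on a ray receives that pixel's feature; (b) camera calibration is fixed, so the voxel-to-pixel projection can be precomputed once into a look-up table; and (c) all cameras write into one shared voxel grid, with a voxel seen by several cameras simply keeping the first. The function names `build_lut` and `fast_ray_gather`, the `(camera, row, col)` table layout, and the tie-breaking rule are illustrative choices, not the released Fast-BEV code.

```python
import numpy as np


def build_lut(voxel_centers, intrinsics, extrinsics, feat_hw):
    """Precompute, once, which (camera, row, col) each voxel center projects to.

    voxel_centers: (N, 3) float points in the ego frame.
    intrinsics:    (K, 3, 3) pinhole matrices, ASSUMED already scaled to the
                   feature-map resolution.
    extrinsics:    (K, 4, 4) ego-to-camera transforms.
    feat_hw:       (H, W) feature-map size.
    Returns an (N, 3) int64 table; rows of -1 mark voxels no camera sees.
    """
    h, w = feat_hw
    n = voxel_centers.shape[0]
    lut = np.full((n, 3), -1, dtype=np.int64)
    homo = np.concatenate([voxel_centers, np.ones((n, 1))], axis=1)  # (N, 4)
    for k in range(intrinsics.shape[0]):
        cam_pts = (extrinsics[k] @ homo.T).T[:, :3]  # (N, 3) in camera frame
        uvw = (intrinsics[k] @ cam_pts.T).T          # (N, 3) homogeneous pixels
        z = uvw[:, 2]                                # camera depth for a pinhole K
        in_front = z > 1e-6
        safe_z = np.where(in_front, z, 1.0)          # avoid divide-by-zero
        u = np.floor(uvw[:, 0] / safe_z).astype(np.int64)
        v = np.floor(uvw[:, 1] / safe_z).astype(np.int64)
        valid = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        # One shared voxel grid: a voxel keeps the first camera that sees it.
        take = valid & (lut[:, 0] < 0)
        lut[take, 0] = k
        lut[take, 1] = v[take]
        lut[take, 2] = u[take]
    return lut


def fast_ray_gather(feats, lut):
    """Fill voxels from image features with a single indexed gather.

    feats: (K, H, W, C) per-camera feature maps; lut: output of build_lut.
    Voxels on the same camera ray map to the same pixel and therefore get the
    same feature, which is the uniform-depth assumption in action.
    Returns (N, C) voxel features, zeros where no camera sees the voxel.
    """
    out = np.zeros((lut.shape[0], feats.shape[-1]), dtype=feats.dtype)
    seen = lut[:, 0] >= 0
    k, v, u = lut[seen].T
    out[seen] = feats[k, v, u]
    return out
```

Because the table depends only on calibration and grid geometry, it can be built offline. At inference time the view transformation then reduces to one indexed gather with no depth prediction or attention, which is the kind of operation that maps cheaply onto embedded accelerators.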