M-BEV: Masked BEV Perception for Robust Autonomous Driving

Siran Chen, Yue Ma, Yu Qiao, Yali Wang

2023-12-19

Abstract: 3D perception is a critical problem in autonomous driving. Recently, the Bird's-Eye-View (BEV) approach has attracted extensive attention, due to its low-cost deployment and strong vision-based detection capacity. However, existing models ignore a realistic scenario during driving: one or more cameras may fail, which severely degrades performance. To tackle this problem, we propose a generic Masked BEV (M-BEV) perception framework, which effectively improves robustness in this challenging scenario by randomly masking and reconstructing camera views during end-to-end training. More specifically, we develop a novel Masked View Reconstruction (MVR) module for M-BEV. It mimics various missing-view cases by randomly masking the features of different camera views, then uses the original features of these views as self-supervision and reconstructs the masked ones from the distinct spatio-temporal context across views. With this plug-and-play MVR, M-BEV learns to recover missing views from the remaining ones, and thus generalizes well to robust view recovery and accurate perception at test time. We perform extensive experiments on the popular NuScenes benchmark, where our framework significantly boosts the 3D perception performance of state-of-the-art models in various missing-view cases; e.g., when the back view is absent, M-BEV improves the PETRv2 model by 10.3% mAP.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

### Problems Addressed by the Paper

The paper primarily addresses a key challenge in 3D perception for autonomous driving: the significant performance drop of existing Bird's-Eye-View (BEV) methods when one or more cameras fail. Specifically:

1. **Importance of 3D Perception**:
   - 3D perception is one of the critical technologies in autonomous driving, enabling accurate identification of the size, position, and other information of objects in the surrounding environment.
2. **Issues with Existing Methods**:
   - Existing BEV methods typically assume that all six cameras are functioning properly. However, in real driving scenarios, one or more cameras may fail, leading to a severe decline in system performance.
3. **Proposed Solution**:
   - The authors propose a perception framework called Masked BEV (M-BEV), which enhances the model's robustness in the event of camera failures by randomly masking and reconstructing camera views. Specifically, they design a self-supervised Masked View Reconstruction (MVR) module to simulate camera failures during training and reconstruct the missing information using views from other functioning cameras.
4. **Experimental Validation**:
   - Extensive experiments on the popular NuScenes dataset demonstrate that the M-BEV framework can significantly improve the 3D perception performance of existing models under various camera failure scenarios. For example, in the case of a missing rear-view camera, M-BEV can improve the mAP of the PETRv2 model by 10.3%.

In summary, the paper aims to enhance the robustness and accuracy of autonomous driving systems in the event of camera failures by introducing the M-BEV framework and its MVR module.
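The masking-and-reconstruction idea summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the zero mask token, and the `reconstruct_from_neighbors` helper are all hypothetical. In particular, averaging the visible views is only a toy stand-in for the learned MVR decoder, which the paper describes as using spatio-temporal context across views.

```python
import numpy as np


def mask_random_views(feats, num_masked, rng, mask_token=None):
    """Replace the features of randomly chosen camera views with a mask token.

    feats: array of shape (num_views, channels, height, width).
    Returns the masked copy and the sorted indices of the masked views.
    """
    num_views = feats.shape[0]
    if not 0 < num_masked < num_views:
        raise ValueError("num_masked must be between 1 and num_views - 1")
    masked_idx = np.sort(rng.choice(num_views, size=num_masked, replace=False))
    masked = feats.copy()
    # Assumption: a zero token stands in for what would be a learnable embedding.
    if mask_token is None:
        mask_token = np.zeros(feats.shape[1:], dtype=feats.dtype)
    masked[masked_idx] = mask_token
    return masked, masked_idx


def reconstruct_from_neighbors(masked_feats, masked_idx):
    """Toy stand-in for a learned decoder: fill masked views with the
    mean of the visible views (hypothetical, for illustration only)."""
    visible = np.setdiff1d(np.arange(masked_feats.shape[0]), masked_idx)
    recon = masked_feats.copy()
    recon[masked_idx] = masked_feats[visible].mean(axis=0)
    return recon


def reconstruction_loss(recon, target, masked_idx):
    """Mean squared error computed only on the masked views, with the
    original (unmasked) features acting as the self-supervised target."""
    diff = recon[masked_idx] - target[masked_idx]
    return float(np.mean(diff ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(6, 8, 4, 4))  # six camera views, toy feature maps
    masked, idx = mask_random_views(feats, num_masked=2, rng=rng)
    recon = reconstruct_from_neighbors(masked, idx)
    print("masked views:", idx, "loss:", reconstruction_loss(recon, feats, idx))
```

In an actual training loop, a loss of this kind would presumably be added to the detection loss so that the perception head and the reconstruction module are optimized end to end; how the two terms are weighted is not stated in the summary above.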


MaskBEV: Towards A Unified Framework for BEV Detection and Map Segmentation

Fast-BEV: Towards Real-time On-vehicle Bird's-Eye View Perception

BEV-MAE: Bird's Eye View Masked Autoencoders for Point Cloud Pre-training in Autonomous Driving Scenarios

BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation

Understanding the Robustness of 3D Object Detection with Bird's-Eye-View Representations in Autonomous Driving

BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving

Improved Single Camera BEV Perception Using Multi-Camera Training

Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline

Simple-BEV: What Really Matters for Multi-Sensor BEV Perception?

Hierarchical and Decoupled BEV Perception Learning Framework for Autonomous Driving

SA-BEV: Generating Semantic-Aware Bird's-Eye-View Feature for Multi-view 3D Object Detection

BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View

BEVScope: Enhancing Self-Supervised Depth Estimation Leveraging Bird's-Eye-View in Dynamic Scenarios

BEVHeight++: Toward Robust Visual Centric 3D Object Detection

UniM^2AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving

BEV-MAE: Bird's Eye View Masked Autoencoders for Outdoor Point Cloud Pre-training

QuadBEV: An Efficient Quadruple-Task Perception Framework via Bird's-Eye-View Representation

OneBEV: Using One Panoramic Image for Bird's-Eye-View Semantic Mapping

OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird's-eye-view Vehicle Semantic Segmentation