Abstract: Bird's-Eye-View (BEV) perception has become a vital component of autonomous driving systems due to its ability to integrate multiple sensor inputs into a unified representation, enhancing performance in various downstream tasks. However, the computational demands of BEV models pose challenges for real-world deployment in vehicles with limited resources. To address these limitations, we propose QuadBEV, an efficient multitask perception framework that leverages the shared spatial and contextual information across four key tasks: 3D object detection, lane detection, map segmentation, and occupancy prediction. QuadBEV not only streamlines the integration of these tasks using a shared backbone and task-specific heads but also addresses common multitask learning challenges such as learning rate sensitivity and conflicting task objectives. Our framework reduces redundant computations, thereby enhancing system efficiency, making it particularly suited for embedded systems. We present comprehensive experiments that validate the effectiveness and robustness of QuadBEV, demonstrating its suitability for real-world applications.

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper addresses the computational cost of multi-task perception frameworks in autonomous driving systems. It proposes **QuadBEV**, an efficient four-task perception framework that integrates 3D object detection, lane detection, map segmentation, and occupancy prediction through a Bird's-Eye-View (BEV) representation. The issues it targets include:

1. **Computational Resource Constraints**: Traditional BEV methods are computationally intensive, making them difficult to deploy on vehicles with limited computational resources.
2. **Challenges of Multi-task Learning**:
   - **Learning Rate Sensitivity**: Different tasks respond differently to the same learning rate, and the optimal learning rate for one task may degrade the performance of another.
   - **Task Objective Conflicts**: Each task may need to emphasize different feature aspects, leading to conflicts during training.

### Solution

To address these challenges, the paper proposes the **QuadBEV** framework, whose main features include:

1. **Multi-task Architecture**: Integrates the four key tasks into a unified framework that uses a shared backbone network and task-specific heads.
2. **Progressive Training Strategy**:
   - **Feature Extractor Pre-training**: Pre-train the feature extractor using the map segmentation task.
   - **Multi-task Warm-up Training**: Freeze the parameters of the feature extraction layers and progressively train all task-specific heads, balancing tasks by adjusting learning rates and loss weights.
   - **End-to-end Training**: Eliminate the distinction between primary and auxiliary tasks, using a gradient-weighted algorithm to dynamically adjust loss weights and keep the tasks balanced.
3. **Experimental Validation**: Extensively validates the effectiveness and robustness of **QuadBEV** through experiments, demonstrating its potential for real-world autonomous driving scenarios.
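The three training stages and the idea of gradient-based loss balancing can be illustrated with a small dependency-free sketch. It is not the paper's algorithm: the summary does not specify how the gradient-weighted scheme works, so the inverse-gradient-norm rule below is an assumed stand-in, and the names `STAGES`, `trainable_groups`, and `balanced_weights` are hypothetical.

```python
import math

TASKS = ("detection", "lane", "map_seg", "occupancy")

# Hypothetical stage table: which parameter groups receive gradients per stage.
STAGES = {
    "pretrain": {"backbone": True, "heads": ("map_seg",)},  # map segmentation only
    "warmup": {"backbone": False, "heads": TASKS},          # backbone frozen
    "end_to_end": {"backbone": True, "heads": TASKS},       # everything trainable
}


def trainable_groups(stage):
    """Return the names of the parameter groups trained in the given stage."""
    cfg = STAGES[stage]
    groups = ["backbone"] if cfg["backbone"] else []
    groups += [f"head:{task}" for task in cfg["heads"]]
    return groups


def grad_norm(grad):
    """L2 norm of a flat gradient vector."""
    return math.sqrt(sum(g * g for g in grad))


def balanced_weights(shared_grads, eps=1e-12):
    """Assumed rule: weight each task inversely to its gradient norm.

    `shared_grads` maps a task name to that task's gradient on the shared
    backbone. Weights are rescaled to sum to the number of tasks, so tasks
    with large gradients are damped and tasks with small gradients boosted.
    """
    inverse = {t: 1.0 / (grad_norm(g) + eps) for t, g in shared_grads.items()}
    scale = len(inverse) / sum(inverse.values())
    return {t: v * scale for t, v in inverse.items()}


def total_loss(losses, weights):
    """Weighted sum of the per-task losses."""
    return sum(weights[t] * losses[t] for t in losses)


if __name__ == "__main__":
    # Toy gradients on the shared backbone, one flat vector per task.
    grads = {
        "detection": [3.0, 4.0],
        "lane": [0.3, 0.4],
        "map_seg": [0.6, 0.8],
        "occupancy": [1.5, 2.0],
    }
    print(trainable_groups("warmup"))
    print(balanced_weights(grads))
```

In a real training loop the per-task gradients would come from the framework's autograd rather than hand-written lists, and the weights would typically be recomputed every step or every few steps during the end-to-end stage.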
### Main Contributions

1. **Multi-task Architecture**: Proposes a framework that jointly handles four key tasks in autonomous driving.
2. **Progressive Training Strategy**: Designs a phased learning rate adjustment and a gradient-based loss balancing technique to achieve balanced learning across tasks.
3. **Experimental Validation**: Validates the effectiveness and robustness of **QuadBEV** through extensive experiments, supporting its potential in practical applications.

### Conclusion

The **QuadBEV** framework improves the computational efficiency and processing speed of multi-task perception compared to traditional methods while maintaining performance comparable to existing state-of-the-art methods, making it particularly suitable for real-time processing on embedded systems.

QuadBEV: An Efficient Quadruple-Task Perception Framework via Bird's-Eye-View Representation
