Abstract: Multi-camera perception methods in Bird's-Eye-View (BEV) have gained wide application in autonomous driving. However, due to the differences between roadside and vehicle-side scenarios, there is currently no multi-camera BEV solution for the roadside. This paper systematically analyzes the key challenges in multi-camera BEV perception for roadside scenarios compared to vehicle-side ones. These challenges include the diversity in camera poses, the uncertainty in camera numbers, the sparsity in perception regions, and the ambiguity in orientation angles. In response, we introduce RopeBEV, the first dense multi-camera BEV approach. RopeBEV introduces BEV augmentation to address the training balance issues caused by diverse camera poses. By incorporating CamMask and ROIMask (Region of Interest Mask), it supports variable camera numbers and sparse perception, respectively. Finally, camera rotation embedding is utilized to resolve orientation ambiguity. Our method ranks 1st on the real-world highway dataset RoScenes and demonstrates its practical value on a private urban dataset that covers more than 50 intersections and 600 cameras.
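
The abstract does not say how BEV augmentation is implemented; the paper's summary describes it as randomly translating and rotating the BEV coordinate system. The following is a minimal NumPy sketch under our own assumptions, not RopeBEV's code: each camera has a 4x4 world-to-camera extrinsic matrix, boxes are `[x, y, z, w, l, h, yaw]` in the BEV frame, and the augmentation is a random rotation about the vertical axis plus a random horizontal shift. The names `rot_z` and `bev_augment` and the ranges `max_rot` and `max_shift` are hypothetical. The images themselves are left unchanged; only the extrinsics and labels are re-expressed in the new frame, so the same camera appears at many different poses during training.

```python
import numpy as np


def rot_z(theta: float) -> np.ndarray:
    """3x3 rotation about the vertical (z) axis of the BEV frame."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def bev_augment(cam_from_world, boxes, rng, max_rot=np.pi, max_shift=20.0):
    """Hypothetical BEV augmentation: re-express one sample in a randomly
    rotated and translated BEV frame.

    cam_from_world: (N, 4, 4) world-to-camera extrinsics, one per camera.
    boxes:          (M, 7) array of [x, y, z, w, l, h, yaw] in the BEV frame.
    Images and intrinsics are untouched; only extrinsics and labels change.
    """
    theta = rng.uniform(-max_rot, max_rot)
    shift = np.array([*rng.uniform(-max_shift, max_shift, size=2), 0.0])

    # Rigid transform from the original frame to the augmented one: p' = R p + t.
    aug_from_world = np.eye(4)
    aug_from_world[:3, :3] = rot_z(theta)
    aug_from_world[:3, 3] = shift
    world_from_aug = np.linalg.inv(aug_from_world)

    # Each camera keeps its pixels but gets a new pose in the BEV frame.
    new_extrinsics = cam_from_world @ world_from_aug

    new_boxes = boxes.copy()
    new_boxes[:, :3] = boxes[:, :3] @ rot_z(theta).T + shift
    # Headings rotate with the frame; wrap them to [-pi, pi).
    new_boxes[:, 6] = (boxes[:, 6] + theta + np.pi) % (2 * np.pi) - np.pi
    return new_extrinsics, new_boxes


rng = np.random.default_rng(0)
cams = np.stack([np.eye(4) for _ in range(3)])
cams[:, :3, 3] = [[0.0, 0.0, -6.0], [5.0, 0.0, -6.0], [0.0, 5.0, -6.0]]
boxes = np.array([[10.0, 5.0, 1.0, 2.0, 4.5, 1.6, 0.3]])
new_cams, new_boxes = bev_augment(cams, boxes, rng)

# Sanity check: the box center maps to the same camera-frame point either way.
p_old = np.append(boxes[0, :3], 1.0)
p_new = np.append(new_boxes[0, :3], 1.0)
print(np.allclose(cams @ p_old, new_cams @ p_new))  # True
```

The final check illustrates why this kind of augmentation is label-consistent: projecting a box center through the original extrinsics, or its transformed center through the augmented extrinsics, yields the same camera-frame point, so no image warping is needed.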

What problem does this paper attempt to address?

The paper attempts to address the problem of achieving multi-camera Bird's-Eye-View (BEV) perception in roadside scenarios. Specifically, the authors point out that, compared to in-vehicle settings, multi-camera BEV perception in roadside settings faces four main challenges:

1. **Diversity of Camera Poses**: In in-vehicle systems, the relative positions of the cameras are fixed; in roadside systems, camera setups vary greatly from site to site. The resulting diversity of camera poses leads to unbalanced training of the feature extractors.
2. **Uncertainty in the Number of Cameras**: The number of cameras in in-vehicle systems is usually fixed, whereas in roadside systems it varies with the installation site.
3. **Sparsity of Perception Areas**: Roadside cameras are mounted high and have a wide field of view, so large areas of each image contain no obstacles, and processing them wastes computation.
4. **Ambiguity in Direction Angles**: In in-vehicle systems, the BEV coordinate system is centered on the ego vehicle, whereas roadside scenarios use a coordinate system that is not centered on the observer, which makes the direction angles of objects ambiguous.

To address these issues, the authors propose RopeBEV, which introduces the following components:

- **BEV Data Augmentation**: Balances training by randomly translating and rotating the BEV coordinate system, so that each feature extractor is fully trained.
- **CamMask Mechanism**: Supports an arbitrary number of input cameras by allowing the network to exclude features from particular cameras during training.
- **ROIMask Mechanism**: Filters out irrelevant perception areas, improving both computational efficiency and perception accuracy.
- **Camera Rotation Embedding**: Encodes each camera's rotation angle as an embedding to resolve the ambiguity in direction angles.

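
CamMask, ROIMask, and camera rotation embedding can be illustrated together in a toy single-head cross-attention layer from BEV queries to image tokens. This is a sketch under stated assumptions, not RopeBEV's architecture: we assume cameras are padded to a fixed number of slots with CamMask marking the real ones, that ROIMask marks the BEV cells inside the region of interest (the paper may apply it elsewhere, for example to image regions), and that the rotation embedding is a sinusoidal function of each camera's yaw added to that camera's features. All function names (`camera_yaw`, `rotation_embedding`, `masked_cross_attention`, `looking_at_yaw`) are hypothetical.

```python
import numpy as np


def camera_yaw(cam_from_world: np.ndarray) -> float:
    """Yaw of a camera's viewing direction in the BEV frame, assuming the
    camera's +z axis is its optical axis (OpenCV-style convention)."""
    world_from_cam = np.linalg.inv(cam_from_world)
    forward = world_from_cam[:3, 2]  # camera +z axis expressed in the BEV frame
    return float(np.arctan2(forward[1], forward[0]))


def rotation_embedding(yaw: float, dim: int) -> np.ndarray:
    """Hypothetical sinusoidal embedding of a yaw angle; dim must be even.
    Integer frequencies keep it 2*pi-periodic, so yaw and yaw + 2*pi agree."""
    freqs = np.arange(1, dim // 2 + 1)
    return np.concatenate([np.sin(freqs * yaw), np.cos(freqs * yaw)])


def masked_cross_attention(queries, tokens, token_cam, cam_mask, roi_mask):
    """Single-head attention from BEV queries to image tokens.

    queries:   (Q, D) BEV query features, one per BEV cell.
    tokens:    (T, D) image features from all camera slots, flattened.
    token_cam: (T,) camera slot index of each token.
    cam_mask:  (C,) bool, False for empty or padded slots (CamMask idea).
    roi_mask:  (Q,) bool, False for BEV cells outside the ROI (ROIMask idea).
    """
    out = np.zeros_like(queries)
    key_valid = cam_mask[token_cam]
    if not key_valid.any() or not roi_mask.any():
        return out
    q = queries[roi_mask]  # only cells inside the ROI are computed
    k = tokens[key_valid]  # tokens from masked cameras are never attended
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    out[roi_mask] = weights @ k
    return out


def looking_at_yaw(yaw: float) -> np.ndarray:
    """Extrinsics of a camera at the origin looking horizontally at `yaw`."""
    c, s = np.cos(yaw), np.sin(yaw)
    world_from_cam = np.eye(4)
    # Columns: camera x (right), y (down), z (forward) in the BEV frame.
    world_from_cam[:3, :3] = np.array([[s, 0.0, c], [-c, 0.0, s], [0.0, -1.0, 0.0]])
    return np.linalg.inv(world_from_cam)


# Usage: a site with 2 real cameras padded to 4 slots, 3 tokens per camera.
rng = np.random.default_rng(0)
D, C, per_cam = 8, 4, 3
yaws = [0.5, -2.0]
cam_mask = np.array([True, True, False, False])
token_cam = np.repeat(np.arange(C), per_cam)
tokens = rng.normal(size=(C * per_cam, D))
for i, yaw in enumerate(yaws):
    # Add each real camera's rotation embedding to its own tokens.
    tokens[token_cam == i] += rotation_embedding(camera_yaw(looking_at_yaw(yaw)), D)
queries = rng.normal(size=(5, D))
roi_mask = np.array([True, True, False, True, False])
bev = masked_cross_attention(queries, tokens, token_cam, cam_mask, roi_mask)
print(bev.shape, np.abs(bev[~roi_mask]).max())  # (5, 8) 0.0
```

Gathering only the valid keys and ROI queries is equivalent to an additive `-inf` attention mask for a single sample; a batched implementation would more likely use the mask form, because different sites contribute different numbers of cameras and ROI cells.
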
Experimental results show that RopeBEV ranks first on the real-world highway dataset RoScenes and demonstrates its potential for industrial application on a large-scale private urban dataset covering more than 50 intersections and 600 cameras.

RopeBEV: A Multi-Camera Roadside Perception Network in Bird's-Eye-View