BEVPose: Unveiling Scene Semantics through Pose-Guided Multi-Modal BEV Alignment

Mehdi Hosseinzadeh, Ian Reid

2024-10-28

Abstract: In the field of autonomous driving and mobile robotics, there has been a significant shift in the methods used to create Bird's Eye View (BEV) representations. This shift is characterised by using transformers and learning to fuse measurements from disparate vision sensors, mainly lidar and cameras, into a 2D planar ground-based representation. However, these learning-based methods for creating such maps often rely heavily on extensive annotated data, presenting notable challenges, particularly in diverse or non-urban environments where large-scale datasets are scarce. In this work, we present BEVPose, a framework that integrates BEV representations from camera and lidar data, using sensor pose as a guiding supervisory signal. This method notably reduces the dependence on costly annotated data. By leveraging pose information, we align and fuse multi-modal sensory inputs, facilitating the learning of latent BEV embeddings that capture both geometric and semantic aspects of the environment. Our pretraining approach demonstrates promising performance in BEV map segmentation tasks, outperforming fully-supervised state-of-the-art methods, while necessitating only a minimal amount of annotated data. This development not only confronts the challenge of data efficiency in BEV representation learning but also broadens the potential for such techniques in a variety of domains, including off-road and indoor environments.
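The abstract states that sensor pose acts as the supervisory signal for aligning BEV embeddings, but it does not spell out the mechanism. The sketch below shows one plausible way pose can supervise a BEV grid: warp the grid from frame j into frame i with the known relative planar pose, then penalise disagreement on the cells both frames observe. It assumes planar SE(2) poses, an ego-centred grid with square cells, nearest-neighbour resampling and a mean-squared consistency loss. The names `se2_matrix`, `warp_bev` and `pose_consistency_loss` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def se2_matrix(x: float, y: float, yaw: float) -> np.ndarray:
    """3x3 homogeneous transform for a planar pose (metres, radians)."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, x],
                     [s, c, y],
                     [0.0, 0.0, 1.0]])


def warp_bev(features: np.ndarray, rel_pose: np.ndarray,
             cell_size: float) -> tuple[np.ndarray, np.ndarray]:
    """Resample a BEV grid expressed in frame j into frame i.

    features:  (H, W, C) grid in frame j, ego at the grid centre
               (assumed convention: column -> x, row -> y).
    rel_pose:  3x3 SE(2) transform T_j_i mapping frame-i points to frame j.
    Returns the warped grid and a boolean mask of cells that land inside frame j.
    """
    H, W, C = features.shape
    rows, cols = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Metric coordinates of every cell centre in frame i.
    xs = (cols - (W - 1) / 2.0) * cell_size
    ys = (rows - (H - 1) / 2.0) * cell_size
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])  # (3, H*W)
    xj, yj, _ = rel_pose @ pts
    # Nearest grid indices of those points in frame j.
    cj = np.rint(xj / cell_size + (W - 1) / 2.0).astype(int)
    rj = np.rint(yj / cell_size + (H - 1) / 2.0).astype(int)
    valid = (cj >= 0) & (cj < W) & (rj >= 0) & (rj < H)
    out = np.zeros((H * W, C), dtype=features.dtype)
    out[valid] = features[rj[valid], cj[valid]]
    return out.reshape(H, W, C), valid.reshape(H, W)


def pose_consistency_loss(bev_i: np.ndarray, bev_j: np.ndarray,
                          rel_pose: np.ndarray, cell_size: float) -> float:
    """MSE between frame-i features and pose-warped frame-j features on overlapping cells."""
    warped, valid = warp_bev(bev_j, rel_pose, cell_size)
    if not valid.any():
        return 0.0
    diff = bev_i[valid] - warped[valid]
    return float(np.mean(diff ** 2))
```

For a static scene, a correct relative pose drives this loss towards zero, so the pose alone provides a training signal without semantic labels. A trainable version would replace the nearest-neighbour lookup with a differentiable bilinear sampler.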

Robotics, Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses the dependence on labeled data when building Bird's Eye View (BEV) representations for autonomous driving and mobile robotics. Existing learning-based methods need large amounts of annotated data to produce BEV maps, which is a particular obstacle in diverse or non-urban environments where large-scale, high-quality datasets are scarce. To tackle this, the paper proposes BEVPose, a framework that fuses camera and lidar BEV representations and uses sensor pose as the supervisory signal, substantially reducing the need for costly annotations. With this pose-guided pretraining, BEVPose outperforms fully-supervised state-of-the-art methods on BEV map segmentation while requiring only a small amount of labeled data. The approach therefore improves data efficiency in BEV representation learning and extends its applicability to other domains, such as off-road and indoor environments.
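The summary does not say which objective aligns the camera and lidar BEV embeddings. One common choice for this kind of cross-modal pretraining is a symmetric InfoNCE (contrastive) loss, sketched below as a swapped-in technique rather than the paper's confirmed loss. It assumes both modalities are already expressed on the same grid, so that pose tells us which cells correspond: cell k in the camera map is the positive for cell k in the lidar map, and every other cell is a negative. The function names and the `temperature` default are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1, eps: float = 1e-8) -> np.ndarray:
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def cross_modal_infonce(cam_bev: np.ndarray, lidar_bev: np.ndarray,
                        temperature: float = 0.1) -> float:
    """Symmetric InfoNCE between co-located camera and lidar BEV cells.

    cam_bev, lidar_bev: (H, W, C) embeddings on the same pose-aligned grid.
    """
    C = cam_bev.shape[-1]
    z_cam = l2_normalize(cam_bev.reshape(-1, C))
    z_lid = l2_normalize(lidar_bev.reshape(-1, C))
    logits = z_cam @ z_lid.T / temperature  # (N, N) scaled cosine similarities

    def diag_cross_entropy(l: np.ndarray) -> float:
        # Row-wise softmax cross-entropy where the target for row k is column k.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return float(-np.mean(np.diag(log_probs)))

    # Average the camera-to-lidar and lidar-to-camera directions.
    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))
```

The loss is low when each camera cell is most similar to the lidar cell at the same location and high when correspondences are scrambled, which is what lets geometry (via pose) stand in for semantic labels during pretraining. A small labeled set would then be used to fine-tune a segmentation head on the learned embeddings.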

Semi-Supervised Learning for Visual Bird's Eye View Semantic Segmentation

BEV-Seg: Bird's Eye View Semantic Segmentation Using Geometry and Semantic Point Cloud

LetsMap: Unsupervised Representation Learning for Semantic BEV Mapping

BEVSegFormer: Bird's Eye View Semantic Segmentation From Arbitrary Camera Rigs

BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers

BEV-Locator: An End-to-end Visual Semantic Localization Network Using Multi-View Images

BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation

Improving Bird's Eye View Semantic Segmentation by Task Decomposition

OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird's-eye-view Vehicle Semantic Segmentation

Understanding Bird's-Eye View of Road Semantics using an Onboard Camera

A Sim2Real Deep Learning Approach for the Transformation of Images from Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird's Eye View

BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving

OneBEV: Using One Panoramic Image for Bird's-Eye-View Semantic Mapping

U-BEV: Height-aware Bird's-Eye-View Segmentation and Neural Map-based Relocalization

CoBEVFusion: Cooperative Perception with LiDAR-Camera Bird's-Eye View Fusion

DualBEV: Unifying Dual View Transformation with Probabilistic Correspondences