Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data

Cherie Ho,Jiaye Zou,Omar Alama,Sai Mitheran Jagadesh Kumar,Benjamin Chiang,Taneesh Gupta,Chen Wang,Nikhil Keetha,Katia Sycara,Sebastian Scherer

2024-07-12

Abstract: Top-down Bird's Eye View (BEV) maps are a popular representation for ground robot navigation due to their richness and flexibility for downstream tasks. While recent methods have shown promise for predicting BEV maps from First-Person View (FPV) images, their generalizability is limited to small regions captured by current autonomous vehicle-based datasets. In this context, we show that a more scalable approach towards generalizable map prediction can be enabled by using two large-scale crowd-sourced mapping platforms, Mapillary for FPV images and OpenStreetMap for BEV semantic maps. We introduce Map It Anywhere (MIA), a data engine that enables seamless curation and modeling of labeled map prediction data from existing open-source map platforms. Using our MIA data engine, we display the ease of automatically collecting a dataset of 1.2 million pairs of FPV images & BEV maps encompassing diverse geographies, landscapes, environmental factors, camera models & capture scenarios. We further train a simple camera model-agnostic model on this data for BEV map prediction. Extensive evaluations using established benchmarks and our dataset show that the data curated by MIA enables effective pretraining for generalizable BEV map prediction, with zero-shot performance far exceeding baselines trained on existing datasets by 35%. Our analysis highlights the promise of using large-scale public maps for developing & testing generalizable BEV perception, paving the way for more robust autonomous navigation.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses the poor generalization of Bird's Eye View (BEV) map prediction. Current methods that predict BEV maps from First-Person View (FPV) images are trained on autonomous-vehicle datasets that cover only small regions, so they transfer poorly to new places. The authors show that two large-scale crowd-sourced mapping platforms can supply far more diverse training data: Mapillary provides the FPV images and OpenStreetMap provides the BEV semantic maps. To this end, they propose the Map It Anywhere (MIA) data engine, which curates labeled map prediction data from these existing open-source map platforms. With MIA, they automatically collect 1.2 million pairs of FPV images and BEV maps covering diverse geographies, landscapes, environmental factors, camera models, and capture scenarios. They then train a simple camera model-agnostic model (Mapper) on the MIA data and report that its zero-shot performance on established benchmarks and on their own dataset exceeds baselines trained on existing datasets by 35%. The evaluation indicates that MIA data enables effective pretraining for generalizable BEV map prediction, while also revealing challenges that remain before map prediction can be deployed in both general and specific environments. The paper concludes that large-scale public maps are a promising resource for developing and testing generalizable BEV perception, paving the way for more robust autonomous navigation.
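The pairing of an FPV image with a BEV semantic map can be illustrated with a small sketch. The text above does not describe how MIA builds its maps, so the following is only an assumed, simplified illustration and not the authors' pipeline: it takes labeled map points already projected into a local metric east/north frame, a camera position, and a compass heading (degrees clockwise from north), and places the points into an ego-centric grid that covers the area in front of the camera. The function names, the grid extent, and the resolution are all hypothetical.

```python
import math


def to_ego_frame(point_en, ego_en, heading_deg):
    """Convert an (east, north) point to (forward, right) metres relative to the camera.

    Assumption: heading is a compass bearing, degrees clockwise from north.
    """
    h = math.radians(heading_deg)
    de = point_en[0] - ego_en[0]
    dn = point_en[1] - ego_en[1]
    forward = de * math.sin(h) + dn * math.cos(h)
    right = de * math.cos(h) - dn * math.sin(h)
    return forward, right


def rasterize_bev(labeled_points, ego_en, heading_deg,
                  extent_m=8.0, res_m=2.0, empty="."):
    """Place labeled points into a square ego-centric BEV grid.

    The grid spans [0, extent_m) ahead of the camera and
    [-extent_m / 2, extent_m / 2) laterally. Row 0 is the farthest row,
    so the camera sits at the bottom centre of the grid.
    Points outside that window (including anything behind the camera) are dropped.
    """
    cells = int(extent_m / res_m)
    grid = [[empty] * cells for _ in range(cells)]
    for east, north, label in labeled_points:
        forward, right = to_ego_frame((east, north), ego_en, heading_deg)
        row = cells - 1 - math.floor(forward / res_m)
        col = math.floor((right + extent_m / 2) / res_m)
        if 0 <= row < cells and 0 <= col < cells:
            grid[row][col] = label
    return grid


# Hypothetical example: camera at the origin facing east, one road point
# ahead and slightly to the right, one point behind the camera.
points = [(3.0, -1.0, "R"), (-1.0, 0.0, "B")]
grid = rasterize_bev(points, ego_en=(0.0, 0.0), heading_deg=90.0)
for row in grid:
    print(" ".join(row))
```

A real pipeline would rasterize polygons and polylines rather than isolated points and would need a proper geographic projection; the sketch only shows the ego-centric alignment step that makes a top-down map comparable to what a forward-facing camera sees.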
