Abstract: Accurate localization in challenging garage environments -- marked by poor lighting, sparse textures, repetitive structures, dynamic scenes, and the absence of GPS -- is crucial for automated valet parking (AVP) tasks. Addressing these challenges, our research introduces AVM-SLAM, a cutting-edge semantic visual SLAM architecture with multi-sensor fusion in a bird's eye view (BEV). This novel framework synergizes the capabilities of four fisheye cameras, wheel encoders, and an inertial measurement unit (IMU) to construct a robust SLAM system. Unique to our approach is the implementation of a flare removal technique within the BEV imagery, significantly enhancing road marking detection and semantic feature extraction by convolutional neural networks for superior mapping and localization. Our work also pioneers a semantic pre-qualification (SPQ) module, designed to adeptly handle the challenges posed by environments with repetitive textures, thereby enhancing loop detection and system robustness. To demonstrate the effectiveness and resilience of AVM-SLAM, we have released a specialized multi-sensor and high-resolution dataset of an underground garage, accessible at https://yale-cv.github.io/avm-slam_dataset, encouraging further exploration and validation of our approach within similar settings.

What problem does this paper attempt to address?

This paper addresses accurate mapping and localization for the Automated Valet Parking (AVP) task in challenging environments such as underground garages. These environments typically have the following characteristics:

1. **Poor lighting conditions**: Underground garages are usually poorly lit, making it difficult for visual sensors to capture clear images.
2. **Sparse texture**: The garage interior lacks rich texture features, so traditional texture-based SLAM methods struggle to work effectively.
3. **Repeated structures**: Garages contain many similar structures and markings, which tend to cause mismatches and reduce the robustness and accuracy of the system.
4. **Dynamic scenes**: Vehicles and other objects in the garage move frequently, which makes localization harder.
5. **Lack of GPS signals**: Underground garages usually have no GPS coverage, so the system cannot rely on satellite positioning as an aid.

To address these problems, the authors propose a semantic visual SLAM architecture named AVM-SLAM, whose main features include:

- **Multi-sensor fusion**: It combines four fisheye cameras, wheel encoders, and an inertial measurement unit (IMU) to improve the stability and accuracy of the system.
- **Bird's-eye-view (BEV) perspective**: The Around View Monitor (AVM) subsystem generates a bird's-eye-view image, which enhances the perception range and robustness of the system.
- **Flare removal technique**: Flare removal is applied to the BEV images, which the authors describe as unique to their approach; it significantly improves road-marking detection and semantic feature extraction.
- **Semantic pre-qualification (SPQ) module**: The SPQ module is designed to handle environments with repetitive textures, improving loop-closure detection and overall system robustness.
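To make the multi-sensor fusion item more concrete, the sketch below shows one common way wheel-encoder speed and an IMU yaw rate can be combined into a planar dead-reckoning pose between camera frames. This is an illustrative assumption, not the fusion scheme used by AVM-SLAM, which this summary does not describe. The names `Pose2D` and `propagate` are hypothetical, and the midpoint integration rule was chosen for simplicity.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose2D:
    """Planar vehicle pose: position in metres, heading in radians."""
    x: float = 0.0
    y: float = 0.0
    yaw: float = 0.0


def propagate(pose: Pose2D, wheel_speed: float, yaw_rate: float, dt: float) -> Pose2D:
    """Advance the pose by one time step of length dt (seconds).

    wheel_speed: forward speed from the wheel encoders (m/s), assumed input.
    yaw_rate:    angular rate about the vertical axis from the IMU (rad/s).
    Uses midpoint integration: the heading halfway through the step
    sets the direction of travel for the whole step.
    """
    mid_yaw = pose.yaw + 0.5 * yaw_rate * dt
    x = pose.x + wheel_speed * dt * math.cos(mid_yaw)
    y = pose.y + wheel_speed * dt * math.sin(mid_yaw)
    yaw = pose.yaw + yaw_rate * dt
    # Wrap the heading back into [-pi, pi] to keep it bounded.
    yaw = math.atan2(math.sin(yaw), math.cos(yaw))
    return Pose2D(x, y, yaw)


if __name__ == "__main__":
    # Drive a gentle left turn for one second at 2 m/s.
    pose = Pose2D()
    for _ in range(100):
        pose = propagate(pose, wheel_speed=2.0, yaw_rate=0.2, dt=0.01)
    print(pose)
```

A prediction of this kind drifts over time, which is why a full SLAM system would still correct it with visual observations such as the semantic road markings extracted from the BEV images.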
In addition, to verify the effectiveness and robustness of AVM-SLAM, the authors release an underground garage dataset containing high-resolution multi-sensor data. This dataset can be used for further exploration and validation of SLAM methods in similar environments. In summary, the paper aims to improve mapping and localization accuracy for automated valet parking in complex underground garage environments by combining multi-sensor fusion in a bird's-eye view, flare removal, and semantic pre-qualification.

AVM-SLAM: Semantic Visual SLAM with Multi-Sensor Fusion in a Bird's Eye View for Automated Valet Parking

Semantic closed-loop based visual mapping algorithm for automated valet parking

Towards Autonomous Indoor Parking: A Globally Consistent Semantic SLAM System and A Semantic Localization Subsystem

Visual Semantic Landmark-Based Robust Mapping and Localization for Autonomous Indoor Parking

Multi-Camera Visual-Inertial Simultaneous Localization and Mapping for Autonomous Valet Parking

A Multisensor Fusion With Automatic Vision–LiDAR Calibration Based on Factor Graph Joint Optimization for SLAM

Accurate Visual Simultaneous Localization and Mapping (SLAM) against Around View Monitor (AVM) Distortion Error Using Weighted Generalized Iterative Closest Point (GICP)

DV-LOAM: Direct Visual LiDAR Odometry and Mapping

W-VSLAM: A Visual Mapping Algorithm for Indoor Inspection Robots

A Survey of Visual SLAM in Dynamic Environment: The Evolution From Geometric to Semantic Approaches

DRV-SLAM: An Adaptive Real-Time Semantic Visual SLAM Based on Instance Segmentation Toward Dynamic Environments

SemanticSLAM: Learning based Semantic Map Construction and Robust Camera Localization

VPL-SLAM: A Vertical Line Supported Point Line Monocular SLAM System

Sensor Fusion SLAM: An Efficient and Robust SLAM system for Dynamic Environments

HVL-SLAM: Hybrid Vision and LiDAR Fusion for SLAM

A real-time, robust and versatile visual-SLAM framework based on deep learning networks

Dynamic indoor mapping for AVP: Crowdsourcing mapping without prior maps

LF-VISLAM: A SLAM Framework for Large Field-of-View Cameras with Negative Imaging Plane on Mobile Agents

SGC-VSLAM: A Semantic and Geometric Constraints VSLAM for Dynamic Indoor Environments

DVI-SLAM: A Dual Visual Inertial SLAM Network

BASL-AD SLAM: A Robust Deep-Learning Feature-Based Visual SLAM System With Adaptive Motion Model