Abstract: Sixth-generation (6G) wireless systems, once deployed, will comprise intelligent wireless networks that provide high-accuracy localization services together with ubiquitous communication. They drive this change by introducing new features and capabilities that allow localization and communication to coexist while sharing resources. We focus on converged 6G communication, localization, and sensing systems and identify the key technological enablers that open up new possibilities for combined localization and sensing applications. In terms of enabling technologies, 6G will move toward even higher frequency ranges, wider bandwidths, and massive antenna arrays. Owing to the drawbacks of LiDAR, including its high price, short lifespan, and large volume, visual sensors, which are inexpensive and lightweight, are attracting growing interest and becoming a research hotspot. With rapid advances in deep learning (DL) and hardware computing capacity, new approaches and concepts for solving visual simultaneous localization and mapping (VSLAM) problems have emerged. We concentrate on visual odometry (VO) as an application of integrating DL with VSLAM. Most VO algorithms in use today are built on a conventional multi-stage pipeline comprising feature extraction, feature matching, motion estimation, and local optimization. This paper presents a novel end-to-end architecture for monocular VO based on a Convolutional LSTM. Because it is trained and deployed end-to-end, it does not adopt any module of the traditional VO pipeline and instead infers poses directly from a sequence of raw RGB images (video). A CNN automatically learns an effective feature representation for the VO problem, while the Convolutional LSTM implicitly models sequential dynamics and relations. Comprehensive tests on the KITTI VO dataset demonstrate performance competitive with state-of-the-art techniques. This confirms that the end-to-end DL approach can be a viable complement to conventional VO systems.
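
To make the recurrent component named in the abstract more concrete, the following is a minimal NumPy sketch of one step of a generic Convolutional LSTM cell (the common formulation without peephole connections). It is not the authors' network: the function names (`conv2d_same`, `convlstm_step`), the gate ordering, the tensor shapes, and the random stand-in for CNN feature maps are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv2d_same(x, w):
    """Cross-correlate x (C_in, H, W) with w (C_out, C_in, k, k).

    Zero padding keeps the output at (C_out, H, W). Assumes an odd kernel size k.
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, height, width = x.shape
    out = np.zeros((c_out, height, width))
    for dy in range(k):
        for dx in range(k):
            patch = xp[:, dy:dy + height, dx:dx + width]
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return out


def convlstm_step(x, h, c, w_x, w_h, b):
    """One ConvLSTM step (hypothetical helper, not from the paper).

    x: input feature map (C_in, H, W); h, c: hidden and cell state (C_hid, H, W).
    w_x: (4*C_hid, C_in, k, k); w_h: (4*C_hid, C_hid, k, k); b: (4*C_hid,).
    Gate order along the channel axis is assumed to be input, forget, output, candidate.
    """
    gates = conv2d_same(x, w_x) + conv2d_same(h, w_h) + b[:, None, None]
    i, f, o, g = np.split(gates, 4, axis=0)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c_next = f * c + i * g          # update the spatial cell state
    h_next = o * np.tanh(c_next)    # expose a gated view of the cell state
    return h_next, c_next


# Usage: run the cell over five frames of random stand-in "CNN features".
rng = np.random.default_rng(0)
c_in, hidden, k, height, width = 8, 4, 3, 6, 10
w_x = 0.1 * rng.standard_normal((4 * hidden, c_in, k, k))
w_h = 0.1 * rng.standard_normal((4 * hidden, hidden, k, k))
b = np.zeros(4 * hidden)
h = np.zeros((hidden, height, width))
c = np.zeros((hidden, height, width))
for _ in range(5):
    x = rng.standard_normal((c_in, height, width))
    h, c = convlstm_step(x, h, c, w_x, w_h, b)
```

Unlike a fully connected LSTM, the gates here are computed by convolutions, so the hidden state keeps its spatial layout from frame to frame. In an end-to-end VO system, a regression head (not shown) would typically map the final hidden state to a pose estimate.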

MAIM-VO: A Robust Visual Odometry with Mixed MLP for Weak Textured Environment

Leveraging Local Planar Motion Property for Robust Visual Matching and Localization

Self-supervised Visual-LiDAR Odometry with Flip Consistency

MAIM: a Mixer MLP Architecture for Image Matching

MLP-SLAM: Multilayer Perceptron-Based Simultaneous Localization and Mapping With a Dynamic and Static Object Discriminator

RMVD: Robust Monocular VSLAM for Moving Robot in Dynamic Environment

Visual-LiDAR SLAM Based on Unsupervised Multi-channel Deep Neural Networks

A real-time, robust and versatile visual-SLAM framework based on deep learning networks

MCVO: A Generic Visual Odometry for Arbitrarily Arranged Multi-Cameras

SLAM Visual Localization and Location Recognition Technology Based on 6G Network

VSLAM based on deep learning in low-textured scenes

A Monocular Visual SLAM System Augmented by Lightweight Deep Local Feature Extractor Using In-House and Low-Cost LIDAR-camera Integrated Device

DVI-SLAM: A Dual Visual Inertial SLAM Network

MN-SLAM: Multi-networks Visual SLAM for Dynamic and Complicated Environments

Semantic visual simultaneous localization and mapping (SLAM) using deep learning for dynamic scenes

VOOM: Robust Visual Object Odometry and Mapping using Hierarchical Landmarks

A Robust Deep Learning Enhanced Monocular SLAM System for Dynamic Environments

InertialNet: Toward Robust SLAM Via Visual Inertial Measurement

Robot Localization and Mapping Final Report -- Sequential Adversarial Learning for Self-Supervised Deep Visual Odometry

DXSLAM: A Robust and Efficient Visual SLAM System with Deep Features

DeepAVO: Efficient Pose Refining with Feature Distilling for Deep Visual Odometry