Abstract:Monocular approaches to simultaneous localization and mapping (SLAM) have recently addressed with success the challenging problem of the fast computation of dense reconstructions from a single, moving camera. Thus, if these approaches initially relied on the detection of a reduced set of interest points to estimate the camera position and the map, they are currently able to reconstruct dense maps from a handheld camera while the camera coordinates are simultaneously computed. However, these maps of 3-dimensional points usually remain meaningless, that is, with no memorable items and without providing a way of encoding spatial relationships between objects and paths. In humans and mobile robotics, landmarks play a key role in the internalization of a spatial representation of an environment. They are memorable cues that can serve to define a region of the space or the location of other objects. In a topological representation of the space, landmarks can be identified and located according to its structural, perceptive or semantic significance and distinctiveness. But on the other hand, landmarks may be difficult to be located in a metric representation of the space. Restricted to the domain of visual landmarks, this work describes an approach where the map resulting from a point-based, monocular SLAM is annotated with the semantic information provided by a set of distinguished landmarks. Both features are obtained from the image. Hence, they can be linked by associating to each landmark all those point-based features that are superimposed to the landmark in a given image (key-frame). Visual landmarks will be obtained by means of an object-based, bottom-up attention mechanism, which will extract from the image a set of proto-objects. These proto-objects could not be always associated with natural objects, but they will typically constitute significant parts of these scene objects and can be appropriately annotated with semantic information. Moreover, they will be affine covariant regions, that is, they will be invariant to affine transformation, being detected under different viewing conditions (view-point angle, rotation, scale, etc.). Monocular SLAM will be solved using the accurate parallel tracking and mapping (PTAM) framework by Klein and Murray in Proceedings of IEEE/ACM international symposium on mixed and augmented reality, 2007.

Language meets YOLOv8 for metric monocular SLAM

YPD-SLAM: A Real-Time VSLAM System for Handling Dynamic Indoor Environments

Lp-slam: language-perceptive RGB-D SLAM framework exploiting large language model

YOLO-SLAM: A semantic SLAM system towards dynamic environment with geometric constraint

Real-Time SLAM Based on Dynamic Feature Point Elimination in Dynamic Environment

MonoSLAM: Real-Time Single Camera SLAM

OVO-SLAM: Open-Vocabulary Online Simultaneous Localization and Mapping

SLAM-OR: Simultaneous Localization, Mapping and Object Recognition Using Video Sensors Data in Open Environments from the Sparse Points Cloud

LP-SLAM: Language-Perceptive RGB-D SLAM system based on Large Language Model

A Novel Lidar-Assisted Monocular Visual SLAM Framework for Mobile Robots in Outdoor Environments

Language-EXtended Indoor SLAM (LEXIS): A Versatile System for Real-time Visual Scene Understanding

Real-time Visual SLAM based YOLO-Fastest for Dynamic Scenes

A Monocular Visual SLAM System Augmented by Lightweight Deep Local Feature Extractor Using In-House and Low-Cost LIDAR-camera Integrated Device

Orbeez-SLAM: A Real-time Monocular Visual SLAM with ORB Features and NeRF-realized Mapping

Unified framework for recognition, localization and mapping using wearable cameras

RVD-SLAM: A Real-Time Visual SLAM Toward Dynamic Environments Based on Sparsely Semantic Segmentation and Outlier Prior

YDD-SLAM: Indoor Dynamic Visual SLAM Fusing YOLOv5 with Depth Information

Multi-Modal Neural Radiance Field for Monocular Dense SLAM with a Light-Weight ToF Sensor

PL-SLAM: A Stereo SLAM System Through the Combination of Points and Line Segments

A Hybrid Sparse-Dense Monocular SLAM System for Autonomous Driving