Abstract: Recent research on Simultaneous Localization and Mapping (SLAM) based on implicit representation has shown promising results in indoor environments. However, there are still some challenges: the limited scene representation capability of implicit encodings, the uncertainty in the rendering process from implicit representations, and the disruption of consistency by dynamic objects. To address these challenges, we propose a real-time dynamic visual SLAM system based on local-global fusion neural implicit representation, named DVN-SLAM. To improve the scene representation capability, we introduce a local-global fusion neural implicit representation that enables the construction of an implicit map while considering both global structure and local details. To tackle uncertainties arising from the rendering process, we design an information concentration loss for optimization, aiming to concentrate scene information on object surfaces. The proposed DVN-SLAM achieves competitive performance in localization and mapping across multiple datasets. More importantly, DVN-SLAM demonstrates robustness in dynamic scenes, a trait that sets it apart from other NeRF-based methods.

What problem does this paper attempt to address?

This paper addresses challenges faced by Simultaneous Localization and Mapping (SLAM) systems based on implicit representation in indoor environments. Specifically, the authors identify three main problems:

1. **Limited scene representation ability**:
   - Existing implicit encoding methods struggle to represent complex scenes. For example, iMAP [1] uses a neural implicit representation with positional encoding. It achieves global consistency, but it over-smooths local details and tends to forget details as the scene scale increases.
   - Methods based on feature grids or planes (such as NICE-SLAM [2] and ESLAM [4]) model local scene details accurately, but their global representation and prediction abilities decline significantly.
2. **Uncertainty in the rendering process**:
   - In volume rendering from an implicit representation, different information distributions along the same view ray can produce the same rendering result, which introduces uncertainty. Even when the rendering error is small, the underlying distribution of scene information may be inaccurate.
3. **Destruction of consistency by dynamic objects**:
   - Moving objects break the static consistency of the scene, so pure-pose implicit mapping is insufficient to model dynamic scenes. Existing NeRF-based SLAM methods handle dynamic scenes poorly and are easily disturbed by dynamic objects, which leads to localization and mapping failures.

To solve these problems, the authors propose a real-time dynamic visual SLAM system based on local-global fusion neural implicit representation, named DVN-SLAM.
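The rendering ambiguity described in problem 2 can be illustrated with a toy ray. This is a minimal sketch, not the paper's renderer: the function name `render_depth`, the three sample depths, and both weight vectors are assumptions chosen for illustration. It shows that a sharply concentrated weight distribution and a spread-out one can render to the same depth.

```python
def render_depth(weights, depths):
    """Rendered depth along one ray as the weighted sum of sample depths."""
    return sum(w * d for w, d in zip(weights, depths))


# Hypothetical sample depths along a single ray (arbitrary units).
depths = [1.0, 2.0, 3.0]

# All weight on the middle sample: information concentrated on one surface.
sharp = [0.0, 1.0, 0.0]

# Weight split between the first and last samples: no surface at depth 2.
spread = [0.5, 0.0, 0.5]

print(render_depth(sharp, depths))   # 2.0
print(render_depth(spread, depths))  # 2.0
```

Both rays render to the same depth, so a loss on rendered depth alone cannot tell the two distributions apart.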
The main innovations of this system include:

- **Local-global fusion neural implicit representation**: Using feature fusion and result fusion with an attention mechanism, the system combines the strengths of continuous neural radiance fields for global representation and discrete feature planes for local representation, which improves scene representation ability.
- **Information concentration loss**: To address the uncertainty in the rendering process, an information concentration loss based on rendering variance is designed to optimize the distribution of scene information and concentrate it on object surfaces.
- **Robustness in dynamic scenes**: DVN-SLAM performs well in dynamic scenes. It can automatically ignore fast-moving objects and effectively restore the background occluded by dynamic objects.

These improvements make DVN-SLAM competitive in static scenes and allow it to maintain effective localization and mapping performance in highly dynamic scenes.
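The idea of a loss based on rendering variance can be sketched as follows. The paper's exact formulation is not given in this summary, so everything here is an assumption: the weights come from the standard volume-rendering formula, and `concentration_loss` is a hypothetical stand-in that penalizes the weighted variance of sample depths along a ray.

```python
import math


def ray_weights(densities, deltas):
    """Standard volume-rendering weights: w_i = T_i * (1 - exp(-sigma_i * delta_i))."""
    weights = []
    transmittance = 1.0  # fraction of light that reaches the current sample
    for sigma, delta in zip(densities, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights


def concentration_loss(weights, depths):
    """Hypothetical loss: weighted variance of depth along the ray.

    It is zero when all weight sits on a single sample and grows as the
    weight spreads out. This is an assumed form, not the paper's definition.
    """
    total = sum(weights)
    if total == 0.0:
        return 0.0
    mean = sum(w * d for w, d in zip(weights, depths)) / total
    return sum(w * (d - mean) ** 2 for w, d in zip(weights, depths)) / total


depths = [1.0, 2.0, 3.0]
deltas = [1.0, 1.0, 1.0]

# A dense surface at the middle sample versus a uniform fog along the ray.
surface_loss = concentration_loss(ray_weights([0.0, 50.0, 0.0], deltas), depths)
fog_loss = concentration_loss(ray_weights([0.5, 0.5, 0.5], deltas), depths)
```

In this sketch `surface_loss` is smaller than `fog_loss`, so minimizing the term alongside the usual rendering losses would push density toward a single surface along each ray.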

DVN-SLAM: Dynamic Visual Neural SLAM Based on Local-Global Encoding

DDN-SLAM: Real-time Dense Dynamic Neural Implicit SLAM

NeuV-SLAM: Fast Neural Multiresolution Voxel Optimization for RGBD Dense SLAM

NGEL-SLAM: Neural Implicit Representation-based Global Consistent Low-Latency SLAM System

NID-SLAM: Neural Implicit Representation-based RGB-D SLAM in dynamic environments

SNI-SLAM: Semantic Neural Implicit SLAM

VPE-SLAM: Neural Implicit Voxel-permutohedral Encoding for SLAM

DF-SLAM: Dictionary Factors Representation for High-Fidelity Neural Implicit Dense Visual SLAM System

Neural Implicit Dense Semantic SLAM

EC-SLAM: Real-time Dense Neural RGB-D SLAM System with Effectively Constrained Global Bundle Adjustment

NICE-SLAM: Neural Implicit Scalable Encoding for SLAM

HI-SLAM: Monocular Real-time Dense Mapping with Hybrid Implicit Fields

DLD-SLAM: RGB-D Visual Simultaneous Localisation and Mapping in Indoor Dynamic Environments Based on Deep Learning

NIS-SLAM: Neural Implicit Semantic RGB-D SLAM for 3D Consistent Scene Understanding

SAR-SLAM: Self-Attentive Rendering-based SLAM with Neural Point Cloud Encoding

DNS SLAM: Dense Neural Semantic-Informed SLAM

NeSLAM: Neural Implicit Mapping and Self-Supervised Feature Tracking With Depth Completion and Denoising

NICER-SLAM: Neural Implicit Scene Encoding for RGB SLAM

DM-SLAM: A Feature-Based SLAM System for Rigid Dynamic Scenes