Abstract: Recent advancements in Simultaneous Localization and Mapping (SLAM) have increasingly highlighted the robustness of LiDAR-based techniques. At the same time, Neural Radiance Fields (NeRF) have introduced new possibilities for 3D scene reconstruction, as exemplified by NeRF-based SLAM systems. Among these, NeRF-LOAM has shown notable performance. However, despite their strengths, these systems often encounter difficulties in dynamic outdoor environments because they inherently assume a static scene. To address this limitation, this paper proposes a novel method designed to improve reconstruction in highly dynamic outdoor scenes. Based on NeRF-LOAM, the proposed approach consists of two primary components. The first separates the scene into static background and dynamic foreground. By identifying and excluding dynamic elements from the mapping process, this segmentation enables the creation of a dense 3D map that accurately represents the static background only. The second component extends the octree structure to support multi-resolution representation. This extension not only enhances reconstruction quality but also aids in the removal of dynamic objects identified by the first module. Additionally, Fourier feature encoding is applied to the sampled points, capturing high-frequency information and leading to more complete reconstruction results. Evaluations on various datasets demonstrate that our method achieves more competitive results than current state-of-the-art approaches.

What problem does this paper attempt to address?

This paper addresses the difficulty that existing SLAM (Simultaneous Localization and Mapping) systems based on NeRF (Neural Radiance Fields) have in accurately reconstructing 3D scenes in highly dynamic outdoor environments. Specifically, these systems usually assume that the environment is static or only slightly dynamic, so their reconstructions become inaccurate in real-world outdoor scenes that contain a large number of moving objects. To address this, the paper proposes a new method aimed at improving 3D reconstruction and localization in highly dynamic outdoor scenes. The main innovations of this method are:

1. **Separation of Background and Foreground**: The scene is divided into a static background and a dynamic foreground. Dynamic elements are identified and excluded, so that the resulting dense 3D map contains only the static background.
2. **Multi-resolution Octree Structure**: The octree structure in NeRF-LOAM is extended to support multi-resolution representation, which not only improves the reconstruction quality but also helps to remove the dynamic objects identified by the first module.
3. **Fourier Feature Encoding**: Fourier feature encoding is applied to the sampled points to capture high-frequency information, thus achieving more complete reconstruction results.

### Mathematical Formulas

Some of the key formulas involved in the paper are as follows:

- **Calculation of Average Ground Height**:
  \[ \bar{z}_G=\frac{1}{|P_G|}\sum_{p\in P_G}z_p \]
  where \(P_G\) is the set of ground points within a radius \(r\) of the dynamic mask \(D_i\), and \(z_p\) is the \(z\)-coordinate of each point \(p\).
- **Dynamic-region SDF Loss Function**:
  \[ L_d = \frac{1}{|D_i|}\sum_{p_j\in D_i}\left(\Psi(p_j)-\frac{1}{|D_i|}\sum_{p_k\in D_i}\Psi(p_k)\right)^2 \]
  where \(\Psi(p_j)\) is the SDF value of point \(p_j\) in the dynamic region \(D_i\). This is the variance of the SDF values within the region, so minimizing it pushes those values toward a common value.
- **Multi-resolution Encoding**:
  \[ F_s^\alpha(p)=\sum_{j = D_{\text{max}}-H + 1}^{D_{\text{max}}}F_j^\alpha(p) \]
  where \(F_j^\alpha(p)\) is the embedding obtained at level \(j\) by trilinear interpolation over the eight vertices of that level's node. The sum runs over \(H\) consecutive levels ending at level \(D_{\text{max}}\).
- **Fourier Feature Position Encoding**:
  \[ \gamma(p)=[\sin(2\pi B_1p),\cos(2\pi B_1p),\ldots,\sin(2\pi B_kp),\cos(2\pi B_kp)]^{\top} \]
  where each \(B_i\) is a coefficient sampled from an isotropic Gaussian distribution \(N(0,\sigma^2)\).
- **Total Loss Function**:
  \[ L_{\text{total}}=\lambda_sL_s+\lambda_fL_f+\lambda_eL_e+\lambda_dL_d \]
  where \(L_s\), \(L_f\), \(L_e\) and \(L_d\) are the SDF loss, free-space loss, Eikonal loss and dynamic-region SDF loss respectively, and \(\lambda_s\), \(\lambda_f\), \(\lambda_e\) and \(\lambda_d\) are their weights.

Through these improvements, the method can achieve more accurate and complete 3D reconstruction in highly dynamic outdoor environments.
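
The formulas above can be illustrated with a small sketch in plain Python. This is not the paper's implementation, only a direct transcription of the listed equations under stated assumptions: all function names are hypothetical, each \(B_i\) is treated as a 3-vector so that \(B_i p\) is a dot product, the per-level embeddings are assumed to be already trilinearly interpolated, and the default loss weights are placeholders rather than values from the paper.

```python
import math
import random


def mean_ground_height(ground_points):
    """Average z-coordinate over the ground points P_G, given as (x, y, z) tuples."""
    return sum(z for _, _, z in ground_points) / len(ground_points)


def dynamic_sdf_loss(sdf_values):
    """L_d for one dynamic region D_i: mean squared deviation of Psi(p_j) from its mean."""
    mean = sum(sdf_values) / len(sdf_values)
    return sum((v - mean) ** 2 for v in sdf_values) / len(sdf_values)


def multires_embedding(level_embeddings, d_max, h):
    """Sum the per-level embeddings F_j(p) for j = d_max - h + 1, ..., d_max.

    Assumption: level_embeddings[j] already holds the trilinearly interpolated
    embedding of the query point at octree level j (a list of floats).
    """
    levels = range(d_max - h + 1, d_max + 1)
    return [sum(values) for values in zip(*(level_embeddings[j] for j in levels))]


def sample_frequencies(k, sigma, dim=3, seed=0):
    """Draw k frequency vectors B_i with entries from N(0, sigma^2).

    Assumption: each B_i is a dim-vector; the seed is fixed for repeatability.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(dim)] for _ in range(k)]


def fourier_features(p, B):
    """gamma(p): interleaved sin/cos of 2*pi*(B_i . p) for every frequency vector B_i."""
    features = []
    for b in B:
        phase = 2.0 * math.pi * sum(b_i * x_i for b_i, x_i in zip(b, p))
        features.extend((math.sin(phase), math.cos(phase)))
    return features


def total_loss(l_s, l_f, l_e, l_d, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four loss terms; the default weights are placeholders."""
    lam_s, lam_f, lam_e, lam_d = weights
    return lam_s * l_s + lam_f * l_f + lam_e * l_e + lam_d * l_d


if __name__ == "__main__":
    B = sample_frequencies(k=4, sigma=1.0)
    encoded = fourier_features((0.1, 0.2, 0.3), B)
    print(len(encoded))  # one sin and one cos per frequency vector
    print(dynamic_sdf_loss([0.2, 0.2, 0.2]))  # identical SDF values give zero loss
```

Because `dynamic_sdf_loss` is a variance, it is zero whenever all SDF values in a region agree and grows as they spread apart. The encoding produces `2 * k` values per point, so a larger `k` or `sigma` would give the network access to more, or higher, frequencies.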

Neural Implicit Representation for Highly Dynamic LiDAR Mapping and Odometry

NeRF-LOAM: Neural Implicit Representation for Large-Scale Incremental LiDAR Odometry and Mapping

VI-NeRF-SLAM: a real-time visual–inertial SLAM with NeRF mapping

TivNe-SLAM: Dynamic Mapping and Tracking via Time-Varying Neural Radiance Fields

NeSLAM: Neural Implicit Mapping and Self-Supervised Feature Tracking With Depth Completion and Denoising

3D LiDAR Mapping in Dynamic Environments Using a 4D Implicit Neural Representation

LiDeNeRF: Neural radiance field reconstruction with depth prior provided by LiDAR point cloud

SAR-SLAM: Self-Attentive Rendering-based SLAM with Neural Point Cloud Encoding

Neural Surface Reconstruction and Rendering for LiDAR-Visual Systems

Efficient Implicit Neural Reconstruction Using LiDAR

LISNeRF Mapping: LiDAR-based Implicit Mapping via Semantic Neural Fields for Large-Scale 3D Scenes

SDFMAP: Neural Signed Distance Fields for Mapping and Positioning in Real-Time

Towards Open World NeRF-Based SLAM

NeRF-SLAM: Real-Time Dense Monocular SLAM with Neural Radiance Fields

LidaRF: Delving into Lidar for Neural Radiance Field on Street Scenes

Real-Time Dense Visual SLAM with Neural Factor Representation

RoDyn-SLAM: Robust Dynamic Dense RGB-D SLAM with Neural Radiance Fields

SLAIM: Robust Dense Neural SLAM for Online Tracking and Mapping

Map Completion for SLAM Systems Based on Neural Radiance Field

LiDAR-NeRF: Novel LiDAR View Synthesis via Neural Radiance Fields

Camera and LiDAR Fusion for Urban Scene Reconstruction and Novel View Synthesis via Voxel-Based Neural Radiance Fields