SLAIM: Robust Dense Neural SLAM for Online Tracking and Mapping

Vincent Cartillier, Grant Schindler, Irfan Essa
2024-04-17
Abstract: We present SLAIM - Simultaneous Localization and Implicit Mapping. We propose a novel coarse-to-fine tracking model tailored for Neural Radiance Field SLAM (NeRF-SLAM) to achieve state-of-the-art tracking performance. Notably, existing NeRF-SLAM systems consistently exhibit inferior tracking performance compared to traditional SLAM algorithms. NeRF-SLAM methods solve camera tracking via image alignment and photometric bundle-adjustment. Such optimization processes are difficult to optimize due to the narrow basin of attraction of the optimization loss in image space (local minima) and the lack of initial correspondences. We mitigate these limitations by implementing a Gaussian pyramid filter on top of NeRF, facilitating a coarse-to-fine tracking optimization strategy. Furthermore, NeRF systems encounter challenges in converging to the right geometry with limited input views. While prior approaches use a Signed-Distance Function (SDF)-based NeRF and directly supervise SDF values by approximating ground truth SDF through depth measurements, this often results in suboptimal geometry. In contrast, our method employs a volume density representation and introduces a novel KL regularizer on the ray termination distribution, constraining scene geometry to consist of empty space and opaque surfaces. Our solution implements both local and global bundle-adjustment to produce a robust (coarse-to-fine) and accurate (KL regularizer) SLAM solution. We conduct experiments on multiple datasets (ScanNet, TUM, Replica) showing state-of-the-art results in tracking and in reconstruction accuracy.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses how to improve SLAM (Simultaneous Localization and Mapping) systems that use NeRF (Neural Radiance Fields) for real-time online tracking and 3D reconstruction, particularly in terms of tracking accuracy and reconstruction accuracy. Specifically, the paper points out that existing NeRF-SLAM systems track less accurately than traditional SLAM algorithms and have difficulty converging to the correct geometry when input views are limited. The tracking problem stems mainly from local minima in the image-space optimization (a narrow basin of attraction) and the lack of initial correspondences. In addition, although SDF (Signed Distance Function)-based methods can accelerate convergence, they may lead to suboptimal geometry.

To solve these problems, the paper proposes the following improvements:

1. **Coarse-to-fine tracking optimization strategy**: A Gaussian pyramid filter implemented on top of NeRF widens the convergence range of the image alignment optimization, improving the robustness and efficiency of tracking.
2. **New KL regularizer**: A KL regularizer on the ray termination distribution constrains the scene geometry to consist of empty space and opaque surfaces, improving geometric convergence.
3. **Combination of local and global bundle adjustment**: Performing both local and global bundle adjustment yields more accurate camera pose estimates and 3D reconstruction.

With these improvements, SLAIM reaches state-of-the-art tracking and reconstruction performance on multiple datasets (ScanNet, TUM, Replica).
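The coarse-to-fine idea behind the first improvement can be illustrated with an ordinary Gaussian image pyramid. The sketch below is not the paper's implementation, which applies the filter on top of NeRF. It assumes a single-channel image, a 5-tap binomial kernel, and downsampling by a factor of two, and the function names `gaussian_blur` and `build_pyramid` are hypothetical.

```python
import numpy as np

def gaussian_blur(img):
    """Blur a 2D image with a separable 5-tap binomial kernel (assumed filter)."""
    k = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    h, w = img.shape
    padded = np.pad(img, 2, mode="edge")  # replicate borders
    # Horizontal pass, then vertical pass.
    rows = sum(k[i] * padded[:, i:i + w] for i in range(5))
    return sum(k[i] * rows[i:i + h, :] for i in range(5))

def build_pyramid(img, levels):
    """Return a list of images ordered from coarsest to finest."""
    pyramid = [np.asarray(img, dtype=np.float64)]
    for _ in range(levels - 1):
        # Blur before subsampling to limit aliasing.
        pyramid.append(gaussian_blur(pyramid[-1])[::2, ::2])
    return pyramid[::-1]
```

A tracker would typically optimize the camera pose on the coarsest level first, where the smoothed loss has a wider basin of attraction, and reuse the result as the initial guess for each finer level.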
### Formula summary

- **Ray termination distribution**:
  \[ w(r)=T(r)\sigma(r) \]
  where \(T(r)=\exp\left(-\int_0^r\sigma(s)\,ds\right)\)
- **Pixel color and depth value calculation**:
  \[ \hat{c}=\sum_{i = 1}^M w_i c_i;\quad\hat{d}=\sum_{i = 1}^M w_i d_i \]
- **KL regularization loss**:
  \[ L_{KL}=-\frac{1}{N}\sum_{n = 1}^N\sum_k\log\left(\frac{w(r_k)}{\tilde{w}(r_k)}\right)\Delta r \]
- **Total loss function**:
  \[ L = L_{rgb}+\lambda_d L_d+\lambda_{KL}L_{KL} \]
  where \(L_{rgb}\) is the \(l_2\) loss of RGB pixel values, and \(L_d\) is the \(l_1\) loss of depth values.

These formulas capture the key technical details of the paper and help explain why the method is effective.
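As a rough illustration of how these quantities fit together, the following NumPy sketch computes discrete ray termination weights, the rendered color and depth, and the total loss for one ray. It assumes the standard NeRF quadrature \(w_i = T_i\,(1-e^{-\sigma_i\delta_i})\) as the discretization of \(w(r)=T(r)\sigma(r)\), which the summary above does not spell out. The KL term is passed in as a precomputed number because \(\tilde{w}\) is not defined here. All function names are hypothetical.

```python
import numpy as np

def ray_termination_weights(sigma, depths):
    """Discrete weights w_i along one ray (standard NeRF quadrature, assumed).

    sigma:  (M,) non-negative densities at the samples
    depths: (M,) sample distances along the ray, ascending
    """
    deltas = np.diff(depths)
    deltas = np.append(deltas, deltas[-1])  # assume last interval repeats
    alpha = 1.0 - np.exp(-sigma * deltas)   # opacity of each interval
    # Transmittance T_i = prod_{j<i} (1 - alpha_j), with T_1 = 1.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    return trans * alpha

def render_ray(sigma, colors, depths):
    """Rendered color c_hat = sum_i w_i c_i and depth d_hat = sum_i w_i d_i."""
    w = ray_termination_weights(sigma, depths)
    return w @ colors, w @ depths

def total_loss(c_hat, c_gt, d_hat, d_gt, l_kl, lambda_d, lambda_kl):
    """L = L_rgb + lambda_d * L_d + lambda_KL * L_KL."""
    l_rgb = np.mean((np.asarray(c_hat) - np.asarray(c_gt)) ** 2)  # l2 on RGB
    l_d = np.mean(np.abs(np.asarray(d_hat) - np.asarray(d_gt)))   # l1 on depth
    return l_rgb + lambda_d * l_d + lambda_kl * l_kl
```

When the density is concentrated at a single sample, the weights collapse onto that sample and the rendered depth equals its distance, which is the "empty space and opaque surfaces" configuration the KL regularizer encourages.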