Mode-GS: Monocular Depth Guided Anchored 3D Gaussian Splatting for Robust Ground-View Scene Rendering

Yonghan Lee, Jaehoon Choi, Dongki Jung, Jaeseong Yun, Soohyun Ryu, Dinesh Manocha, Suyong Yeon
2024-10-07
Abstract: We present a novel-view rendering algorithm, Mode-GS, for ground-robot trajectory datasets. Our approach is based on using anchored Gaussian splats, which are designed to overcome the limitations of existing 3D Gaussian splatting algorithms. Prior neural rendering methods suffer from severe splat drift due to scene complexity and insufficient multi-view observation, and can fail to fix splats on the true geometry in ground-robot datasets. Our method integrates pixel-aligned anchors from monocular depths and generates Gaussian splats around these anchors using residual-form Gaussian decoders. To address the inherent scale ambiguity of monocular depth, we parameterize anchors with per-view depth-scales and employ scale-consistent depth loss for online scale calibration. Our method results in improved rendering performance, based on PSNR, SSIM, and LPIPS metrics, in ground scenes with free trajectory patterns, and achieves state-of-the-art rendering performance on the R3LIVE odometry dataset and the Tanks and Temples dataset.
Computer Vision and Pattern Recognition, Robotics
What problem does this paper attempt to address?
### Problems the paper attempts to solve

The paper addresses novel-view rendering for ground-robot trajectory datasets. Existing 3D Gaussian Splatting (3DGS) algorithms suffer from severe splat drift when scenes are complex and multi-view observations are sparse, as is typical of free-trajectory captures, and they struggle to keep splats fixed on the true scene geometry. As a result, existing methods perform poorly on ground-robot datasets.

### Main challenges

1. **Scarcity of multi-view information**: 3DGS requires a dense point cloud for point initialization and relies on multi-view photometric gradients for Adaptive Density Control (ADC) to expand into unoccupied areas. In ground-robot datasets this information is often insufficient, which degrades performance significantly.
2. **Difficulty of pixel-level pose accuracy**: 3DGS is very sensitive to the pixel-level pose accuracy of the training images, and such accuracy is hard to obtain in ground-view datasets. Traditional visual SLAM methods often cannot estimate poses consistently in this setting, especially when the images lack distinctive features or texture.

### Solutions

To address these problems, the paper proposes a new rendering algorithm, Mode-GS, which improves on existing 3DGS in the following ways:

1. **Monocular-depth-guided anchored Gaussian splatting**: Gaussian splats are initialized from pixel-aligned anchors generated by a monocular depth network, which prevents splat drift and improves robustness in complex scenes.
2. **Residual-form Gaussian decoder**: A residual-form Gaussian decoder is introduced that can directly initialize the attributes of the Gaussian splats (such as color and opacity) and significantly improves training efficiency.
3. **Scale-consistent depth calibration**: To handle the scale ambiguity inherent in monocular depth, anchors are parameterized with per-view depth scales, and a scale-consistent depth loss is used for online scale calibration, which improves rendering accuracy.

### Experimental results

Experiments show that Mode-GS achieves state-of-the-art rendering performance on both the R3LIVE odometry dataset and the Tanks and Temples dataset. Notably, it maintains high rendering quality without LiDAR point clouds.

### Summary

By introducing monocular-depth guidance and a residual-form decoder, the paper overcomes the limitations of existing 3DGS algorithms on ground-robot datasets and provides a more robust novel-view rendering method.
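
To make the idea of pixel-aligned anchors with a per-view depth scale more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, and the function names `backproject_anchors` and `fit_depth_scale` are invented for illustration. The paper calibrates the per-view scale online through a scale-consistent depth loss during training; the closed-form least-squares fit below is a simpler stand-in that only illustrates what "resolving the scale ambiguity" means.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def backproject_anchors(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    scale: float = 1.0,
) -> List[Point]:
    """Lift every pixel (u, v) of a monocular depth map to a 3D anchor.

    Assumption: pinhole camera model; anchors are expressed in the camera
    frame. `scale` plays the role of a per-view depth scale that converts
    the scale-ambiguous monocular depth into scene units.
    """
    anchors: List[Point] = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            z = scale * d              # rescaled depth along the optical axis
            x = (u - cx) / fx * z      # back-project the pixel column
            y = (v - cy) / fy * z      # back-project the pixel row
            anchors.append((x, y, z))
    return anchors


def fit_depth_scale(mono: Sequence[float], ref: Sequence[float]) -> float:
    """Closed-form least-squares scale s that minimizes sum((s * m - r)^2).

    Illustrative stand-in only: the paper optimizes per-view scales online
    with a scale-consistent depth loss rather than solving this in closed form.
    """
    num = sum(m * r for m, r in zip(mono, ref))
    den = sum(m * m for m in mono)
    if den == 0.0:
        raise ValueError("monocular depths are all zero; scale is undefined")
    return num / den


if __name__ == "__main__":
    # Toy 2x2 depth map with unit depth everywhere and a per-view scale of 2.
    toy_depth = [[1.0, 1.0], [1.0, 1.0]]
    pts = backproject_anchors(toy_depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0, scale=2.0)
    print(pts)
    # Monocular depths that are exactly half of the reference depths.
    print(fit_depth_scale([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Because each anchor is tied to a pixel and a single scalar per view, changing the scale moves all anchors of that view along their camera rays together, which is what makes a per-view scale a compact way to absorb the monocular scale ambiguity.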