Abstract: We present a novel-view rendering algorithm, Mode-GS, for ground-robot trajectory datasets. Our approach is based on using anchored Gaussian splats, which are designed to overcome the limitations of existing 3D Gaussian splatting algorithms. Prior neural rendering methods suffer from severe splat drift due to scene complexity and insufficient multi-view observation, and can fail to fix splats on the true geometry in ground-robot datasets. Our method integrates pixel-aligned anchors from monocular depths and generates Gaussian splats around these anchors using residual-form Gaussian decoders. To address the inherent scale ambiguity of monocular depth, we parameterize anchors with per-view depth-scales and employ scale-consistent depth loss for online scale calibration. Our method results in improved rendering performance, based on PSNR, SSIM, and LPIPS metrics, in ground scenes with free trajectory patterns, and achieves state-of-the-art rendering performance on the R3LIVE odometry dataset and the Tanks and Temples dataset.
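
To make "pixel-aligned anchors from monocular depths" with "per-view depth-scales" concrete, here is a minimal NumPy sketch of one way such anchors could be produced: each pixel is back-projected along its camera ray, and the monocular depth is multiplied by a single scalar per view. The function name `backproject_anchors`, the pinhole camera model, and the use of one multiplicative scalar are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def backproject_anchors(depth, K, cam_to_world, scale=1.0):
    """Lift every pixel of a monocular depth map to a 3D anchor (hypothetical helper).

    depth        : (H, W) depth map from a monocular network, scale-ambiguous
    K            : (3, 3) pinhole intrinsics (assumed camera model)
    cam_to_world : (4, 4) camera-to-world pose
    scale        : per-view scalar; in training it would be an optimized parameter
    """
    h, w = depth.shape
    # Pixel centers in homogeneous image coordinates, row-major order.
    u, v = np.meshgrid(np.arange(w) + 0.5, np.arange(h) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    # Camera-frame rays with unit z, so depth is measured along the optical axis.
    rays = pix @ np.linalg.inv(K).T
    pts_cam = rays * (scale * depth.reshape(-1, 1))
    # Rigid transform of the anchors into the world frame.
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return pts_cam @ R.T + t
```

Because `scale` only multiplies the depth, changing it slides every anchor of a view along its own pixel ray, which is what allows a per-view scale to be calibrated without breaking pixel alignment.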

What problem does this paper attempt to address?

### Problems the paper attempts to solve

The paper addresses novel-view rendering for ground-robot trajectory datasets. Existing 3D Gaussian Splatting (3DGS) algorithms suffer from severe splat drift when scenes are complex and multi-view observations are insufficient, as is typical of free-trajectory captures, and they struggle to keep splats fixed on the true geometry. As a result, existing methods perform poorly on ground-robot datasets.

### Main challenges

1. **Scarcity of multi-view information**: 3DGS requires a dense point cloud for initialization and relies on multi-view photometric gradients for Adaptive Density Control (ADC) to expand into unoccupied areas. In ground-robot datasets this information is often insufficient, which degrades performance significantly.
2. **Difficulty of pixel-level pose accuracy**: 3DGS is very sensitive to the pixel-level pose accuracy of the training images, and pixel-accurate poses are hard to obtain in ground-view datasets. Traditional visual SLAM methods often cannot estimate poses consistently in this setting, especially when the images lack distinctive features or texture.

### Solutions

To address these problems, the paper proposes a new rendering algorithm, Mode-GS, which improves on existing 3DGS in the following ways:

1. **Monocular-depth-guided anchored Gaussian splatting**: Gaussian splats are initialized from pixel-aligned anchors generated by a monocular depth network, which prevents splat drift and improves robustness in complex scenes.
2. **Residual-form Gaussian decoder**: A new residual-form Gaussian decoder allows the attributes of the Gaussian splats (such as color and opacity) to be initialized directly, which significantly improves training efficiency.
3. **Scale-consistent depth calibration**: To handle the scale ambiguity inherent in monocular depth, a scale-consistent depth loss is proposed for online scale calibration, which improves rendering accuracy.

### Experimental results

Experiments show that Mode-GS achieves state-of-the-art rendering performance on both the R3LIVE odometry dataset and the Tanks and Temples dataset. Notably, it maintains high rendering quality without LiDAR point clouds.

### Summary

By introducing monocular depth guidance and a residual-form decoder, the paper overcomes the limitations of existing 3DGS algorithms on ground-robot datasets and provides a more robust novel-view rendering method.
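
The idea behind a residual-form decoder can be illustrated with a short NumPy sketch: the decoder predicts only a correction that is added to an initial attribute value, so the attribute starts at its initial value and is refined from there. Everything specific here is an assumption made for illustration rather than the paper's architecture: the class name `ResidualColorDecoder`, the two-layer MLP, the zero-initialized output layer, and the use of a per-anchor initial color as the starting value.

```python
import numpy as np

class ResidualColorDecoder:
    """Tiny MLP that predicts a correction to an initial per-anchor color (hypothetical)."""

    def __init__(self, feat_dim, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, size=(feat_dim, hidden))
        self.b1 = np.zeros(hidden)
        # Assumed design choice: a zero-initialized output layer makes the
        # residual exactly zero before any training step.
        self.w2 = np.zeros((hidden, 3))
        self.b2 = np.zeros(3)

    def __call__(self, feat, init_color):
        hidden = np.maximum(feat @ self.w1 + self.b1, 0.0)  # ReLU
        residual = hidden @ self.w2 + self.b2
        # Residual form: output = initial attribute + learned correction.
        return np.clip(init_color + residual, 0.0, 1.0)

# Usage: before training, decoded colors equal the initial colors.
features = np.random.default_rng(1).normal(size=(5, 8))
init_colors = np.linspace(0.1, 0.9, 15).reshape(5, 3)
decoder = ResidualColorDecoder(feat_dim=8)
decoded = decoder(features, init_colors)
```

With this structure the first rendered frame already shows the initial colors, so optimization only has to learn corrections instead of learning the attributes from scratch. This is one plausible reading of why direct attribute initialization helps training efficiency.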

Mode-GS: Monocular Depth Guided Anchored 3D Gaussian Splatting for Robust Ground-View Scene Rendering

GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views

Optimizing 3D Gaussian Splatting for Sparse Viewpoint Scene Reconstruction

MeshGS: Adaptive Mesh-Aligned Gaussian Splatting for High-Quality Rendering

Depth-Regularized Optimization for 3D Gaussian Splatting in Few-Shot Images

MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos

HybridGS: Decoupling Transients and Statics with 2D and 3D Gaussian Splatting

DOGS: Distributed-Oriented Gaussian Splatting for Large-Scale 3D Reconstruction Via Gaussian Consensus

Self-Evolving Depth-Supervised 3D Gaussian Splatting from Rendered Stereo Pairs

SparseGS: Real-Time 360° Sparse View Synthesis using Gaussian Splatting

Unbounded-GS: Extending 3D Gaussian Splatting with Hybrid Representation for Unbounded Large-Scale Scene Reconstruction

Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering

GGRt: Towards Pose-free Generalizable 3D Gaussian Splatting in Real-time

MVPGS: Excavating Multi-view Priors for Gaussian Splatting from Sparse Input Views

Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting

MVG-Splatting: Multi-View Guided Gaussian Splatting with Adaptive Quantile-Based Geometric Consistency Densification

PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting

SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction

Towards Real-Time Gaussian Splatting: Accelerating 3DGS through Photometric SLAM

UDGS-SLAM : UniDepth Assisted Gaussian Splatting for Monocular SLAM

Visual SLAM with 3D Gaussian Primitives and Depth Priors Enabling Novel View Synthesis